public inbox for linux-rdma@vger.kernel.org
From: "Håkon Bugge" <haakon.bugge@oracle.com>
To: linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, rds-devel@oss.oracle.com
Cc: "Jason Gunthorpe" <jgg@ziepe.ca>,
	"Leon Romanovsky" <leon@kernel.org>,
	"Saeed Mahameed" <saeedm@nvidia.com>,
	"Tariq Toukan" <tariqt@nvidia.com>,
	"David S . Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>, "Tejun Heo" <tj@kernel.org>,
	"Lai Jiangshan" <jiangshanlai@gmail.com>,
	"Allison Henderson" <allison.henderson@oracle.com>,
	"Manjunath Patil" <manjunath.b.patil@oracle.com>,
	"Mark Zhang" <markzhang@nvidia.com>,
	"Håkon Bugge" <haakon.bugge@oracle.com>,
	"Chuck Lever" <chuck.lever@oracle.com>,
	"Shiraz Saleem" <shiraz.saleem@intel.com>,
	"Yang Li" <yang.lee@linux.alibaba.com>
Subject: [PATCH 1/6] workqueue: Inherit NOIO and NOFS alloc flags
Date: Mon, 13 May 2024 14:53:41 +0200
Message-ID: <20240513125346.764076-2-haakon.bugge@oracle.com>
In-Reply-To: <20240513125346.764076-1-haakon.bugge@oracle.com>

When a driver or module creates a work-queue while running inside a
memalloc_{noio,nofs}_{save,restore} region, make sure that work
executed on that work-queue inherits the same flag(s).

This conditionally enables drivers to work together with block I/O
devices. Any work queued later on a work-queue that was created while
current->flags had PF_MEMALLOC_{NOIO,NOFS} set, for example during
module initialization, will inherit the same flags.

We do this so that the drivers can be used by a network block I/O
device. The goal is to support XFS or other file-systems on top of a
raw block device which uses said drivers as its network transport
layer.

Under intense memory pressure, memory reclaim kicks in. Assume the
file-system reclaims memory and goes to the raw block device, which
calls into said drivers. If regular GFP_KERNEL allocations in the
drivers then require reclaim in order to be fulfilled, we end up in a
circular dependency.

We break this circular dependency by:

1. Forcing all allocations in the drivers to use GFP_NOIO, by
   bracketing all relevant entry points with
   memalloc_noio_{save,restore}.

2. Making sure work-queues inherit current->flags
   wrt. PF_MEMALLOC_{NOIO,NOFS}, such that work executed on the
   work-queue runs with the same flag(s). That is what this commit
   contributes.
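
The two steps above can be sketched as a small userspace model. This
is not kernel code: every model_* and MODEL_* name is hypothetical,
the PF flag values are arbitrary, and the save/restore helpers only
imitate the shape of memalloc_{noio,nofs}_{save,restore}. The
creation-time capture in model_alloc_workqueue() is an assumption
taken from the description above; the diff in this message only shows
the execution side.

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary model values, not the kernel's PF_* constants. */
#define MODEL_PF_MEMALLOC_NOIO	(1u << 0)
#define MODEL_PF_MEMALLOC_NOFS	(1u << 1)
/* Same bit positions as the __WQ_NOIO/__WQ_NOFS flags in the diff. */
#define MODEL_WQ_NOIO		(1u << 19)
#define MODEL_WQ_NOFS		(1u << 20)

/* Stands in for current->flags of whichever context is running. */
static unsigned int model_current_flags;

/* Set one PF bit and return its previous state for a later restore. */
static unsigned int model_save(unsigned int bit)
{
	unsigned int old = model_current_flags & bit;

	model_current_flags |= bit;
	return old;
}

/* Put one PF bit back to the state returned by model_save(). */
static void model_restore(unsigned int bit, unsigned int old)
{
	model_current_flags = (model_current_flags & ~bit) | old;
}

struct model_wq {
	unsigned int flags;
};

/* Assumed creation-time step: remember the creator's alloc context. */
static struct model_wq model_alloc_workqueue(void)
{
	struct model_wq wq = { .flags = 0 };

	if (model_current_flags & MODEL_PF_MEMALLOC_NOIO)
		wq.flags |= MODEL_WQ_NOIO;
	if (model_current_flags & MODEL_PF_MEMALLOC_NOFS)
		wq.flags |= MODEL_WQ_NOFS;
	return wq;
}

/* Execution-time step, mirroring the save/restore pairing in the diff. */
static void model_process_one_work(const struct model_wq *wq,
				   void (*fn)(void))
{
	bool use_noio = wq->flags & MODEL_WQ_NOIO;
	bool use_nofs = wq->flags & MODEL_WQ_NOFS;
	unsigned int noio_flags = 0, nofs_flags = 0;

	if (use_noio)
		noio_flags = model_save(MODEL_PF_MEMALLOC_NOIO);
	if (use_nofs)
		nofs_flags = model_save(MODEL_PF_MEMALLOC_NOFS);

	fn();

	/* Restore in the reverse order of the saves. */
	if (use_nofs)
		model_restore(MODEL_PF_MEMALLOC_NOFS, nofs_flags);
	if (use_noio)
		model_restore(MODEL_PF_MEMALLOC_NOIO, noio_flags);
}

static unsigned int model_observed_flags;

/* A work item that records the flags it runs under. */
static void model_record_flags(void)
{
	model_observed_flags = model_current_flags;
}

/* Returns the flags the work item saw when it ran. */
unsigned int model_demo(void)
{
	unsigned int old;
	struct model_wq wq;

	/* Step 1: the driver entry point brackets its body with NOIO. */
	model_current_flags = 0;
	old = model_save(MODEL_PF_MEMALLOC_NOIO);
	wq = model_alloc_workqueue();
	model_restore(MODEL_PF_MEMALLOC_NOIO, old);
	assert(model_current_flags == 0);

	/* Step 2: a worker with clean flags later executes the work. */
	model_process_one_work(&wq, model_record_flags);
	return model_observed_flags;
}
```

In this model, calling model_demo() shows that the work item runs with
the NOIO bit set although the worker context started without it, and
that the worker's flags are clean again once the work item returns.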

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
---
 include/linux/workqueue.h |  2 ++
 kernel/workqueue.c        | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 158784dd189ab..09ecc692ffcae 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -398,6 +398,8 @@ enum wq_flags {
 	__WQ_DRAINING		= 1 << 16, /* internal: workqueue is draining */
 	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
 	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
+	__WQ_NOIO               = 1 << 19, /* internal: execute work with NOIO */
+	__WQ_NOFS               = 1 << 20, /* internal: execute work with NOFS */
 
 	/* BH wq only allows the following flags */
 	__WQ_BH_ALLOWS		= WQ_BH | WQ_HIGHPRI,
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index d2dbe099286b9..a1d166a7c0f85 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -51,6 +51,7 @@
 #include <linux/uaccess.h>
 #include <linux/sched/isolation.h>
 #include <linux/sched/debug.h>
+#include <linux/sched/mm.h>
 #include <linux/nmi.h>
 #include <linux/kvm_para.h>
 #include <linux/delay.h>
@@ -3172,6 +3173,10 @@ __acquires(&pool->lock)
 	unsigned long work_data;
 	int lockdep_start_depth, rcu_start_depth;
 	bool bh_draining = pool->flags & POOL_BH_DRAINING;
+	bool use_noio_allocs = pwq->wq->flags & __WQ_NOIO;
+	bool use_nofs_allocs = pwq->wq->flags & __WQ_NOFS;
+	unsigned long noio_flags;
+	unsigned long nofs_flags;
 #ifdef CONFIG_LOCKDEP
 	/*
 	 * It is permissible to free the struct work_struct from
@@ -3184,6 +3189,12 @@ __acquires(&pool->lock)
 
 	lockdep_copy_map(&lockdep_map, &work->lockdep_map);
 #endif
+	/* Set inherited alloc flags */
+	if (use_noio_allocs)
+		noio_flags = memalloc_noio_save();
+	if (use_nofs_allocs)
+		nofs_flags = memalloc_nofs_save();
+
 	/* ensure we're on the correct CPU */
 	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
 		     raw_smp_processor_id() != pool->cpu);
@@ -3320,6 +3331,12 @@ __acquires(&pool->lock)
 
 	/* must be the last step, see the function comment */
 	pwq_dec_nr_in_flight(pwq, work_data);
+
+	/* Restore alloc flags */
+	if (use_nofs_allocs)
+		memalloc_nofs_restore(nofs_flags);
+	if (use_noio_allocs)
+		memalloc_noio_restore(noio_flags);
 }
 
 /**
-- 
2.39.3



Thread overview: 21+ messages
2024-05-13 12:53 [PATCH 0/6] rds: rdma: Add ability to force GFP_NOIO Håkon Bugge
2024-05-13 12:53 ` Håkon Bugge [this message]
2024-05-13 16:48   ` [PATCH 1/6] workqueue: Inherit NOIO and NOFS alloc flags Tejun Heo
2024-05-14 13:48     ` Haakon Bugge
2024-05-14 16:49       ` Tejun Heo
2024-05-15 14:11         ` Haakon Bugge
2024-05-13 12:53 ` [PATCH 2/6] rds: Brute force GFP_NOIO Håkon Bugge
2024-05-13 18:04   ` kernel test robot
2024-05-13 18:14   ` Simon Horman
2024-05-14 13:31     ` Haakon Bugge
2024-05-13 12:53 ` [PATCH 3/6] RDMA/cma: " Håkon Bugge
2024-05-13 12:53 ` [PATCH 4/6] RDMA/cm: " Håkon Bugge
2024-05-13 12:53 ` [PATCH 5/6] RDMA/mlx5: " Håkon Bugge
2024-05-13 12:53 ` [PATCH 6/6] net/mlx5: " Håkon Bugge
2024-05-13 23:03 ` [PATCH 0/6] rds: rdma: Add ability to " Jason Gunthorpe
2024-05-14 18:19   ` Haakon Bugge
2024-05-17 17:30     ` Jason Gunthorpe
2024-05-14  8:53 ` Zhu Yanjun
2024-05-14 12:02   ` Zhu Yanjun
2024-05-14 18:32     ` Haakon Bugge
2024-05-15 10:25       ` Zhu Yanjun
