linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v2 0/4] fs/dcache: Limit # of negative dentries
@ 2017-07-21 13:43 Waiman Long
  2017-07-21 13:43 ` [PATCH v2 1/4] fs/dcache: Limit numbers " Waiman Long
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Waiman Long @ 2017-07-21 13:43 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, Waiman Long

 v1->v2:
  - Move the new nr_negative field to the end of dentry_stat_t structure
    as suggested by Matthew Wilcox.
  - With the help of Miklos Szeredi, fix incorrect locking order in
    dentry_kill() by using lock_parent() instead of locking the parent's
    d_lock directly.
  - Correctly account for positive to negative dentry transitions.
  - Automatic pruning of negative dentries now ignores the reference
    bit in negative dentries; the regular shrinker still honors it.

A rogue application can potentially create a large number of negative
dentries in the system, consuming most of the available memory. This
can degrade the performance of other applications running on the system.

This patchset introduces changes to the dcache subsystem to limit
the number of negative dentries that can be created, thus limiting
the amount of memory that can be consumed by negative dentries.

Patch 1 tracks the number of negative dentries in use and disallows
the creation of more once the limit is reached.

Patch 2 enables /proc/sys/fs/dentry-state to report the number of
negative dentries in the system.

Patch 3 enables automatic pruning of negative dentries when the count
gets close to the limit, so that we won't end up killing recently used
negative dentries.

Patch 4 prevents racing between negative dentry pruning and umount
operation.
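
As an illustration of the problem (not part of the patches themselves),
negative dentries are created simply by looking up names that do not
exist. The minimal userspace sketch below generates them in bulk; the
path and iteration count are arbitrary assumptions.

/*
 * Illustrative sketch only. Each failed stat() of a unique, nonexistent
 * name leaves a negative dentry behind in the dcache.
 */
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
	struct stat st;
	char name[64];
	long i;

	for (i = 0; i < 10000000L; i++) {	/* arbitrary count */
		snprintf(name, sizeof(name), "/tmp/no-such-file-%ld", i);
		stat(name, &st);		/* fails with ENOENT */
	}
	return 0;
}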

Waiman Long (4):
  fs/dcache: Limit numbers of negative dentries
  fs/dcache: Report negative dentry number in dentry-state
  fs/dcache: Enable automatic pruning of negative dentries
  fs/dcache: Protect negative dentry pruning from racing with umount

 Documentation/admin-guide/kernel-parameters.txt |   7 +
 fs/dcache.c                                     | 408 ++++++++++++++++++++++--
 include/linux/dcache.h                          |   8 +-
 include/linux/list_lru.h                        |   1 +
 mm/list_lru.c                                   |   4 +-
 5 files changed, 392 insertions(+), 36 deletions(-)

-- 
1.8.3.1


* [PATCH v2 1/4] fs/dcache: Limit numbers of negative dentries
  2017-07-21 13:43 [PATCH v2 0/4] fs/dcache: Limit # of negative dentries Waiman Long
@ 2017-07-21 13:43 ` Waiman Long
  2017-07-21 13:43 ` [PATCH v2 2/4] fs/dcache: Report negative dentry number in dentry-state Waiman Long
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2017-07-21 13:43 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, Waiman Long

The number of positive dentries is limited by the number of files
in the filesystems. The number of negative dentries, however,
has no limit other than the total amount of memory available in
the system. So a rogue application that generates a lot of negative
dentries can potentially exhaust most of the memory available in the
system, impacting the performance of other running applications.

To prevent this from happening, the dcache code is now updated to limit
the number of negative dentries that can be kept in the LRU lists,
expressed as a percentage of total system memory. The default is 5%
and can be changed by specifying the "neg_dentry_pc=" kernel command
line option.
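
To make the sizing concrete, here is a worked example (illustration
only; the machine parameters are assumptions, not values from the
patch) that mirrors the boot-time calculation in neg_dentry_init():

/*
 * Worked example of the limit sizing, assuming 16 GiB of RAM with
 * 4 KiB pages, a ~192-byte struct dentry and 8 possible CPUs. The
 * formulas follow the patch; the inputs are illustrative only.
 */
#include <stdio.h>

int main(void)
{
	unsigned long totalram_pages = 4194304;		/* 16 GiB of 4 KiB pages */
	unsigned long nr_dentry_page = 4096 / 192 - 1;	/* ~20 dentries per page */
	unsigned long neg_dentry_pc = 5;		/* default 5% */
	unsigned long ncpus = 8;

	/* 20% of the 5% budget goes to the global free pool ... */
	unsigned long nfree = totalram_pages * nr_dentry_page * neg_dentry_pc / 500;
	/* ... and the remaining 80% is split evenly among the CPUs. */
	unsigned long percpu_limit = nfree * 4 / ncpus;

	printf("global free pool: %lu dentries\n", nfree);	   /* 838860 */
	printf("per-cpu limit:    %lu dentries\n", percpu_limit); /* 419430 */
	return 0;
}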

If the negative dentry limit is exceeded, newly created negative
dentries will be killed right after use to avoid adding unpredictable
latency to the directory lookup operation.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt |   7 +
 fs/dcache.c                                     | 245 ++++++++++++++++++++----
 include/linux/dcache.h                          |   1 +
 3 files changed, 221 insertions(+), 32 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 372cc66..7f5497b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2383,6 +2383,13 @@
 
 	n2=		[NET] SDL Inc. RISCom/N2 synchronous serial card
 
+	neg_dentry_pc=	[KNL]
+			Range: 1-50
+			Default: 5
+			This parameter specifies the amount of negative
+			dentries allowed in the system as a percentage of
+			total system memory.
+
 	netdev=		[NET] Network devices parameters
 			Format: <irq>,<io>,<mem_start>,<mem_end>,<name>
 			Note that mem_start is often overloaded to mean
diff --git a/fs/dcache.c b/fs/dcache.c
index f901413..866df38 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -130,8 +130,19 @@ struct dentry_stat_t dentry_stat = {
 	.age_limit = 45,
 };
 
+/*
+ * Macros and variables to manage and count negative dentries.
+ */
+#define NEG_DENTRY_BATCH	(1 << 8)
+static long neg_dentry_percpu_limit __read_mostly;
+static struct {
+	raw_spinlock_t nfree_lock;
+	long nfree;			/* Negative dentry free pool */
+} ndblk ____cacheline_aligned_in_smp;
+
 static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
+static DEFINE_PER_CPU(long, nr_dentry_neg);
 
 #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
 
@@ -227,6 +238,86 @@ static inline int dentry_string_cmp(const unsigned char *cs, const unsigned char
 
 #endif
 
+/*
+ * There is a system-wide limit to the amount of negative dentries allowed
+ * in the super blocks' LRU lists. The default limit is 5% of the total
+ * system memory. This limit can be changed by using the kernel command line
+ * option "neg_dentry_pc=" to specify the percentage of the total memory
+ * that can be used for negative dentries. That percentage must be in the
+ * 1-50% range.
+ *
+ * To avoid performance problem with a global counter on an SMP system,
+ * the tracking is done mostly on a per-cpu basis. The total limit is
+ * distributed in a 80/20 ratio to per-cpu counters and a global free pool.
+ *
+ * If a per-cpu counter runs out of negative dentries, it can borrow extra
+ * ones from the global free pool. If it has more than its percpu limit,
+ * the extra ones will be returned back to the global pool.
+ */
+
+/*
+ * Decrement negative dentry count if applicable.
+ */
+static void __neg_dentry_dec(struct dentry *dentry)
+{
+	if (unlikely(this_cpu_dec_return(nr_dentry_neg) < 0)) {
+		long *pcnt = get_cpu_ptr(&nr_dentry_neg);
+
+		if ((*pcnt < 0) && raw_spin_trylock(&ndblk.nfree_lock)) {
+			ACCESS_ONCE(ndblk.nfree) += NEG_DENTRY_BATCH;
+			*pcnt += NEG_DENTRY_BATCH;
+			raw_spin_unlock(&ndblk.nfree_lock);
+		}
+		put_cpu_ptr(&nr_dentry_neg);
+	}
+}
+
+static inline void neg_dentry_dec(struct dentry *dentry)
+{
+	if (unlikely(d_is_negative(dentry)))
+		__neg_dentry_dec(dentry);
+}
+
+/*
+ * Increment negative dentry count if applicable.
+ */
+static void __neg_dentry_inc(struct dentry *dentry)
+{
+	long cnt, *pcnt;
+
+	if (this_cpu_inc_return(nr_dentry_neg) <= neg_dentry_percpu_limit)
+		return;
+
+	pcnt = get_cpu_ptr(&nr_dentry_neg);
+	cnt  = (READ_ONCE(ndblk.nfree) &&
+	       (*pcnt > neg_dentry_percpu_limit)) ? NEG_DENTRY_BATCH : 0;
+
+	if (cnt && raw_spin_trylock(&ndblk.nfree_lock)) {
+		long val = READ_ONCE(ndblk.nfree);
+
+		if (val < cnt)
+			cnt = val;
+		ACCESS_ONCE(ndblk.nfree) -= cnt;
+		*pcnt -= cnt;
+		raw_spin_unlock(&ndblk.nfree_lock);
+	} else {
+		cnt = 0;
+	}
+	put_cpu_ptr(&nr_dentry_neg);
+	/*
+	 * If there are too many negative dentries, set DCACHE_KILL_NEGATIVE
+	 * flag to indicate that the dentry should be killed.
+	 */
+	if (!cnt)
+		dentry->d_flags |= DCACHE_KILL_NEGATIVE;
+}
+
+static inline void neg_dentry_inc(struct dentry *dentry)
+{
+	if (unlikely(d_is_negative(dentry)))
+		__neg_dentry_inc(dentry);
+}
+
 static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *ct, unsigned tcount)
 {
 	/*
@@ -329,6 +420,8 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry)
 	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
 	WRITE_ONCE(dentry->d_flags, flags);
 	dentry->d_inode = NULL;
+	if (dentry->d_flags & DCACHE_LRU_LIST)
+		__neg_dentry_inc(dentry);
 }
 
 static void dentry_free(struct dentry *dentry)
@@ -396,6 +489,7 @@ static void d_lru_add(struct dentry *dentry)
 	dentry->d_flags |= DCACHE_LRU_LIST;
 	this_cpu_inc(nr_dentry_unused);
 	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
+	neg_dentry_inc(dentry);
 }
 
 static void d_lru_del(struct dentry *dentry)
@@ -404,6 +498,7 @@ static void d_lru_del(struct dentry *dentry)
 	dentry->d_flags &= ~DCACHE_LRU_LIST;
 	this_cpu_dec(nr_dentry_unused);
 	WARN_ON_ONCE(!list_lru_del(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
+	neg_dentry_dec(dentry);
 }
 
 static void d_shrink_del(struct dentry *dentry)
@@ -434,6 +529,7 @@ static void d_lru_isolate(struct list_lru_one *lru, struct dentry *dentry)
 	dentry->d_flags &= ~DCACHE_LRU_LIST;
 	this_cpu_dec(nr_dentry_unused);
 	list_lru_isolate(lru, &dentry->d_lru);
+	neg_dentry_dec(dentry);
 }
 
 static void d_lru_shrink_move(struct list_lru_one *lru, struct dentry *dentry,
@@ -442,6 +538,7 @@ static void d_lru_shrink_move(struct list_lru_one *lru, struct dentry *dentry,
 	D_FLAG_VERIFY(dentry, DCACHE_LRU_LIST);
 	dentry->d_flags |= DCACHE_SHRINK_LIST;
 	list_lru_isolate_move(lru, &dentry->d_lru, list);
+	neg_dentry_dec(dentry);
 }
 
 /*
@@ -586,38 +683,6 @@ static void __dentry_kill(struct dentry *dentry)
 		dentry_free(dentry);
 }
 
-/*
- * Finish off a dentry we've decided to kill.
- * dentry->d_lock must be held, returns with it unlocked.
- * If ref is non-zero, then decrement the refcount too.
- * Returns dentry requiring refcount drop, or NULL if we're done.
- */
-static struct dentry *dentry_kill(struct dentry *dentry)
-	__releases(dentry->d_lock)
-{
-	struct inode *inode = dentry->d_inode;
-	struct dentry *parent = NULL;
-
-	if (inode && unlikely(!spin_trylock(&inode->i_lock)))
-		goto failed;
-
-	if (!IS_ROOT(dentry)) {
-		parent = dentry->d_parent;
-		if (unlikely(!spin_trylock(&parent->d_lock))) {
-			if (inode)
-				spin_unlock(&inode->i_lock);
-			goto failed;
-		}
-	}
-
-	__dentry_kill(dentry);
-	return parent;
-
-failed:
-	spin_unlock(&dentry->d_lock);
-	return dentry; /* try again with same dentry */
-}
-
 static inline struct dentry *lock_parent(struct dentry *dentry)
 {
 	struct dentry *parent = dentry->d_parent;
@@ -653,6 +718,71 @@ static inline struct dentry *lock_parent(struct dentry *dentry)
 }
 
 /*
+ * Finish off a dentry we've decided to kill.
+ * dentry->d_lock must be held, returns with it unlocked.
+ * If ref is non-zero, then decrement the refcount too.
+ * Returns dentry requiring refcount drop, or NULL if we're done.
+ */
+static struct dentry *dentry_kill(struct dentry *dentry)
+	__releases(dentry->d_lock)
+{
+	struct inode *inode = dentry->d_inode;
+	struct dentry *parent = NULL;
+
+	if (inode && unlikely(!spin_trylock(&inode->i_lock)))
+		goto failed;
+
+	if (!IS_ROOT(dentry)) {
+		parent = dentry->d_parent;
+		if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE)) {
+			/*
+			 * Force the killing of this negative dentry when
+			 * DCACHE_KILL_NEGATIVE flag is set. This is to
+			 * protect against the worst case situation where
+			 * the parent d_lock is kept busy most of the time
+			 * while the number of negative dentries has
+			 * exceeded the limit. Leaving the dentry as is
+			 * will cause the number of negative entries to
+			 * continue to increase way past the limit under
+			 * the worst case situation.
+			 *
+			 * As the reference count is kept at 1 before
+			 * calling dentry_kill(), the dentry won't
+			 * disappear under the hood even if the dentry
+			 * lock is temporarily released.
+			 */
+			unsigned int dflags;
+
+			dentry->d_flags &= ~DCACHE_KILL_NEGATIVE;
+			dflags = dentry->d_flags;
+			parent = lock_parent(dentry);
+			/*
+			 * Abort the killing if the reference count or
+			 * d_flags changed.
+			 */
+			if ((dentry->d_flags != dflags) ||
+			    (dentry->d_lockref.count != 1)) {
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				goto failed;
+			}
+
+		} else if (unlikely(!spin_trylock(&parent->d_lock))) {
+			if (inode)
+				spin_unlock(&inode->i_lock);
+			goto failed;
+		}
+	}
+
+	__dentry_kill(dentry);
+	return parent;
+
+failed:
+	spin_unlock(&dentry->d_lock);
+	return dentry; /* try again with same dentry */
+}
+
+/*
  * Try to do a lockless dput(), and return whether that was successful.
  *
  * If unsuccessful, we return false, having already taken the dentry lock.
@@ -815,6 +945,9 @@ void dput(struct dentry *dentry)
 
 	dentry_lru_add(dentry);
 
+	if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE))
+		goto kill_it;
+
 	dentry->d_lockref.count--;
 	spin_unlock(&dentry->d_lock);
 	return;
@@ -1820,6 +1953,11 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 	WARN_ON(d_in_lookup(dentry));
 
 	spin_lock(&dentry->d_lock);
+	/*
+	 * Decrement negative dentry count if it was in the LRU list.
+	 */
+	if (dentry->d_flags & DCACHE_LRU_LIST)
+		__neg_dentry_dec(dentry);
 	hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
 	raw_write_seqcount_begin(&dentry->d_seq);
 	__d_set_inode_and_type(dentry, inode, add_flags);
@@ -3566,6 +3704,47 @@ void d_tmpfile(struct dentry *dentry, struct inode *inode)
 }
 EXPORT_SYMBOL(d_tmpfile);
 
+static long neg_dentry_pc __initdata = 5;
+static bool neg_dentry_warn __initdata;
+static int __init set_neg_dentry_pc(char *str)
+{
+	ssize_t ret;
+	long new_pc = neg_dentry_pc;
+
+	if (!str)
+		return 0;
+	ret = kstrtol(str, 0, &new_pc);
+	if (ret || (new_pc < 1) || (new_pc > 50))
+		ret = 1;
+	else
+		neg_dentry_pc = new_pc;
+	if (ret)
+		neg_dentry_warn = true;
+	return ret ? 0 : 1;
+}
+__setup("neg_dentry_pc=", set_neg_dentry_pc);
+
+static void __init neg_dentry_init(void)
+{
+	/* Rough estimate of # of dentries allocated per page */
+	unsigned int nr_dentry_page = PAGE_SIZE/sizeof(struct dentry) - 1;
+	unsigned long cnt;
+
+	raw_spin_lock_init(&ndblk.nfree_lock);
+
+	/* 20% in global pool & 80% in percpu free */
+	ndblk.nfree = totalram_pages * nr_dentry_page * neg_dentry_pc / 500;
+	cnt = ndblk.nfree * 4 / num_possible_cpus();
+	if (unlikely(cnt < 2 * NEG_DENTRY_BATCH))
+		cnt = 2 * NEG_DENTRY_BATCH;
+	neg_dentry_percpu_limit = cnt;
+
+	if (neg_dentry_warn)
+		pr_warn("Warning: neg_dentry_pc must be within 1-50 range.\n");
+	pr_info("Negative dentry: percpu limit = %ld, free pool = %ld\n",
+		neg_dentry_percpu_limit, ndblk.nfree);
+}
+
 static __initdata unsigned long dhash_entries;
 static int __init set_dhash_entries(char *str)
 {
@@ -3606,6 +3785,8 @@ static void __init dcache_init(void)
 	dentry_cache = KMEM_CACHE(dentry,
 		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT);
 
+	neg_dentry_init();
+
 	/* Hash may have been set up in dcache_init_early */
 	if (!hashdist)
 		return;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 3f3ff4c..498233b 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -218,6 +218,7 @@ struct dentry_operations {
 
 #define DCACHE_PAR_LOOKUP		0x10000000 /* being looked up (with parent locked shared) */
 #define DCACHE_DENTRY_CURSOR		0x20000000
+#define DCACHE_KILL_NEGATIVE		0x40000000 /* Kill negative dentry */
 
 extern seqlock_t rename_lock;
 
-- 
1.8.3.1


* [PATCH v2 2/4] fs/dcache: Report negative dentry number in dentry-state
  2017-07-21 13:43 [PATCH v2 0/4] fs/dcache: Limit # of negative dentries Waiman Long
  2017-07-21 13:43 ` [PATCH v2 1/4] fs/dcache: Limit numbers " Waiman Long
@ 2017-07-21 13:43 ` Waiman Long
  2017-07-21 13:43 ` [PATCH v2 3/4] fs/dcache: Enable automatic pruning of negative dentries Waiman Long
  2017-07-21 13:43 ` [PATCH v2 4/4] fs/dcache: Protect negative dentry pruning from racing with umount Waiman Long
  3 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2017-07-21 13:43 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, Waiman Long

The number of negative dentries currently in the system is now reported
in the /proc/sys/fs/dentry-state file.
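
As the diff below shows, the new count takes over the first of the two
former dummy slots, so it is the fifth field of dentry-state. A minimal
userspace sketch for reading it (assumes a kernel with this patch
applied):

/*
 * Illustrative sketch only. Field order with this patch applied:
 * nr_dentry, nr_unused, age_limit, want_pages, nr_negative, dummy.
 */
#include <stdio.h>

int main(void)
{
	long nr_dentry, nr_unused, age_limit, want_pages, nr_negative, dummy;
	FILE *f = fopen("/proc/sys/fs/dentry-state", "r");

	if (!f)
		return 1;
	if (fscanf(f, "%ld %ld %ld %ld %ld %ld", &nr_dentry, &nr_unused,
		   &age_limit, &want_pages, &nr_negative, &dummy) == 6)
		printf("negative dentries: %ld\n", nr_negative);
	fclose(f);
	return 0;
}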

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c            | 16 +++++++++++++++-
 include/linux/dcache.h |  7 ++++---
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 866df38..d6d2d2d 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -135,6 +135,7 @@ struct dentry_stat_t dentry_stat = {
  */
 #define NEG_DENTRY_BATCH	(1 << 8)
 static long neg_dentry_percpu_limit __read_mostly;
+static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
 static struct {
 	raw_spinlock_t nfree_lock;
 	long nfree;			/* Negative dentry free pool */
@@ -176,11 +177,23 @@ static long get_nr_dentry_unused(void)
 	return sum < 0 ? 0 : sum;
 }
 
+static long get_nr_dentry_neg(void)
+{
+	int i;
+	long sum = 0;
+
+	for_each_possible_cpu(i)
+		sum += per_cpu(nr_dentry_neg, i);
+	sum += neg_dentry_nfree_init - ndblk.nfree;
+	return sum < 0 ? 0 : sum;
+}
+
 int proc_nr_dentry(struct ctl_table *table, int write, void __user *buffer,
 		   size_t *lenp, loff_t *ppos)
 {
 	dentry_stat.nr_dentry = get_nr_dentry();
 	dentry_stat.nr_unused = get_nr_dentry_unused();
+	dentry_stat.nr_negative = get_nr_dentry_neg();
 	return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
 }
 #endif
@@ -3733,7 +3746,8 @@ static void __init neg_dentry_init(void)
 	raw_spin_lock_init(&ndblk.nfree_lock);
 
 	/* 20% in global pool & 80% in percpu free */
-	ndblk.nfree = totalram_pages * nr_dentry_page * neg_dentry_pc / 500;
+	ndblk.nfree = neg_dentry_nfree_init
+		    = totalram_pages * nr_dentry_page * neg_dentry_pc / 500;
 	cnt = ndblk.nfree * 4 / num_possible_cpus();
 	if (unlikely(cnt < 2 * NEG_DENTRY_BATCH))
 		cnt = 2 * NEG_DENTRY_BATCH;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 498233b..0f749e2 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -63,9 +63,10 @@ struct qstr {
 struct dentry_stat_t {
 	long nr_dentry;
 	long nr_unused;
-	long age_limit;          /* age in seconds */
-	long want_pages;         /* pages requested by system */
-	long dummy[2];
+	long age_limit;		/* age in seconds */
+	long want_pages;	/* pages requested by system */
+	long nr_negative;	/* # of negative dentries */
+	long dummy;
 };
 extern struct dentry_stat_t dentry_stat;
 
-- 
1.8.3.1


* [PATCH v2 3/4] fs/dcache: Enable automatic pruning of negative dentries
  2017-07-21 13:43 [PATCH v2 0/4] fs/dcache: Limit # of negative dentries Waiman Long
  2017-07-21 13:43 ` [PATCH v2 1/4] fs/dcache: Limit numbers " Waiman Long
  2017-07-21 13:43 ` [PATCH v2 2/4] fs/dcache: Report negative dentry number in dentry-state Waiman Long
@ 2017-07-21 13:43 ` Waiman Long
  2017-07-21 19:30   ` James Bottomley
  2017-07-21 13:43 ` [PATCH v2 4/4] fs/dcache: Protect negative dentry pruning from racing with umount Waiman Long
  3 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2017-07-21 13:43 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, Waiman Long

Having a limit for the number of negative dentries does have an
undesirable side effect that no new negative dentries will be allowed
when the limit is reached. This will have performance implications
for some types of workloads.

So we need a way to prune negative dentries so that new ones can
be created. This is done by using the workqueue API to do the pruning
gradually once a threshold is reached, to minimize the performance
impact on other running tasks. The pruning is done at a frequency of
10 runs per second. Each run scans at most 256 LRU dentries for each
node LRU list of a given superblock. Some non-negative dentries that
happen to be at the front of the LRU lists will also be pruned.
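
As a rough back-of-the-envelope check (the constants come from this
patch; the single-node assumption is mine), the scanning throughput of
the pruner is bounded as follows:

/*
 * Illustrative estimate only: at most NEG_DENTRY_BATCH (256) entries
 * per run, 10 runs per second, assuming a single node LRU list.
 */
#include <stdio.h>

int main(void)
{
	int batch = 256;		/* NEG_DENTRY_BATCH */
	int runs_per_second = 10;	/* HZ / NEG_PRUNING_DELAY */

	printf("max entries scanned per second: %d\n",
	       batch * runs_per_second);	/* 2560 */
	return 0;
}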

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c              | 113 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/list_lru.h |   1 +
 mm/list_lru.c            |   4 +-
 3 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index d6d2d2d..c2ea876 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -134,13 +134,19 @@ struct dentry_stat_t dentry_stat = {
  * Macros and variables to manage and count negative dentries.
  */
 #define NEG_DENTRY_BATCH	(1 << 8)
+#define NEG_PRUNING_DELAY	(HZ/10)
 static long neg_dentry_percpu_limit __read_mostly;
 static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
 static struct {
 	raw_spinlock_t nfree_lock;
 	long nfree;			/* Negative dentry free pool */
+	struct super_block *prune_sb;	/* Super_block for pruning */
+	int neg_count, prune_count;	/* Pruning counts */
 } ndblk ____cacheline_aligned_in_smp;
 
+static void prune_negative_dentry(struct work_struct *work);
+static DECLARE_DELAYED_WORK(prune_neg_dentry_work, prune_negative_dentry);
+
 static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
 static DEFINE_PER_CPU(long, nr_dentry_neg);
@@ -323,6 +329,16 @@ static void __neg_dentry_inc(struct dentry *dentry)
 	 */
 	if (!cnt)
 		dentry->d_flags |= DCACHE_KILL_NEGATIVE;
+
+	/*
+	 * Initiate negative dentry pruning if free pool has less than
+	 * 1/4 of its initial value.
+	 */
+	if (READ_ONCE(ndblk.nfree) < neg_dentry_nfree_init/4) {
+		WRITE_ONCE(ndblk.prune_sb, dentry->d_sb);
+		schedule_delayed_work(&prune_neg_dentry_work,
+				      NEG_PRUNING_DELAY);
+	}
 }
 
 static inline void neg_dentry_inc(struct dentry *dentry)
@@ -1320,6 +1336,103 @@ void shrink_dcache_sb(struct super_block *sb)
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
 
+/*
+ * A modified version that attempts to remove a limited number of negative
+ * dentries as well as some other non-negative dentries at the front.
+ */
+static enum lru_status dentry_negative_lru_isolate(struct list_head *item,
+		struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
+{
+	struct list_head *freeable = arg;
+	struct dentry	*dentry = container_of(item, struct dentry, d_lru);
+	enum lru_status	status = LRU_SKIP;
+
+	/*
+	 * Stop further list walking for the current node list to limit
+	 * performance impact, but allow further walking in the next node
+	 * list.
+	 */
+	if ((ndblk.neg_count >= NEG_DENTRY_BATCH) ||
+	    (ndblk.prune_count >= NEG_DENTRY_BATCH)) {
+		ndblk.prune_count = 0;
+		return LRU_STOP;
+	}
+
+	/*
+	 * we are inverting the lru lock/dentry->d_lock here,
+	 * so use a trylock. If we fail to get the lock, just skip
+	 * it
+	 */
+	if (!spin_trylock(&dentry->d_lock)) {
+		ndblk.prune_count++;
+		return LRU_SKIP;
+	}
+
+	/*
+	 * Referenced dentries are still in use. If they have active
+	 * counts, just remove them from the LRU. Otherwise give them
+	 * another pass through the LRU.
+	 */
+	if (dentry->d_lockref.count) {
+		d_lru_isolate(lru, dentry);
+		status = LRU_REMOVED;
+		goto out;
+	}
+
+	/*
+	 * Dentries with reference bit on are moved back to the tail
+	 * except for the negative ones.
+	 */
+	if ((dentry->d_flags & DCACHE_REFERENCED) && !d_is_negative(dentry)) {
+		dentry->d_flags &= ~DCACHE_REFERENCED;
+		status = LRU_ROTATE;
+		goto out;
+	}
+
+	status = LRU_REMOVED;
+	d_lru_shrink_move(lru, dentry, freeable);
+	if (d_is_negative(dentry))
+		ndblk.neg_count++;
+out:
+	spin_unlock(&dentry->d_lock);
+	ndblk.prune_count++;
+	return status;
+}
+
+/*
+ * A workqueue function to prune negative dentry.
+ *
+ * The pruning is done gradually over time so as not to have noticeable
+ * performance impact.
+ */
+static void prune_negative_dentry(struct work_struct *work)
+{
+	int freed;
+	struct super_block *sb = READ_ONCE(ndblk.prune_sb);
+	LIST_HEAD(dispose);
+
+	if (!sb)
+		return;
+
+	ndblk.neg_count = ndblk.prune_count = 0;
+	freed = list_lru_walk(&sb->s_dentry_lru, dentry_negative_lru_isolate,
+			      &dispose, NEG_DENTRY_BATCH);
+
+	if (freed)
+		shrink_dentry_list(&dispose);
+	/*
+	 * Continue delayed pruning once every second until negative dentry
+	 * free pool is at least 1/2 of the initial value or the super_block
+	 * has no more negative dentries left at the front.
+	 */
+	if (ndblk.neg_count &&
+	   (READ_ONCE(ndblk.nfree) < neg_dentry_nfree_init/2))
+		schedule_delayed_work(&prune_neg_dentry_work,
+				      NEG_PRUNING_DELAY);
+	else
+		WRITE_ONCE(ndblk.prune_sb, NULL);
+}
+
 /**
  * enum d_walk_ret - action to talke during tree walk
  * @D_WALK_CONTINUE:	contrinue walk
diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index fa7fd03..06c9d15 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -22,6 +22,7 @@ enum lru_status {
 	LRU_SKIP,		/* item cannot be locked, skip */
 	LRU_RETRY,		/* item not freeable. May drop the lock
 				   internally, but has to return locked. */
+	LRU_STOP,		/* stop walking the list */
 };
 
 struct list_lru_one {
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 7a40fa2..f6e7796 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -244,11 +244,13 @@ unsigned long list_lru_count_node(struct list_lru *lru, int nid)
 			 */
 			assert_spin_locked(&nlru->lock);
 			goto restart;
+		case LRU_STOP:
+			goto out;
 		default:
 			BUG();
 		}
 	}
-
+out:
 	spin_unlock(&nlru->lock);
 	return isolated;
 }
-- 
1.8.3.1


* [PATCH v2 4/4] fs/dcache: Protect negative dentry pruning from racing with umount
  2017-07-21 13:43 [PATCH v2 0/4] fs/dcache: Limit # of negative dentries Waiman Long
                   ` (2 preceding siblings ...)
  2017-07-21 13:43 ` [PATCH v2 3/4] fs/dcache: Enable automatic pruning of negative dentries Waiman Long
@ 2017-07-21 13:43 ` Waiman Long
  3 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2017-07-21 13:43 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, Waiman Long

Negative dentry pruning is done on a specific super_block, the one set
in the ndblk.prune_sb variable. If that super_block is being unmounted
concurrently, its contents may no longer be valid.

To protect against this race condition, a new lock is added to
the ndblk structure to synchronize negative dentry pruning with the
umount operation. This is a regular spinlock, as the pruning operation
can be quite time consuming.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c | 42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index c2ea876..a3159f3 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -139,11 +139,13 @@ struct dentry_stat_t dentry_stat = {
 static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
 static struct {
 	raw_spinlock_t nfree_lock;
+	spinlock_t prune_lock;		/* Lock for protecting pruning */
 	long nfree;			/* Negative dentry free pool */
 	struct super_block *prune_sb;	/* Super_block for pruning */
 	int neg_count, prune_count;	/* Pruning counts */
 } ndblk ____cacheline_aligned_in_smp;
 
+static void clear_prune_sb_for_umount(struct super_block *sb);
 static void prune_negative_dentry(struct work_struct *work);
 static DECLARE_DELAYED_WORK(prune_neg_dentry_work, prune_negative_dentry);
 
@@ -1323,6 +1325,7 @@ void shrink_dcache_sb(struct super_block *sb)
 {
 	long freed;
 
+	clear_prune_sb_for_umount(sb);
 	do {
 		LIST_HEAD(dispose);
 
@@ -1353,7 +1356,8 @@ static enum lru_status dentry_negative_lru_isolate(struct list_head *item,
 	 * list.
 	 */
 	if ((ndblk.neg_count >= NEG_DENTRY_BATCH) ||
-	    (ndblk.prune_count >= NEG_DENTRY_BATCH)) {
+	    (ndblk.prune_count >= NEG_DENTRY_BATCH) ||
+	    !READ_ONCE(ndblk.prune_sb)) {
 		ndblk.prune_count = 0;
 		return LRU_STOP;
 	}
@@ -1408,15 +1412,24 @@ static enum lru_status dentry_negative_lru_isolate(struct list_head *item,
 static void prune_negative_dentry(struct work_struct *work)
 {
 	int freed;
-	struct super_block *sb = READ_ONCE(ndblk.prune_sb);
+	struct super_block *sb;
 	LIST_HEAD(dispose);
 
-	if (!sb)
+	/*
+	 * The prune_lock is used to protect negative dentry pruning from
+	 * racing with concurrent umount operation.
+	 */
+	spin_lock(&ndblk.prune_lock);
+	sb = READ_ONCE(ndblk.prune_sb);
+	if (!sb) {
+		spin_unlock(&ndblk.prune_lock);
 		return;
+	}
 
 	ndblk.neg_count = ndblk.prune_count = 0;
 	freed = list_lru_walk(&sb->s_dentry_lru, dentry_negative_lru_isolate,
 			      &dispose, NEG_DENTRY_BATCH);
+	spin_unlock(&ndblk.prune_lock);
 
 	if (freed)
 		shrink_dentry_list(&dispose);
@@ -1433,6 +1446,27 @@ static void prune_negative_dentry(struct work_struct *work)
 		WRITE_ONCE(ndblk.prune_sb, NULL);
 }
 
+/*
+ * This is called before an umount to clear ndblk.prune_sb if it
+ * matches the given super_block.
+ */
+static void clear_prune_sb_for_umount(struct super_block *sb)
+{
+	if (likely(READ_ONCE(ndblk.prune_sb) != sb))
+		return;
+	WRITE_ONCE(ndblk.prune_sb, NULL);
+	/*
+	 * Need to wait until an ongoing pruning operation, if present,
+	 * is completed.
+	 *
+	 * Clearing ndblk.prune_sb will hasten the completion of pruning.
+	 * In the unlikely event that ndblk.prune_sb is set to another
+	 * super_block, the waiting will last the complete pruning operation
+	 * which shouldn't be that long either.
+	 */
+	spin_unlock_wait(&ndblk.prune_lock);
+}
+
 /**
  * enum d_walk_ret - action to talke during tree walk
  * @D_WALK_CONTINUE:	contrinue walk
@@ -1755,6 +1789,7 @@ void shrink_dcache_for_umount(struct super_block *sb)
 
 	WARN(down_read_trylock(&sb->s_umount), "s_umount should've been locked");
 
+	clear_prune_sb_for_umount(sb);
 	dentry = sb->s_root;
 	sb->s_root = NULL;
 	do_one_tree(dentry);
@@ -3857,6 +3892,7 @@ static void __init neg_dentry_init(void)
 	unsigned long cnt;
 
 	raw_spin_lock_init(&ndblk.nfree_lock);
+	spin_lock_init(&ndblk.prune_lock);
 
 	/* 20% in global pool & 80% in percpu free */
 	ndblk.nfree = neg_dentry_nfree_init
-- 
1.8.3.1


* Re: [PATCH v2 3/4] fs/dcache: Enable automatic pruning of negative dentries
  2017-07-21 13:43 ` [PATCH v2 3/4] fs/dcache: Enable automatic pruning of negative dentries Waiman Long
@ 2017-07-21 19:30   ` James Bottomley
  2017-07-21 20:17     ` Waiman Long
  0 siblings, 1 reply; 9+ messages in thread
From: James Bottomley @ 2017-07-21 19:30 UTC (permalink / raw)
  To: Waiman Long, Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman

On Fri, 2017-07-21 at 09:43 -0400, Waiman Long wrote:
> Having a limit for the number of negative dentries does have an
> undesirable side effect that no new negative dentries will be allowed
> when the limit is reached. This will have performance implication
> for some types of workloads.

This really seems like a significant problem: negative dentries should
be released in strict lru order because the chances are no-one cares
about the least recently used one, but they may care about having the
most recently created one.

[...]
> @@ -323,6 +329,16 @@ static void __neg_dentry_inc(struct dentry
> *dentry)
>  	 */
>  	if (!cnt)
>  		dentry->d_flags |= DCACHE_KILL_NEGATIVE;
> +
> +	/*
> +	 * Initiate negative dentry pruning if free pool has less
> than
> +	 * 1/4 of its initial value.
> +	 */
> +	if (READ_ONCE(ndblk.nfree) < neg_dentry_nfree_init/4) {
> +		WRITE_ONCE(ndblk.prune_sb, dentry->d_sb);
> +		schedule_delayed_work(&prune_neg_dentry_work,
> +				      NEG_PRUNING_DELAY);
> +	}

So here, why not run the negative dentry shrinker synchronously to see
if we can shrink the cache and avoid killing the current negative
dentry.  If there are context problems doing that, we should at least
make the effort to track down the least recently used negative dentry
and mark that for killing instead.

James


* Re: [PATCH v2 3/4] fs/dcache: Enable automatic pruning of negative dentries
  2017-07-21 19:30   ` James Bottomley
@ 2017-07-21 20:17     ` Waiman Long
  2017-07-21 23:07       ` James Bottomley
  0 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2017-07-21 20:17 UTC (permalink / raw)
  To: James Bottomley, Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman

On 07/21/2017 03:30 PM, James Bottomley wrote:
> On Fri, 2017-07-21 at 09:43 -0400, Waiman Long wrote:
>> Having a limit for the number of negative dentries does have an
>> undesirable side effect that no new negative dentries will be allowed
>> when the limit is reached. This will have performance implication
>> for some types of workloads.
> This really seems like a significant problem: negative dentries should
> be released in strict lru order because the chances are no-one cares
> about the least recently used one, but they may care about having the
> most recently created one.

This should not happen under normal circumstances, as the asynchronous
shrinker should be able to keep enough free negative dentries available
in the pool that direct negative dentry killing will rarely happen.

> [...]
>> @@ -323,6 +329,16 @@ static void __neg_dentry_inc(struct dentry
>> *dentry)
>>  	 */
>>  	if (!cnt)
>>  		dentry->d_flags |= DCACHE_KILL_NEGATIVE;
>> +
>> +	/*
>> +	 * Initiate negative dentry pruning if free pool has less
>> than
>> +	 * 1/4 of its initial value.
>> +	 */
>> +	if (READ_ONCE(ndblk.nfree) < neg_dentry_nfree_init/4) {
>> +		WRITE_ONCE(ndblk.prune_sb, dentry->d_sb);
>> +		schedule_delayed_work(&prune_neg_dentry_work,
>> +				      NEG_PRUNING_DELAY);
>> +	}
> So here, why not run the negative dentry shrinker synchronously to see
> if we can shrink the cache and avoid killing the current negative
> dentry.  If there are context problems doing that, we should at least
> make the effort to track down the least recently used negative dentry
> and mark that for killing instead.

Only one CPU will be calling the asynchronous shrinker. So its effect on
the overall performance of the system should be negligible.

Allowing all CPUs to potentially do synchronous shrinking can cause a
lot of lock and cacheline contention. I will look further to see if
there is an opportunity to do some optimistic synchronous shrinking. If
that fails because of a contended lock, for example, we will need to
fall back to killing the dentry. That should only happen in the
worst-case situation, like when a malicious process is running.

Cheers,
Longman


* Re: [PATCH v2 3/4] fs/dcache: Enable automatic pruning of negative dentries
  2017-07-21 20:17     ` Waiman Long
@ 2017-07-21 23:07       ` James Bottomley
  2017-07-24 15:54         ` Waiman Long
  0 siblings, 1 reply; 9+ messages in thread
From: James Bottomley @ 2017-07-21 23:07 UTC (permalink / raw)
  To: Waiman Long, Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman

On Fri, 2017-07-21 at 16:17 -0400, Waiman Long wrote:
> On 07/21/2017 03:30 PM, James Bottomley wrote:
> > 
> > On Fri, 2017-07-21 at 09:43 -0400, Waiman Long wrote:
> > > 
> > > Having a limit for the number of negative dentries does have an
> > > undesirable side effect that no new negative dentries will be
> > > allowed when the limit is reached. This will have performance
> > > implication for some types of workloads.
> > This really seems like a significant problem: negative dentries
> > should be released in strict lru order because the chances are no-
> > one cares about the least recently used one, but they may care
> > about having the most recently created one.
> 
> This should not happen under normal circumstances as the asynchronous
> shrinker should be able to keep enough free negative dentry available
> in the pool that direct negative dentry killing will rarely happen.

But that's an argument for not bothering with the patch set at all, if
it's a corner case that rarely occurs.

Perhaps we should start with why the series is important and then get
back to what we do if the condition is hit?

> > 
> > [...]
> > > 
> > > @@ -323,6 +329,16 @@ static void __neg_dentry_inc(struct dentry
> > > *dentry)
> > >  	 */
> > >  	if (!cnt)
> > >  		dentry->d_flags |= DCACHE_KILL_NEGATIVE;
> > > +
> > > +	/*
> > > +	 * Initiate negative dentry pruning if free pool has
> > > less
> > > than
> > > +	 * 1/4 of its initial value.
> > > +	 */
> > > +	if (READ_ONCE(ndblk.nfree) < neg_dentry_nfree_init/4) {
> > > +		WRITE_ONCE(ndblk.prune_sb, dentry->d_sb);
> > > +		schedule_delayed_work(&prune_neg_dentry_work,
> > > +				      NEG_PRUNING_DELAY);
> > > +	}
> > So here, why not run the negative dentry shrinker synchronously to
> > see if we can shrink the cache and avoid killing the current
> > negative dentry.  If there are context problems doing that, we
> > should at least make the effort to track down the least recently
> > used negative dentry and mark that for killing instead.
> 
> Only one CPU will be calling the asynchronous shrinker. So its effect
> on the overall performance of the system should be negligible.
> 
> Allowing all CPUs to potentially do synchronous shrinking can cause a
> lot of lock and cacheline contention.

Does that matter on something you thought was a rare occurrence above?
 Plus, if the only use case is a malicious user, as you say below, then
they get to pay the biggest overhead penalty because they're the ones
trying to create negative dentries.

Perhaps if the threat model truly is malicious users creating negative
dentries then a better fix is to add a delay penalty to true negative
dentry creation (say, a short msleep) when we get into this situation (if
you only pay the penalty on true negative dentry creation, not negative
promotes to positive, then we'll mostly penalise the process trying to
be malicious).

>  I will look further to see if there is opportunity to do some
> optimistic synchronous shrinking. If that fails because of a
> contended lock, for example, we will need to fall back to killing the
> dentry. That should only happen under the worst case situation, like
> when a malicious process is running.

Right, so I can force this by doing a whole load of nonexistent-file
lookups, and then what happens?  Negative dentries stop working for
everyone because they're killed as soon as they're created.  Negative
dentries are useful performance enhancements for the usual "does x.lock
exist? if not, modify x" pattern that applications follow.

It also looks like the dentry never loses DCACHE_KILL_NEGATIVE, so if
I'm creating the x.lock file, and we're in this situation, it gets a
positive dentry with the DCACHE_KILL_NEGATIVE flag set (because we
start the lookup finding a negative dentry which gets the flag and then
promote it to positive).

James


* Re: [PATCH v2 3/4] fs/dcache: Enable automatic pruning of negative dentries
  2017-07-21 23:07       ` James Bottomley
@ 2017-07-24 15:54         ` Waiman Long
  0 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2017-07-24 15:54 UTC (permalink / raw)
  To: James Bottomley, Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman

On 07/21/2017 07:07 PM, James Bottomley wrote:
> On Fri, 2017-07-21 at 16:17 -0400, Waiman Long wrote:
>> On 07/21/2017 03:30 PM, James Bottomley wrote:
>>> On Fri, 2017-07-21 at 09:43 -0400, Waiman Long wrote:
>>>> Having a limit for the number of negative dentries does have an
>>>> undesirable side effect that no new negative dentries will be
>>>> allowed when the limit is reached. This will have performance
>>>> implication for some types of workloads.
>>> This really seems like a significant problem: negative dentries
>>> should be released in strict lru order because the chances are no-
>>> one cares about the least recently used one, but they may care
>>> about having the most recently created one.
>> This should not happen under normal circumstances as the asynchronous
>> shrinker should be able to keep enough free negative dentry available
>> in the pool that direct negative dentry killing will rarely happen.
> But that's an argument for not bothering with the patch set at all: if
> it's a corner case that rarely occurs.
>
> Perhaps we should start with why the series is important and then get
> back to what we do if the condition is hit?

There is concern that unlimited negative dentry growth can be used as a
kind of DoS attack by robbing the system of all its memory and triggering
massive memory reclaim or even an OOM kill. This patchset aims to
prevent this kind of worst-case situation from happening.

>>> [...]
>>>> @@ -323,6 +329,16 @@ static void __neg_dentry_inc(struct dentry
>>>> *dentry)
>>>>  	 */
>>>>  	if (!cnt)
>>>>  		dentry->d_flags |= DCACHE_KILL_NEGATIVE;
>>>> +
>>>> +	/*
>>>> +	 * Initiate negative dentry pruning if free pool has
>>>> less
>>>> than
>>>> +	 * 1/4 of its initial value.
>>>> +	 */
>>>> +	if (READ_ONCE(ndblk.nfree) < neg_dentry_nfree_init/4) {
>>>> +		WRITE_ONCE(ndblk.prune_sb, dentry->d_sb);
>>>> +		schedule_delayed_work(&prune_neg_dentry_work,
>>>> +				      NEG_PRUNING_DELAY);
>>>> +	}
>>> So here, why not run the negative dentry shrinker synchronously to
>>> see if we can shrink the cache and avoid killing the current
>>> negative dentry.  If there are context problems doing that, we
>>> should at least make the effort to track down the least recently
>>> used negative dentry and mark that for killing instead.
>> Only one CPU will be calling the asynchronous shrinker. So its effect
>> on the overall performance of the system should be negligible.
>>
>> Allowing all CPUs to potentially do synchronous shrinking can cause a
>> lot of lock and cacheline contention.
> Does that matter on something you thought was a rare occurrence above?
>  Plus, if the only use case is malicious user, as you say below, then
> they get to pay the biggest overhead penalty because they're the ones
> trying to create negative dentries.
>
> Perhaps if the threat model truly is malicious users creating negative
> dentries then a better fix is to add a delay penalty to true negative
> dentry creation (say short msleep) when we get into this situation (if
> you only pay the penalty on true negative dentry creation, not negative
> promotes to positive, then we'll mostly penalise the process trying to
> be malicious).

I can insert a delay loop before killing the negative dentry. Using
msleep() can be problematic, as the callers of dput() may not be in a
sleepable state.
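
A bounded, non-sleeping delay along the following lines would be one
option (illustrative sketch only, not from the posted series; the
helper name and the iteration bound are arbitrary assumptions):

/*
 * Hypothetical sketch: a bounded, non-sleeping delay that could be
 * applied in dput() before killing a negative dentry once the limit
 * has been exceeded. Kernel context (fs/dcache.c); the bound of 1000
 * iterations is an arbitrary assumption.
 */
static void neg_dentry_kill_delay(void)
{
	int i;

	for (i = 0; i < 1000; i++)
		cpu_relax();
}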

>>  I will look further to see if there is opportunity to do some
>> optimistic synchronous shrinking. If that fails because of a
>> contended lock, for example, we will need to fall back to killing the
>> dentry. That should only happen under the worst case situation, like
>> when a malicious process is running.
> Right, so I can force this by doing a whole load of non existent file
> lookups then what happens:  negative dentries stop working for everyone
> because they're killed as soon as they're created.  Negative dentries
> are useful performance enhancements for things like the usual does
> x.lock exist, if not modify x things that applications do.

I am well aware of the performance benefit offered by negative dentries.
That is why I added this patch to prune the LRU list before it is too
late and the system is forced to kill negative dentries. However,
negative dentry killing may still happen if the negative dentry
generation rate is faster than the pruning rate. This can be caused by
bugs in the applications or by malicious intent. Without this patchset,
it is likely that negative dentries would consume most of the system
memory in that case. Application performance will suffer somewhat, but
it will not be as bad as when most of the memory is consumed by
negative dentries.

> It also looks like the dentry never loses DCACHE_KILL_NEGATIVE, so if
> I'm creating the x.lock file, and we're in this situation, it gets a
> positive dentry with the DCACHE_KILL_NEGATIVE flag set (because we
> start the lookup finding a negative dentry which gets the flag and then
> promote it to positive).

Thanks for pointing this out. I am going to fix this bug in my patch.
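
One possible shape for that fix (illustrative sketch only; the helper
name and call site are assumptions, not the final change) is to clear
the flag once the dentry gains an inode, e.g. from __d_instantiate()
with d_lock held:

/*
 * Hypothetical sketch: drop the kill flag when a negative dentry is
 * turned positive. Caller holds dentry->d_lock; this could be called
 * from __d_instantiate(), where the negative dentry count is already
 * being decremented. Placement and name are assumptions.
 */
static inline void d_clear_kill_negative(struct dentry *dentry)
{
	dentry->d_flags &= ~DCACHE_KILL_NEGATIVE;
}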

Cheers,
Longman

