* [Cluster-devel] [PATCH AUTOSEL 5.0 001/317] gfs2: Fix lru_count going negative
@ 2019-05-22 19:18 Sasha Levin
2019-05-22 19:18 ` [Cluster-devel] [PATCH AUTOSEL 5.0 005/317] gfs2: fix race between gfs2_freeze_func and unmount Sasha Levin
2019-05-22 19:18 ` [Cluster-devel] [PATCH AUTOSEL 5.0 007/317] gfs2: Fix occasional glock use-after-free Sasha Levin
0 siblings, 2 replies; 3+ messages in thread
From: Sasha Levin @ 2019-05-22 19:18 UTC (permalink / raw)
To: cluster-devel.redhat.com
From: Ross Lagerwall <ross.lagerwall@citrix.com>
[ Upstream commit 7881ef3f33bb80f459ea6020d1e021fc524a6348 ]
Under certain conditions, lru_count may drop below zero resulting in
a large amount of log spam like this:
vmscan: shrink_slab: gfs2_dump_glock+0x3b0/0x630 [gfs2] \
negative objects to delete nr=-1
This happens as follows:
1) A glock is moved from lru_list to the dispose list and lru_count is
decremented.
2) The dispose function calls cond_resched() and drops the lru lock.
3) Another thread takes the lru lock and tries to add the same glock to
lru_list, checking if the glock is on an lru list.
4) It is on a list (actually the dispose list) and so it avoids
incrementing lru_count.
5) The glock is moved to lru_list.
5) The original thread doesn't dispose it because it has been re-added
to the lru list but the lru_count has still decreased by one.
Fix by checking if the LRU flag is set on the glock rather than checking
if the glock is on some list and rearrange the code so that the LRU flag
is added/removed precisely when the glock is added/removed from lru_list.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/gfs2/glock.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 4b038f25f2564..2d25d89e77f9b 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -183,15 +183,19 @@ static int demote_ok(const struct gfs2_glock *gl)
void gfs2_glock_add_to_lru(struct gfs2_glock *gl)
{
+ if (!(gl->gl_ops->go_flags & GLOF_LRU))
+ return;
+
spin_lock(&lru_lock);
- if (!list_empty(&gl->gl_lru))
- list_del_init(&gl->gl_lru);
- else
+ list_del(&gl->gl_lru);
+ list_add_tail(&gl->gl_lru, &lru_list);
+
+ if (!test_bit(GLF_LRU, &gl->gl_flags)) {
+ set_bit(GLF_LRU, &gl->gl_flags);
atomic_inc(&lru_count);
+ }
- list_add_tail(&gl->gl_lru, &lru_list);
- set_bit(GLF_LRU, &gl->gl_flags);
spin_unlock(&lru_lock);
}
@@ -201,7 +205,7 @@ static void gfs2_glock_remove_from_lru(struct gfs2_glock *gl)
return;
spin_lock(&lru_lock);
- if (!list_empty(&gl->gl_lru)) {
+ if (test_bit(GLF_LRU, &gl->gl_flags)) {
list_del_init(&gl->gl_lru);
atomic_dec(&lru_count);
clear_bit(GLF_LRU, &gl->gl_flags);
@@ -1159,8 +1163,7 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
!test_bit(GLF_DEMOTE, &gl->gl_flags))
fast_path = 1;
}
- if (!test_bit(GLF_LFLUSH, &gl->gl_flags) && demote_ok(gl) &&
- (glops->go_flags & GLOF_LRU))
+ if (!test_bit(GLF_LFLUSH, &gl->gl_flags) && demote_ok(gl))
gfs2_glock_add_to_lru(gl);
trace_gfs2_glock_queue(gh, 0);
@@ -1456,6 +1459,7 @@ __acquires(&lru_lock)
if (!spin_trylock(&gl->gl_lockref.lock)) {
add_back_to_lru:
list_add(&gl->gl_lru, &lru_list);
+ set_bit(GLF_LRU, &gl->gl_flags);
atomic_inc(&lru_count);
continue;
}
@@ -1463,7 +1467,6 @@ __acquires(&lru_lock)
spin_unlock(&gl->gl_lockref.lock);
goto add_back_to_lru;
}
- clear_bit(GLF_LRU, &gl->gl_flags);
gl->gl_lockref.count++;
if (demote_ok(gl))
handle_callback(gl, LM_ST_UNLOCKED, 0, false);
@@ -1498,6 +1501,7 @@ static long gfs2_scan_glock_lru(int nr)
if (!test_bit(GLF_LOCK, &gl->gl_flags)) {
list_move(&gl->gl_lru, &dispose);
atomic_dec(&lru_count);
+ clear_bit(GLF_LRU, &gl->gl_flags);
freed++;
continue;
}
--
2.20.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [Cluster-devel] [PATCH AUTOSEL 5.0 005/317] gfs2: fix race between gfs2_freeze_func and unmount
2019-05-22 19:18 [Cluster-devel] [PATCH AUTOSEL 5.0 001/317] gfs2: Fix lru_count going negative Sasha Levin
@ 2019-05-22 19:18 ` Sasha Levin
2019-05-22 19:18 ` [Cluster-devel] [PATCH AUTOSEL 5.0 007/317] gfs2: Fix occasional glock use-after-free Sasha Levin
1 sibling, 0 replies; 3+ messages in thread
From: Sasha Levin @ 2019-05-22 19:18 UTC (permalink / raw)
To: cluster-devel.redhat.com
From: Abhi Das <adas@redhat.com>
[ Upstream commit 8f91821990fd6f170a5dca79697a441181a41b16 ]
As part of the freeze operation, gfs2_freeze_func() is left blocking
on a request to hold the sd_freeze_gl in SH. This glock is held in EX
by the gfs2_freeze() code.
A subsequent call to gfs2_unfreeze() releases the EXclusively held
sd_freeze_gl, which allows gfs2_freeze_func() to acquire it in SH and
resume its operation.
gfs2_unfreeze(), however, doesn't wait for gfs2_freeze_func() to complete.
If a umount is issued right after unfreeze, it could result in an
inconsistent filesystem because some journal data (statfs update) isn't
written out.
Refer to commit 24972557b12c for a more detailed explanation of how
freeze/unfreeze work.
This patch causes gfs2_unfreeze() to wait for gfs2_freeze_func() to
complete before returning to the user.
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/gfs2/incore.h | 1 +
fs/gfs2/super.c | 8 +++++---
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index e10e0b0a7cd58..e1a33d2881213 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -621,6 +621,7 @@ enum {
SDF_SKIP_DLM_UNLOCK = 8,
SDF_FORCE_AIL_FLUSH = 9,
SDF_AIL1_IO_ERROR = 10,
+ SDF_FS_FROZEN = 11,
};
enum gfs2_freeze_state {
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index ca71163ff7cfd..360206704a14c 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -973,8 +973,7 @@ void gfs2_freeze_func(struct work_struct *work)
if (error) {
printk(KERN_INFO "GFS2: couldn't get freeze lock : %d\n", error);
gfs2_assert_withdraw(sdp, 0);
- }
- else {
+ } else {
atomic_set(&sdp->sd_freeze_state, SFS_UNFROZEN);
error = thaw_super(sb);
if (error) {
@@ -987,6 +986,8 @@ void gfs2_freeze_func(struct work_struct *work)
gfs2_glock_dq_uninit(&freeze_gh);
}
deactivate_super(sb);
+ clear_bit_unlock(SDF_FS_FROZEN, &sdp->sd_flags);
+ wake_up_bit(&sdp->sd_flags, SDF_FS_FROZEN);
return;
}
@@ -1029,6 +1030,7 @@ static int gfs2_freeze(struct super_block *sb)
msleep(1000);
}
error = 0;
+ set_bit(SDF_FS_FROZEN, &sdp->sd_flags);
out:
mutex_unlock(&sdp->sd_freeze_mutex);
return error;
@@ -1053,7 +1055,7 @@ static int gfs2_unfreeze(struct super_block *sb)
gfs2_glock_dq_uninit(&sdp->sd_freeze_gh);
mutex_unlock(&sdp->sd_freeze_mutex);
- return 0;
+ return wait_on_bit(&sdp->sd_flags, SDF_FS_FROZEN, TASK_INTERRUPTIBLE);
}
/**
--
2.20.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [Cluster-devel] [PATCH AUTOSEL 5.0 007/317] gfs2: Fix occasional glock use-after-free
2019-05-22 19:18 [Cluster-devel] [PATCH AUTOSEL 5.0 001/317] gfs2: Fix lru_count going negative Sasha Levin
2019-05-22 19:18 ` [Cluster-devel] [PATCH AUTOSEL 5.0 005/317] gfs2: fix race between gfs2_freeze_func and unmount Sasha Levin
@ 2019-05-22 19:18 ` Sasha Levin
1 sibling, 0 replies; 3+ messages in thread
From: Sasha Levin @ 2019-05-22 19:18 UTC (permalink / raw)
To: cluster-devel.redhat.com
From: Andreas Gruenbacher <agruenba@redhat.com>
[ Upstream commit 9287c6452d2b1f24ea8e84bd3cf6f3c6f267f712 ]
This patch has to do with the life cycle of glocks and buffers. When
gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata
object is assigned to track the buffer, and that is queued to various
lists, including the glock's gl_ail_list to indicate it's on the active
items list. Once the page associated with the buffer has been written,
it is removed from the ail list, but its life isn't over until a revoke
has been successfully written.
So after the block is written, its bufdata object is moved from the
glock's gl_ail_list to a file-system-wide list of pending revokes,
sd_log_le_revoke. At that point the glock still needs to track how many
revokes it contributed to that list (in gl_revokes) so that things like
glock go_sync can ensure all the metadata has been not only written, but
also revoked before the glock is granted to a different node. This is
to guarantee journal replay doesn't replay the block once the glock has
been granted to another node.
Ross Lagerwall recently discovered a race in which an inode could be
evicted, and its glock freed after its ail list had been synced, but
while it still had unwritten revokes on the sd_log_le_revoke list. The
evict decremented the glock reference count to zero, which allowed the
glock to be freed. After the revoke was written, function
revoke_lo_after_commit tried to adjust the glock's gl_revokes counter
and clear its GLF_LFLUSH flag, at which time it referenced the freed
glock.
This patch fixes the problem by incrementing the glock reference count
in gfs2_add_revoke when the glock's first bufdata object is moved from
the glock to the global revokes list. Later, when the glock's last such
bufdata object is freed, the reference count is decremented. This
guarantees that whichever process finishes last (the revoke writing or
the evict) will properly free the glock, and neither will reference the
glock after it has been freed.
Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/gfs2/glock.c | 1 +
fs/gfs2/log.c | 3 ++-
fs/gfs2/lops.c | 6 ++++--
3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 2d25d89e77f9b..c925e9ec68f44 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -140,6 +140,7 @@ void gfs2_glock_free(struct gfs2_glock *gl)
{
struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
+ BUG_ON(atomic_read(&gl->gl_revokes));
rhashtable_remove_fast(&gl_hash_table, &gl->gl_node, ht_parms);
smp_mb();
wake_up_glock(gl);
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index b8830fda51e8f..0e04f87a7dddb 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -606,7 +606,8 @@ void gfs2_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd)
gfs2_remove_from_ail(bd); /* drops ref on bh */
bd->bd_bh = NULL;
sdp->sd_log_num_revoke++;
- atomic_inc(&gl->gl_revokes);
+ if (atomic_inc_return(&gl->gl_revokes) == 1)
+ gfs2_glock_hold(gl);
set_bit(GLF_LFLUSH, &gl->gl_flags);
list_add(&bd->bd_list, &sdp->sd_log_le_revoke);
}
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 2295042bc6259..f09cd5d8ac631 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -667,8 +667,10 @@ static void revoke_lo_after_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
bd = list_entry(head->next, struct gfs2_bufdata, bd_list);
list_del_init(&bd->bd_list);
gl = bd->bd_gl;
- atomic_dec(&gl->gl_revokes);
- clear_bit(GLF_LFLUSH, &gl->gl_flags);
+ if (atomic_dec_return(&gl->gl_revokes) == 0) {
+ clear_bit(GLF_LFLUSH, &gl->gl_flags);
+ gfs2_glock_queue_put(gl);
+ }
kmem_cache_free(gfs2_bufdata_cachep, bd);
}
}
--
2.20.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2019-05-22 19:18 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-05-22 19:18 [Cluster-devel] [PATCH AUTOSEL 5.0 001/317] gfs2: Fix lru_count going negative Sasha Levin
2019-05-22 19:18 ` [Cluster-devel] [PATCH AUTOSEL 5.0 005/317] gfs2: fix race between gfs2_freeze_func and unmount Sasha Levin
2019-05-22 19:18 ` [Cluster-devel] [PATCH AUTOSEL 5.0 007/317] gfs2: Fix occasional glock use-after-free Sasha Levin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).