cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed
* [Cluster-devel] [PATCH] gfs2: Don't free rgrp clone bitmaps until go_inval
@ 2023-04-25 16:25 Bob Peterson
  0 siblings, 0 replies; 4+ messages in thread
From: Bob Peterson @ 2023-04-25 16:25 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, every time an rgrp was synced (go_sync) the
clone bitmaps were freed. We do not need to free the bitmaps in many
common cases. For example when demoting the glock from EXCLUSIVE to
SHARED. This is especially wasteful in cases where we unlink lots of
files: the rgrps are transitioned to EX, then back to SH multiple
times as it looks at the dinode allocation states, then frees them,
but the clones prevent allocations until the files are evicted.
Subsequent uses often cause the rgrp glock to be transitioned from
SH to EX and back again in rapid succession.

In these cases it's proper to sync the rgrp bitmaps to the storage media
but wasteful to free the clones, because the very next unlink needs to
reallocate the clone bitmaps again. So in short, today we have:

1. SH->EX (for unlink or other)
2. Allocate (kmalloc) a clone bitmap.
3. Clear the bits in original bitmap.
4. Keep original state in the clone bitmap to prevent re-allocation
   until the last user closes the file.
5. EX->SH
6. Sync bitmap to storage media.
7. Free the clone bitmaps.
8. Go to 1.

This repeated kmalloc -> kfree -> kmalloc -> kfree is a waste of time:
We only need to free the clone bitmaps when the glock is invalidated
(i.e. when transitioning the glock to UN or DF so another node's view
is consistent.) However, we still need to re-sync the clones with the
real bitmap. This patch allows rgrp bitmaps to stick around until we
have an invalidate of the glock. So in short:

1. SH->EX (for unlink or other)
2. Only the first time, allocate (kmalloc) a clone bitmap.
3. Free the bits in original bitmap.
4. Keep original state in the clone bitmap to prevent re-allocation
   until the last user closes the file.
5. EX->SH
6. Sync bitmap to storage media.
7. Go to 1.

Other transitions, like EX->UN still sync and free the clone bitmaps.
And, of course, transition from SH->EX cannot have dirty buffers, so
will not have clone bitmaps.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glops.c |  5 ++++-
 fs/gfs2/rgrp.c  | 13 +++++++++++++
 fs/gfs2/rgrp.h  |  1 +
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index ef31218060aa..7c124c57d268 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -205,7 +205,10 @@ static int rgrp_go_sync(struct gfs2_glock *gl)
 	error = gfs2_rgrp_metasync(gl);
 	if (!error)
 		error = gfs2_ail_empty_gl(gl);
-	gfs2_free_clones(rgd);
+	if (test_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags))
+		gfs2_free_clones(rgd);
+	else
+		gfs2_sync_clones(rgd);
 	return error;
 }
 
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 3b9b76e980ad..6e212e0eb74e 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -616,6 +616,19 @@ void gfs2_free_clones(struct gfs2_rgrpd *rgd)
 	}
 }
 
+void gfs2_sync_clones(struct gfs2_rgrpd *rgd)
+{
+	int x;
+
+	for (x = 0; x < rgd->rd_length; x++) {
+		struct gfs2_bitmap *bi = rgd->rd_bits + x;
+		if (bi->bi_clone)
+			memcpy(bi->bi_clone + bi->bi_offset,
+			       bi->bi_bh->b_data + bi->bi_offset,
+			       bi->bi_bytes);
+	}
+}
+
 static void dump_rs(struct seq_file *seq, const struct gfs2_blkreserv *rs,
 		    const char *fs_id_buf)
 {
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index 00b30cf893af..254188cf2d7b 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -32,6 +32,7 @@ extern void gfs2_clear_rgrpd(struct gfs2_sbd *sdp);
 extern int gfs2_rindex_update(struct gfs2_sbd *sdp);
 extern void gfs2_free_clones(struct gfs2_rgrpd *rgd);
 extern int gfs2_rgrp_go_instantiate(struct gfs2_glock *gl);
+extern void gfs2_sync_clones(struct gfs2_rgrpd *rgd);
 extern void gfs2_rgrp_brelse(struct gfs2_rgrpd *rgd);
 
 extern struct gfs2_alloc *gfs2_alloc_get(struct gfs2_inode *ip);
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [Cluster-devel] [PATCH] gfs2: Don't free rgrp clone bitmaps until go_inval
@ 2023-05-10 19:08 Bob Peterson
  2023-05-11  8:47 ` Steven Whitehouse
  0 siblings, 1 reply; 4+ messages in thread
From: Bob Peterson @ 2023-05-10 19:08 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, every time an rgrp was synced (go_sync) the
clone bitmaps were freed. We do not need to free the bitmaps in many
common cases. For example when demoting the glock from EXCLUSIVE to
SHARED. This is especially wasteful in cases where we unlink lots of
files: the rgrps are transitioned to EX, then back to SH multiple
times as it looks at the dinode allocation states, then frees them,
but the clones prevent allocations until the files are evicted.
Subsequent uses often cause the rgrp glock to be transitioned from
SH to EX and back again in rapid succession.

In these cases it's proper to sync the rgrp bitmaps to the storage media
but wasteful to free the clones, because the very next unlink needs to
reallocate the clone bitmaps again. So in short, today we have:

1. SH->EX (for unlink or other)
2. Allocate (kmalloc) a clone bitmap.
3. Clear the bits in original bitmap.
4. Keep original state in the clone bitmap to prevent re-allocation
   until the last user closes the file.
5. EX->SH
6. Sync bitmap to storage media.
7. Free the clone bitmaps.
8. Go to 1.

This repeated kmalloc -> kfree -> kmalloc -> kfree is a waste of time:
We only need to free the clone bitmaps when the glock is invalidated
(i.e. when transitioning the glock to UN or DF so another node's view
is consistent.) However, we still need to re-sync the clones with the
real bitmap. This patch allows rgrp bitmaps to stick around until we
have an invalidate of the glock. So in short:

1. SH->EX (for unlink or other)
2. Only the first time, allocate (kmalloc) a clone bitmap.
3. Free the bits in original bitmap.
4. Keep original state in the clone bitmap to prevent re-allocation
   until the last user closes the file.
5. EX->SH
6. Sync bitmap to storage media.
7. Go to 1.

Other transitions, like EX->UN still sync and free the clone bitmaps.
And, of course, transition from SH->EX cannot have dirty buffers, so
will not have clone bitmaps.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glops.c |  4 +++-
 fs/gfs2/rgrp.c  | 13 +++++++++++++
 fs/gfs2/rgrp.h  |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 01d433ed6ce7..58cf2004548e 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -205,7 +205,8 @@ static int rgrp_go_sync(struct gfs2_glock *gl)
 	error = gfs2_rgrp_metasync(gl);
 	if (!error)
 		error = gfs2_ail_empty_gl(gl);
-	gfs2_free_clones(rgd);
+	if (!test_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags))
+		gfs2_sync_clones(rgd);
 	return error;
 }
 
@@ -229,6 +230,7 @@ static void rgrp_go_inval(struct gfs2_glock *gl, int flags)
 
 	if (!rgd)
 		return;
+	gfs2_free_clones(rgd);
 	start = (rgd->rd_addr * bsize) & PAGE_MASK;
 	end = PAGE_ALIGN((rgd->rd_addr + rgd->rd_length) * bsize) - 1;
 	gfs2_rgrp_brelse(rgd);
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 3b9b76e980ad..6e212e0eb74e 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -616,6 +616,19 @@ void gfs2_free_clones(struct gfs2_rgrpd *rgd)
 	}
 }
 
+void gfs2_sync_clones(struct gfs2_rgrpd *rgd)
+{
+	int x;
+
+	for (x = 0; x < rgd->rd_length; x++) {
+		struct gfs2_bitmap *bi = rgd->rd_bits + x;
+		if (bi->bi_clone)
+			memcpy(bi->bi_clone + bi->bi_offset,
+			       bi->bi_bh->b_data + bi->bi_offset,
+			       bi->bi_bytes);
+	}
+}
+
 static void dump_rs(struct seq_file *seq, const struct gfs2_blkreserv *rs,
 		    const char *fs_id_buf)
 {
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index 00b30cf893af..254188cf2d7b 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -32,6 +32,7 @@ extern void gfs2_clear_rgrpd(struct gfs2_sbd *sdp);
 extern int gfs2_rindex_update(struct gfs2_sbd *sdp);
 extern void gfs2_free_clones(struct gfs2_rgrpd *rgd);
 extern int gfs2_rgrp_go_instantiate(struct gfs2_glock *gl);
+extern void gfs2_sync_clones(struct gfs2_rgrpd *rgd);
 extern void gfs2_rgrp_brelse(struct gfs2_rgrpd *rgd);
 
 extern struct gfs2_alloc *gfs2_alloc_get(struct gfs2_inode *ip);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [Cluster-devel] [PATCH] gfs2: Don't free rgrp clone bitmaps until go_inval
  2023-05-10 19:08 Bob Peterson
@ 2023-05-11  8:47 ` Steven Whitehouse
  2023-05-11 14:18   ` Bob Peterson
  0 siblings, 1 reply; 4+ messages in thread
From: Steven Whitehouse @ 2023-05-11  8:47 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On Wed, 2023-05-10 at 15:08 -0400, Bob Peterson wrote:
> Before this patch, every time an rgrp was synced (go_sync) the
> clone bitmaps were freed. We do not need to free the bitmaps in many
> common cases. For example when demoting the glock from EXCLUSIVE to
> SHARED. This is especially wasteful in cases where we unlink lots of
> files: the rgrps are transitioned to EX, then back to SH multiple
> times as it looks at the dinode allocation states, then frees them,
> but the clones prevent allocations until the files are evicted.
> Subsequent uses often cause the rgrp glock to be transitioned from
> SH to EX and back again in rapid succession.
> 
> In these cases it's proper to sync the rgrp bitmaps to the storage
> media
> but wasteful to free the clones, because the very next unlink needs
> to
> reallocate the clone bitmaps again. So in short, today we have:
> 
> 1. SH->EX (for unlink or other)
> 2. Allocate (kmalloc) a clone bitmap.
> 3. Clear the bits in original bitmap.
> 4. Keep original state in the clone bitmap to prevent re-allocation
> ?? until the last user closes the file.
> 5. EX->SH
> 6. Sync bitmap to storage media.
> 7. Free the clone bitmaps.
> 8. Go to 1.
> 
> This repeated kmalloc -> kfree -> kmalloc -> kfree is a waste of
> time:

It is a waste of time. However, if the clones are kept around for lots
of rgrps, then that is a waste of space. The question is really what
the correct balance is.

Can we not solve the problem at source and prevent the large number of
lock transitions referred to above? If not then it might be a good plan
to document why that is the case,

Steve.

> We only need to free the clone bitmaps when the glock is invalidated
> (i.e. when transitioning the glock to UN or DF so another node's view
> is consistent.) However, we still need to re-sync the clones with the
> real bitmap. This patch allows rgrp bitmaps to stick around until we
> have an invalidate of the glock. So in short:
> 
> 1. SH->EX (for unlink or other)
> 2. Only the first time, allocate (kmalloc) a clone bitmap.
> 3. Free the bits in original bitmap.
> 4. Keep original state in the clone bitmap to prevent re-allocation
> ?? until the last user closes the file.
> 5. EX->SH
> 6. Sync bitmap to storage media.
> 7. Go to 1.
> 
> Other transitions, like EX->UN still sync and free the clone bitmaps.
> And, of course, transition from SH->EX cannot have dirty buffers, so
> will not have clone bitmaps.
> 
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
> ?fs/gfs2/glops.c |? 4 +++-
> ?fs/gfs2/rgrp.c? | 13 +++++++++++++
> ?fs/gfs2/rgrp.h? |? 1 +
> ?3 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
> index 01d433ed6ce7..58cf2004548e 100644
> --- a/fs/gfs2/glops.c
> +++ b/fs/gfs2/glops.c
> @@ -205,7 +205,8 @@ static int rgrp_go_sync(struct gfs2_glock *gl)
> ????????error = gfs2_rgrp_metasync(gl);
> ????????if (!error)
> ????????????????error = gfs2_ail_empty_gl(gl);
> -???????gfs2_free_clones(rgd);
> +???????if (!test_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags))
> +???????????????gfs2_sync_clones(rgd);
> ????????return error;
> ?}
> ?
> @@ -229,6 +230,7 @@ static void rgrp_go_inval(struct gfs2_glock *gl,
> int flags)
> ?
> ????????if (!rgd)
> ????????????????return;
> +???????gfs2_free_clones(rgd);
> ????????start = (rgd->rd_addr * bsize) & PAGE_MASK;
> ????????end = PAGE_ALIGN((rgd->rd_addr + rgd->rd_length) * bsize) -
> 1;
> ????????gfs2_rgrp_brelse(rgd);
> diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> index 3b9b76e980ad..6e212e0eb74e 100644
> --- a/fs/gfs2/rgrp.c
> +++ b/fs/gfs2/rgrp.c
> @@ -616,6 +616,19 @@ void gfs2_free_clones(struct gfs2_rgrpd *rgd)
> ????????}
> ?}
> ?
> +void gfs2_sync_clones(struct gfs2_rgrpd *rgd)
> +{
> +???????int x;
> +
> +???????for (x = 0; x < rgd->rd_length; x++) {
> +???????????????struct gfs2_bitmap *bi = rgd->rd_bits + x;
> +???????????????if (bi->bi_clone)
> +???????????????????????memcpy(bi->bi_clone + bi->bi_offset,
> +????????????????????????????? bi->bi_bh->b_data + bi->bi_offset,
> +????????????????????????????? bi->bi_bytes);
> +???????}
> +}
> +
> ?static void dump_rs(struct seq_file *seq, const struct
> gfs2_blkreserv *rs,
> ??????????????????? const char *fs_id_buf)
> ?{
> diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
> index 00b30cf893af..254188cf2d7b 100644
> --- a/fs/gfs2/rgrp.h
> +++ b/fs/gfs2/rgrp.h
> @@ -32,6 +32,7 @@ extern void gfs2_clear_rgrpd(struct gfs2_sbd *sdp);
> ?extern int gfs2_rindex_update(struct gfs2_sbd *sdp);
> ?extern void gfs2_free_clones(struct gfs2_rgrpd *rgd);
> ?extern int gfs2_rgrp_go_instantiate(struct gfs2_glock *gl);
> +extern void gfs2_sync_clones(struct gfs2_rgrpd *rgd);
> ?extern void gfs2_rgrp_brelse(struct gfs2_rgrpd *rgd);
> ?
> ?extern struct gfs2_alloc *gfs2_alloc_get(struct gfs2_inode *ip);


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Cluster-devel] [PATCH] gfs2: Don't free rgrp clone bitmaps until go_inval
  2023-05-11  8:47 ` Steven Whitehouse
@ 2023-05-11 14:18   ` Bob Peterson
  0 siblings, 0 replies; 4+ messages in thread
From: Bob Peterson @ 2023-05-11 14:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

On 5/11/23 4:47 AM, Steven Whitehouse wrote:
> Hi,
> 
>> This repeated kmalloc -> kfree -> kmalloc -> kfree is a waste of
>> time:
> 
> It is a waste of time. However, if the clones are kept around for lots
> of rgrps, then that is a waste of space. The question is really what
> the correct balance is.
> 
> Can we not solve the problem at source and prevent the large number of
> lock transitions referred to above? If not then it might be a good plan
> to document why that is the case,
> 
> Steve.

I do believe from a performance perspective, we might benefit just as 
much (if not more) by, for example, using the LM_FLAG_ANY flag when 
acquiring rgrp glocks in SH mode such as gfs2_check_blk_type.
That way if the lock is already in EX, we can just use it rather than 
demote and promote it. There may be other places in the code where we 
can do the same.

 From a correctness perspective, one of my concerns is:
The go_inval() code is only run when transitioning to UN or DF, so 
transitions from EX to SH and back won't call go_inval. They always call 
go_sync, but rgrp_go_sync code exits (and avoids the whole bitmap issue) 
unless the glock is GLF_DIRTY. It just seems fraught with pitfalls. But 
still, it seems to work properly.

I had a meeting with Andreas this morning and we decided that since it 
seems to work, "if it's not broken, don't fix it." So for now, I retract 
the patch and we can readdress the issue if we find problems related to it.

Regards,

Bob Peterson


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-05-11 14:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-25 16:25 [Cluster-devel] [PATCH] gfs2: Don't free rgrp clone bitmaps until go_inval Bob Peterson
  -- strict thread matches above, loose matches on Subject: below --
2023-05-10 19:08 Bob Peterson
2023-05-11  8:47 ` Steven Whitehouse
2023-05-11 14:18   ` Bob Peterson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).