[Cluster-devel] [GFS2] flush the log if a transaction can't allocate space

cluster-devel.redhat.com archive mirror

* [Cluster-devel] [GFS2] flush the log if a transaction can't allocate space
@ 2007-03-23 20:51 Benjamin Marzinski
  2007-03-26  8:32 ` Steven Whitehouse
  0 siblings, 1 reply; 2+ messages in thread
From: Benjamin Marzinski @ 2007-03-23 20:51 UTC
  To: cluster-devel.redhat.com

This is a fix for bz #208514. When GFS2 frees space, the freed blocks
aren't available for reuse until the resource group has been successfully
written to the on-disk journal. So in rare cases, GFS2 operations fail,
reporting that the filesystem is out of space, when in reality they are just
waiting for a log flush. For instance, on a 1 GB filesystem, if I repeatedly
write 10 MB to a file and then truncate it, after a hundred iterations the
write fails with -ENOSPC, even though the filesystem is only 1% full.
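A minimal userspace reproducer for the write-then-truncate scenario above might look like the sketch below. It is not the test program used for the bug report: the function name `write_truncate_cycle`, the 64 KiB chunk size and the return convention are assumptions made for illustration, and the path passed in should name a file on the GFS2 mount under test.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical reproducer: fill 'path' with 'bytes' bytes, then truncate it
 * back to zero, 'iterations' times. Returns 0 if every write succeeded, the
 * 1-based iteration whose write failed with ENOSPC, or -1 on any other error.
 */
static int write_truncate_cycle(const char *path, size_t bytes, int iterations)
{
	enum { CHUNK = 64 * 1024 };
	char *buf;
	int fd, ret = 0;

	assert(path != NULL);
	buf = malloc(CHUNK);
	if (!buf)
		return -1;
	memset(buf, 'x', CHUNK);

	fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
	if (fd < 0) {
		free(buf);
		return -1;
	}

	for (int i = 1; i <= iterations && ret == 0; i++) {
		size_t left = bytes;

		/* ftruncate() leaves the file offset alone, so rewind before writing. */
		if (lseek(fd, 0, SEEK_SET) < 0) {
			ret = -1;
			break;
		}
		while (left > 0) {
			size_t n = left < (size_t)CHUNK ? left : (size_t)CHUNK;
			ssize_t w = write(fd, buf, n);

			if (w < 0 && errno == EINTR)
				continue;
			if (w <= 0) {
				/* Report which iteration ran out of space. */
				ret = (w < 0 && errno == ENOSPC) ? i : -1;
				break;
			}
			left -= (size_t)w;
		}
		/* Free the blocks again; GFS2 only reuses them once the rgrp reaches the journal. */
		if (ret == 0 && ftruncate(fd, 0) < 0)
			ret = -1;
	}

	close(fd);
	free(buf);
	return ret;
}
```

Calling `write_truncate_cycle(path, 10 * 1024 * 1024, 100)` on a 1 GB GFS2 filesystem would mirror the case described; by the description above, a kernel without this fix could return a nonzero iteration number instead of 0.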

The attached patch calls a log flush in these cases. I tested this patch
fairly heavily to check for any locking issues that I might have missed, and
it seems to work fine. Also, this patch only does the log flush if
get_local_rgrp makes a complete loop of the resource groups without skipping
any due to locking issues; if the first (try-lock) pass skipped some, it first
retries with blocking locks. The code would be slightly simpler if it just
always did the log flush after the first failed pass; then you would only ever
have to go through the loop twice, instead of up to three times. However, I
guessed that failing to find an rg simply due to locking issues would be common
enough that the log flush should be skipped in that case, but I'm not certain
that this is the right way to go. Either way, I don't expect this code to be
hit very often.

Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>

-------------- next part --------------
diff -urpN gfs2-2.6-nmw-new-clean/fs/gfs2/rgrp.c gfs2-2.6-nmw-patched/fs/gfs2/rgrp.c
--- gfs2-2.6-nmw-new-clean/fs/gfs2/rgrp.c	2007-03-19 17:23:35.000000000 -0500
+++ gfs2-2.6-nmw-patched/fs/gfs2/rgrp.c	2007-03-22 12:43:39.000000000 -0500
@@ -27,6 +27,7 @@
 #include "trans.h"
 #include "ops_file.h"
 #include "util.h"
+#include "log.h"

 #define BFITNOENT ((u32)~0)

@@ -941,9 +942,13 @@ static int get_local_rgrp(struct gfs2_in
 			rgd = gfs2_rgrpd_get_first(sdp);

 		if (rgd == begin) {
-			if (++loops >= 2 || !skipped)
+			if (++loops >= 3)
 				return -ENOSPC;
+			if (!skipped)
+				loops++;
 			flags = 0;
+			if (loops == 2)
+				gfs2_log_flush(sdp, NULL);
 		}
 	}


* [Cluster-devel] [GFS2] flush the log if a transaction can't allocate space
  2007-03-23 20:51 [Cluster-devel] [GFS2] flush the log if a transaction can't allocate space Benjamin Marzinski
@ 2007-03-26  8:32 ` Steven Whitehouse
  0 siblings, 0 replies; 2+ messages in thread
From: Steven Whitehouse @ 2007-03-26  8:32 UTC
  To: cluster-devel.redhat.com

Hi,

Now applied to the -nmw git tree. Thanks,

Steve.
