public inbox for linux-xfs@vger.kernel.org
* Infinite loop in xfssyncd on full file system
@ 2006-08-22 20:01 Stephane Doyon
  2006-08-23  4:02 ` David Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Stephane Doyon @ 2006-08-22 20:01 UTC (permalink / raw)
  To: linux-xfs

I'm seeing what appears to be an infinite loop in xfssyncd. It is 
triggered when writing to a file system that is full or nearly full. I 
have pinpointed the change that introduced this problem: it's

     "TAKE 947395 - Fixing potential deadlock in space allocation and
     freeing due to ENOSPC"

git commit d210a28cd851082cec9b282443f8cc0e6fc09830.

I first saw the problem with a 2.6.17 kernel patched to add the 2.6.18-rc* 
XFS changes. I later confirmed that 2.6.17 does not exhibit this behavior, 
while adding just that one commit brings the problem back.

In the simplest case, I had a 7.5GB test file system, created with no 
mkfs.xfs options and mounted with no options. I filled it up, leaving half a 
GB free, simply using dd (single-threaded). Then I did

     while [ 1 ]; do dd if=/dev/zero of=f bs=1M; done
or
     i=1; while [ 1 ]; do echo $i; dd if=/dev/zero of=f$i bs=1M; \
                          i=$(($i+1)); done

and after very few iterations, my dd got stuck in uninterruptible 
sleep and I soon got: "BUG: soft lockup detected on CPU#1!" with xfssyncd 
at the bottom of the backtrace.

I took a few backtraces using KDB, letting it run a bit between taking 
each backtrace. All backtraces I saw had xfssyncd doing:

xfssyncd xfs_flush_inode_work filemap_flush __filemap_fdatawrite_range
do_writepages xfs_vm_writepage xfs_page_state_convert xfs_map_blocks
xfs_bmap xfs_iomap ...

then I've seen either:

xfs_iomap_write_allocate xfs_trans_reserve xfs_mod_incore_sb 
xfs_icsb_modify_counters xfs_icsb_modify_counters_int

or

xfs_iomap_write_allocate xfs_bmapi xfs_bmap_alloc xfs_bmap_btalloc 
xfs_alloc_vextent xfs_alloc_fix_freelist

or

xfs_icsb_balance_counter xfs_icsb_disable_counter

or

xfs_iomap_write_allocate xfs_trans_alloc _xfs_trans_alloc kmem_zone_zalloc

dd is doing: sys_write vfs_write do_sync_write xfs_file_aio_write 
xfs_write generic_file_buffered_write xfs_get_blocks __xfs_get_blocks 
xfs_bmap xfs_iomap xfs_iomap_write_delay xfs_flush_space xfs_flush_device 
_xfs_log_force xlog_state_sync_all schedule_timeout.

From then on, other processes start piling up because of the held locks, 
and if I'm patient enough, something on my machine eventually eats away 
all the memory...

A similar problem was discussed here: 
http://oss.sgi.com/archives/xfs/2006-08/msg00144.html

For some reason I can't seem to find the original bug submission either in 
the list archives or in your bugzilla... I would comment that I have 
preemption disabled. AFAICT this is not a matter of spinlocks being held 
for too long. The "soft lockup" should trigger if a CPU doesn't reschedule 
for more than 10secs.

I saw the problem on two different machines, one has 8 pseudo CPUs 
(counting hyper-threading) and one has 4.

Most of my tests were done using a fast external storage array. But I also 
tried it on a 1GB file system that I made in a file on an ordinary disk 
and mounted using the loopback device. The lockup did not happen with dd 
as before, but then I umount'ed the file system and umount hung, and I got 
the same soft lockup for xfssyncd as before.

I hope you XFS experts see what might be wrong with that bug fix. It's 
ironic but for me, this (apparent) infinite loop seems much easier to hit 
than the out-of-order locking problem that the commit in question was 
supposed to fix. Let me know if I can get you any more info.

Thanks

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Infinite loop in xfssyncd on full file system
  2006-08-22 20:01 Infinite loop in xfssyncd on full file system Stephane Doyon
@ 2006-08-23  4:02 ` David Chinner
  2006-08-23  4:48   ` David Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: David Chinner @ 2006-08-23  4:02 UTC (permalink / raw)
  To: Stephane Doyon; +Cc: linux-xfs, lnx1138

On Tue, Aug 22, 2006 at 04:01:10PM -0400, Stephane Doyon wrote:
> I'm seeing what appears to be an infinite loop in xfssyncd. It is 
> triggered when writing to a file system that is full or nearly full. I 
> have pinpointed the change that introduced this problem: it's
> 
>     "TAKE 947395 - Fixing potential deadlock in space allocation and
>     freeing due to ENOSPC"
> 
> git commit d210a28cd851082cec9b282443f8cc0e6fc09830.

Thanks for tracking that down - I've been trying to isolate a test case
for another report of this looping in xfssyncd.

[Luciano - this is the same problem we've been trying to track down.]

> I hope you XFS experts see what might be wrong with that bug fix. It's 
> ironic but for me, this (apparent) infinite loop seems much easier to hit 
> than the out-of-order locking problem that the commit in question was 
> supposed to fix. Let me know if I can get you any more info.

Now we know what patch introduces the problem, we know where to look.
Stay tuned...

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: Infinite loop in xfssyncd on full file system
  2006-08-23  4:02 ` David Chinner
@ 2006-08-23  4:48   ` David Chinner
  2006-08-23 15:00     ` Stephane Doyon
  0 siblings, 1 reply; 9+ messages in thread
From: David Chinner @ 2006-08-23  4:48 UTC (permalink / raw)
  To: Stephane Doyon; +Cc: linux-xfs, lnx1138

On Wed, Aug 23, 2006 at 02:02:18PM +1000, David Chinner wrote:
> On Tue, Aug 22, 2006 at 04:01:10PM -0400, Stephane Doyon wrote:
> > I'm seeing what appears to be an infinite loop in xfssyncd. It is 
> > triggered when writing to a file system that is full or nearly full. I 
> > have pinpointed the change that introduced this problem: it's
> > 
> >     "TAKE 947395 - Fixing potential deadlock in space allocation and
> >     freeing due to ENOSPC"
> > 
> > git commit d210a28cd851082cec9b282443f8cc0e6fc09830.
> 
> Thanks for tracking that down - I've been trying to isolate a test case
> for another report of this looping in xfssyncd.
> 
> [Luciano - this is the same problem we've been trying to track down.]
> 
> > I hope you XFS experts see what might be wrong with that bug fix. It's 
> > ironic but for me, this (apparent) infinite loop seems much easier to hit 
> > than the out-of-order locking problem that the commit in question was 
> > supposed to fix. Let me know if I can get you any more info.
> 
> Now we know what patch introduces the problem, we know where to look.
> Stay tuned...

I've had a quick look at the above commit. I'm not yet certain that
everything is correct in terms of the semantics laid down in the
change or that enough blocks are reserved for btree splits, but I
can see a hole in the implementation on multiprocessor machines.

Stephane/Luciano - can you test the following patch (note: compile
tested only) and see if it fixes the problem?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


---
 fs/xfs/xfs_mount.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.c	2006-08-18 15:29:28.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_mount.c	2006-08-23 14:28:18.059385018 +1000
@@ -2108,11 +2108,11 @@ again:
 	case XFS_SBS_FDBLOCKS:
 		BUG_ON((mp->m_resblks - mp->m_resblks_avail) != 0);
 
-		lcounter = icsbp->icsb_fdblocks;
+		lcounter = icsbp->icsb_fdblocks - SET_ASIDE_BLOCKS;
 		lcounter += delta;
 		if (unlikely(lcounter < 0))
 			goto slow_path;
-		icsbp->icsb_fdblocks = lcounter;
+		icsbp->icsb_fdblocks = lcounter + SET_ASIDE_BLOCKS;
 		break;
 	default:
 		BUG();
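
For readers following the hunk above, here is a minimal userspace sketch of
the fast-path check it changes. It assumes SET_ASIDE_BLOCKS is 8, the value
the commit under discussion defines, and the two function names are
hypothetical stand-ins for this illustration, not kernel API. It models only
the arithmetic, not the per-CPU locking around it.

```c
#include <stdbool.h>

/* Value defined by the commit under discussion. */
#define SET_ASIDE_BLOCKS 8

/*
 * Hypothetical model of the per-CPU fast path before the patch: the
 * set-aside is ignored, so the local counter can be drained to zero.
 * Returns false where the kernel code would "goto slow_path".
 */
bool fast_path_unpatched(long long *fdblocks, long long delta)
{
    long long lcounter = *fdblocks + delta;

    if (lcounter < 0)
        return false;
    *fdblocks = lcounter;
    return true;
}

/*
 * Same model with the patch applied: the set-aside is subtracted before
 * the check and added back before the counter is stored.
 */
bool fast_path_patched(long long *fdblocks, long long delta)
{
    long long lcounter = *fdblocks - SET_ASIDE_BLOCKS;

    lcounter += delta;
    if (lcounter < 0)
        return false;
    *fdblocks = lcounter + SET_ASIDE_BLOCKS;
    return true;
}
```

With 5 blocks in the local counter and a request for 3, the unpatched model
hands them out and leaves 2, while the patched model refuses and leaves the
counter untouched.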


* Re: Infinite loop in xfssyncd on full file system
  2006-08-23  4:48   ` David Chinner
@ 2006-08-23 15:00     ` Stephane Doyon
  2006-08-23 19:10       ` Luciano Chavez
  2006-08-23 23:14       ` David Chinner
  0 siblings, 2 replies; 9+ messages in thread
From: Stephane Doyon @ 2006-08-23 15:00 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-xfs, lnx1138

On Wed, 23 Aug 2006, David Chinner wrote:

> On Wed, Aug 23, 2006 at 02:02:18PM +1000, David Chinner wrote:
>> On Tue, Aug 22, 2006 at 04:01:10PM -0400, Stephane Doyon wrote:
>>> I'm seeing what appears to be an infinite loop in xfssyncd. It is
>>> triggered when writing to a file system that is full or nearly full. I
>>> have pinpointed the change that introduced this problem: it's
>>>
>>>     "TAKE 947395 - Fixing potential deadlock in space allocation and
>>>     freeing due to ENOSPC"
>>>
>>> git commit d210a28cd851082cec9b282443f8cc0e6fc09830.
>>
>> Thanks for tracking that down - I've been trying to isolate a test case
>> for another report of this looping in xfssyncd.
>>
>> [Luciano - this is the same problem we've been trying to track down.]
>>
>>> I hope you XFS experts see what might be wrong with that bug fix. It's
>>> ironic but for me, this (apparent) infinite loop seems much easier to hit
>>> than the out-of-order locking problem that the commit in question was
>>> supposed to fix. Let me know if I can get you any more info.
>>
>> Now we know what patch introduces the problem, we know where to look.
>> Stay tuned...
>
> I've had a quick look at the above commit. I'm not yet certain that
> everything is correct in terms of the semantics laid down in the
> change or that enough blocks are reserved for btree splits, but I

I actually tried, naively, to bump up SET_ASIDE_BLOCKS from 8 to 32. I 
won't claim to understand half of what's going on but I wondered whether 
that might make the problem noticeably harder to reproduce at least, but 
it had no effect ;-).

> can see a hole in the implementation on multiprocessor machines.
>
> Stephane/Luciano - can you test the following patch (note: compile
> tested only) and see if it fixes the problem?

I just tried it, unfortunately no effect. Still went into a loop, on the 
second attempt.

Thanks


* Re: Infinite loop in xfssyncd on full file system
  2006-08-23 15:00     ` Stephane Doyon
@ 2006-08-23 19:10       ` Luciano Chavez
  2006-08-23 23:14       ` David Chinner
  1 sibling, 0 replies; 9+ messages in thread
From: Luciano Chavez @ 2006-08-23 19:10 UTC (permalink / raw)
  To: Stephane Doyon; +Cc: David Chinner, linux-xfs

On Wed, 2006-08-23 at 11:00 -0400, Stephane Doyon wrote:
> On Wed, 23 Aug 2006, David Chinner wrote:
> 
> > On Wed, Aug 23, 2006 at 02:02:18PM +1000, David Chinner wrote:
> >> On Tue, Aug 22, 2006 at 04:01:10PM -0400, Stephane Doyon wrote:
> >>> I'm seeing what appears to be an infinite loop in xfssyncd. It is
> >>> triggered when writing to a file system that is full or nearly full. I
> >>> have pinpointed the change that introduced this problem: it's
> >>>
> >>>     "TAKE 947395 - Fixing potential deadlock in space allocation and
> >>>     freeing due to ENOSPC"
> >>>
> >>> git commit d210a28cd851082cec9b282443f8cc0e6fc09830.
> >>
> >> Thanks for tracking that down - I've been trying to isolate a test case
> >> for another report of this looping in xfssyncd.
> >>
> >> [Luciano - this is the same problem we've been trying to track down.]
> >>
> >>> I hope you XFS experts see what might be wrong with that bug fix. It's
> >>> ironic but for me, this (apparent) infinite loop seems much easier to hit
> >>> than the out-of-order locking problem that the commit in question was
> >>> supposed to fix. Let me know if I can get you any more info.
> >>
> >> Now we know what patch introduces the problem, we know where to look.
> >> Stay tuned...
> >
> > I've had a quick look at the above commit. I'm not yet certain that
> > everything is correct in terms of the semantics laid down in the
> > change or that enough blocks are reserved for btree splits, but I
> 
> I actually tried, naively, to bump up SET_ASIDE_BLOCKS from 8 to 32. I 
> won't claim to understand half of what's going on but I wondered whether 
> that might make the problem noticeably harder to reproduce at least, but 
> it had no effect ;-).
> 
> > can see a hole in the implementation on multiprocessor machines.
> >
> > Stephane/Luciano - can you test the following patch (note: compile
> > tested only) and see if it fixes the problem?
> 
> I just tried it, unfortunately no effect. Still went into a loop, on the 
> second attempt.
> 

Yes, unfortunately it had no effect here either.

> Thanks
> 
-- 
Luciano Chavez <lnx1138@us.ibm.com>
IBM


* Re: Infinite loop in xfssyncd on full file system
  2006-08-23 15:00     ` Stephane Doyon
  2006-08-23 19:10       ` Luciano Chavez
@ 2006-08-23 23:14       ` David Chinner
  2006-08-28  7:23         ` David Chinner
  1 sibling, 1 reply; 9+ messages in thread
From: David Chinner @ 2006-08-23 23:14 UTC (permalink / raw)
  To: Stephane Doyon, Luciano Chavez; +Cc: linux-xfs

On Wed, Aug 23, 2006 at 11:00:43AM -0400, Stephane Doyon wrote:
> On Wed, 23 Aug 2006, David Chinner wrote:
> 
> >On Wed, Aug 23, 2006 at 02:02:18PM +1000, David Chinner wrote:
> >>On Tue, Aug 22, 2006 at 04:01:10PM -0400, Stephane Doyon wrote:
> >>>I'm seeing what appears to be an infinite loop in xfssyncd. It is
> >>>triggered when writing to a file system that is full or nearly full. I
> >>>have pinpointed the change that introduced this problem: it's
> >>>
> >>>    "TAKE 947395 - Fixing potential deadlock in space allocation and
> >>>    freeing due to ENOSPC"
> >>>
> >>>git commit d210a28cd851082cec9b282443f8cc0e6fc09830.

.....

> >>Now we know what patch introduces the problem, we know where to look.
> >>Stay tuned...
> >
> >I've had a quick look at the above commit. I'm not yet certain that
> >everything is correct in terms of the semantics laid down in the
> >change or that enough blocks are reserved for btree splits, but I
> 
> I actually tried, naively, to bump up SET_ASIDE_BLOCKS from 8 to 32. I 
> won't claim to understand half of what's going on but I wondered whether 
> that might make the problem noticeably harder to reproduce at least, but 
> it had no effect ;-).

That was going to be my next question. ;)

At least that rules out a small error in the block reservation decision,
so I'm going to have to analyse all the code paths the mod introduced
and work out what is going wrong.

> >Stephane/Luciano - can you test the following patch (note: compile
> >tested only) and see if it fixes the problem?
> 
> I just tried it, unfortunately no effect. Still went into a loop, on the 
> second attempt.

On Wed, Aug 23, 2006 at 02:10:59PM -0500, Luciano Chavez wrote:
> 
> Yes, unfortunately it had no effect here either.

Thanks for trying. I'll get back to you both when I have something new
to report.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: Infinite loop in xfssyncd on full file system
  2006-08-23 23:14       ` David Chinner
@ 2006-08-28  7:23         ` David Chinner
  2006-08-28 19:40           ` Luciano Chavez
  0 siblings, 1 reply; 9+ messages in thread
From: David Chinner @ 2006-08-28  7:23 UTC (permalink / raw)
  To: David Chinner; +Cc: Stephane Doyon, Luciano Chavez, linux-xfs

On Thu, Aug 24, 2006 at 09:14:29AM +1000, David Chinner wrote:
> On Wed, Aug 23, 2006 at 11:00:43AM -0400, Stephane Doyon wrote:
> > On Wed, 23 Aug 2006, David Chinner wrote:
> > 
> > >On Wed, Aug 23, 2006 at 02:02:18PM +1000, David Chinner wrote:
> > >>On Tue, Aug 22, 2006 at 04:01:10PM -0400, Stephane Doyon wrote:
> > >>>I'm seeing what appears to be an infinite loop in xfssyncd. It is
> > >>>triggered when writing to a file system that is full or nearly full. I
> > >>>have pinpointed the change that introduced this problem: it's
> > >>>
> > >>>    "TAKE 947395 - Fixing potential deadlock in space allocation and
> > >>>    freeing due to ENOSPC"
> > >>>
> > >>>git commit d210a28cd851082cec9b282443f8cc0e6fc09830.
> 
> .....
> 
> > >>Now we know what patch introduces the problem, we know where to look.
> > >>Stay tuned...
> > >
> > >I've had a quick look at the above commit. I'm not yet certain that
> > >everything is correct in terms of the semantics laid down in the
> > > >change or that enough blocks are reserved for btree splits, but I
> > 
> > I actually tried, naively, to bump up SET_ASIDE_BLOCKS from 8 to 32. I 
> > won't claim to understand half of what's going on but I wondered whether 
> > that might make the problem noticeably harder to reproduce at least, but 
> > it had no effect ;-).
> 
> That was going to be my next question. ;)
> 
> At least that rules out a small error in the block reservation decision,
> so I'm going to have to analyse all the code paths the mod introduced
> and work out what is going wrong.

You know, if you had bumped it up just a bit higher, the problem might
have gone away. With a filesystem that only has 8 AGs in it, if you bumped
it to 33, then the problem would have disappeared....

What we have here is a small error in the block reservation code. Basically,
all the logic is correct except for one critical detail - while we need to
reserve 4 blocks for the AG freelist so a minimum allocation can succeed,
we need to reserve 4 blocks in _every AG_ so that when every AG is empty
we will fail with ENOSPC instead of trying to allocate a block when we
have an AG with less than 4 free blocks in it.

So, it's not 4 blocks filesystem wide we need to reserve, it's 4 blocks per AG
we need to reserve.
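
To make the arithmetic concrete, here is a small userspace sketch of the old
fixed set-aside next to the per-AG formula used by the patch below. The
formula 4 + 4*agcount is taken from that patch; the function names and the
admission helper are hypothetical, and agcount is passed as a plain argument
rather than read from mp->m_sb.sb_agcount.

```c
/* Fixed, filesystem-wide set-aside used before the patch. */
#define OLD_SET_ASIDE_BLOCKS 8ULL

/*
 * Per-mount set-aside from the patch: 4 + 4 * agcount.
 * Userspace stand-in for XFS_SET_ASIDE_BLOCKS(mp).
 */
unsigned long long new_set_aside_blocks(unsigned int agcount)
{
    return 4ULL + 4ULL * agcount;
}

/*
 * Hypothetical helper: is a request for `want` blocks admitted when
 * `fdblocks` are free and `set_aside` of them must stay untouched?
 * Returns 1 if admitted, 0 if the caller should see ENOSPC.
 */
int reservation_admitted(unsigned long long fdblocks,
                         unsigned long long want,
                         unsigned long long set_aside)
{
    return fdblocks >= set_aside && fdblocks - set_aside >= want;
}
```

For a file system with 8 AGs the new formula sets aside 36 blocks instead of
8, so a request for 10 blocks with 20 free is admitted under the old constant
but rejected under the new one.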

Stephane and Luciano, can you try the patch attached below - it fixes the
100% repeatable test case (while [ 1 ]; dd to enospc; done) on my test
machine.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


---
 fs/xfs/xfs_mount.c |   18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.c	2006-08-18 15:29:28.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_mount.c	2006-08-28 17:11:18.496258662 +1000
@@ -1257,10 +1257,11 @@ xfs_mod_sb(xfs_trans_t *tp, __int64_t fi
  * all delayed extents need to be actually allocated. To get around
  * this, we explicitly set aside a few blocks which will not be
  * reserved in delayed allocation. Considering the minimum number of
- * needed freelist blocks is 4 fsbs, a potential split of file's bmap
- * btree requires 1 fsb, so we set the number of set-aside blocks to 8.
-*/
-#define SET_ASIDE_BLOCKS 8
+ * needed freelist blocks is 4 fsbs _per AG_, a potential split of file's bmap
+ * btree requires 1 fsb, so we set the number of set-aside blocks
+ * to 4 + 4*agcount.
+ */
+#define XFS_SET_ASIDE_BLOCKS(mp)  (4 + ((mp)->m_sb.sb_agcount * 4))
 
 /*
  * xfs_mod_incore_sb_unlocked() is a utility routine common used to apply
@@ -1306,7 +1307,8 @@ xfs_mod_incore_sb_unlocked(xfs_mount_t *
 		return 0;
 	case XFS_SBS_FDBLOCKS:
 
-		lcounter = (long long)mp->m_sb.sb_fdblocks - SET_ASIDE_BLOCKS;
+		lcounter = (long long)
+			mp->m_sb.sb_fdblocks - XFS_SET_ASIDE_BLOCKS(mp);
 		res_used = (long long)(mp->m_resblks - mp->m_resblks_avail);
 
 		if (delta > 0) {		/* Putting blocks back */
@@ -1340,7 +1342,7 @@ xfs_mod_incore_sb_unlocked(xfs_mount_t *
 			}
 		}
 
-		mp->m_sb.sb_fdblocks = lcounter + SET_ASIDE_BLOCKS;
+		mp->m_sb.sb_fdblocks = lcounter + XFS_SET_ASIDE_BLOCKS(mp);
 		return 0;
 	case XFS_SBS_FREXTENTS:
 		lcounter = (long long)mp->m_sb.sb_frextents;
@@ -2108,11 +2110,11 @@ again:
 	case XFS_SBS_FDBLOCKS:
 		BUG_ON((mp->m_resblks - mp->m_resblks_avail) != 0);
 
-		lcounter = icsbp->icsb_fdblocks;
+		lcounter = icsbp->icsb_fdblocks - XFS_SET_ASIDE_BLOCKS(mp);
 		lcounter += delta;
 		if (unlikely(lcounter < 0))
 			goto slow_path;
-		icsbp->icsb_fdblocks = lcounter;
+		icsbp->icsb_fdblocks = lcounter + XFS_SET_ASIDE_BLOCKS(mp);
 		break;
 	default:
 		BUG();


* Re: Infinite loop in xfssyncd on full file system
  2006-08-28  7:23         ` David Chinner
@ 2006-08-28 19:40           ` Luciano Chavez
  2006-08-29 13:25             ` Stephane Doyon
  0 siblings, 1 reply; 9+ messages in thread
From: Luciano Chavez @ 2006-08-28 19:40 UTC (permalink / raw)
  To: David Chinner; +Cc: Stephane Doyon, linux-xfs

On Mon, 2006-08-28 at 17:23 +1000, David Chinner wrote:
> On Thu, Aug 24, 2006 at 09:14:29AM +1000, David Chinner wrote:
> > On Wed, Aug 23, 2006 at 11:00:43AM -0400, Stephane Doyon wrote:
> > > On Wed, 23 Aug 2006, David Chinner wrote:
> > > 
> > > >On Wed, Aug 23, 2006 at 02:02:18PM +1000, David Chinner wrote:
> > > >>On Tue, Aug 22, 2006 at 04:01:10PM -0400, Stephane Doyon wrote:
> > > >>>I'm seeing what appears to be an infinite loop in xfssyncd. It is
> > > >>>triggered when writing to a file system that is full or nearly full. I
> > > >>>have pinpointed the change that introduced this problem: it's
> > > >>>
> > > >>>    "TAKE 947395 - Fixing potential deadlock in space allocation and
> > > >>>    freeing due to ENOSPC"
> > > >>>
> > > >>>git commit d210a28cd851082cec9b282443f8cc0e6fc09830.
> > 
> > .....
> > 
> > > >>Now we know what patch introduces the problem, we know where to look.
> > > >>Stay tuned...
> > > >
> > > >I've had a quick look at the above commit. I'm not yet certain that
> > > >everything is correct in terms of the semantics laid down in the
> > > > >change or that enough blocks are reserved for btree splits, but I
> > > 
> > > I actually tried, naively, to bump up SET_ASIDE_BLOCKS from 8 to 32. I 
> > > won't claim to understand half of what's going on but I wondered whether 
> > > that might make the problem noticeably harder to reproduce at least, but 
> > > it had no effect ;-).
> > 
> > That was going to be my next question. ;)
> > 
> > At least that rules out a small error in the block reservation decision,
> > so I'm going to have to analyse all the code paths the mod introduced
> > and work out what is going wrong.
> 
> You know, if you had bumped it up just a bit higher, the problem might
> have gone away. With a filesystem that only has 8 AGs in it, if you bumped
> it to 33, then the problem would have disappeared....
> 
> What we have here is a small error in the block reservation code. Basically,
> all the logic is correct except for one critical detail - while we need to
> reserve 4 blocks for the AG freelist so a minimum allocation can succeed,
> we need to reserve 4 blocks in _every AG_ so that when every AG is empty
> we will fail with ENOSPC instead of trying to allocate a block when we
> have an AG with less than 4 free blocks in it.
> 
> So, it's not 4 blocks filesystem wide we need to reserve, it's 4 blocks per AG
> we need to reserve.
> 
> Stephane and Luciano, can you try the patch attached below - it fixes the
> 100% repeatable test case (while [ 1 ]; dd to enospc; done) on my test
> machine.
> 

Dave,

The latest patch seems to work for me running bonnie++ on a small 2GB
XFS filesystem. bonnie++ gets an ENOSPC on a write() and exits, and I
no longer see the soft lockup watchdog dump the kernel stack or xfssyncd
looping. Thanks!
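
A check along these lines can be sketched in C as a rough equivalent of the
dd loop from the original report. This is only an illustration under stated
assumptions: the function name is hypothetical, and the max_chunks cap is a
safety limit added for the sketch so it cannot fill a disk by accident. On a
fixed kernel the expectation is that write() eventually fails with ENOSPC
instead of the writer hanging.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write zero-filled chunks to `path` until write() fails or `max_chunks`
 * chunks have been attempted. Returns 0 if the cap was reached without
 * error, otherwise the errno of the first failure (ENOSPC on a full
 * file system). chunk_size must be between 1 and 4096 bytes.
 */
int fill_until_error(const char *path, size_t chunk_size, long max_chunks)
{
    char buf[4096];
    int fd, err = 0;
    long i;

    if (chunk_size == 0 || chunk_size > sizeof(buf))
        return EINVAL;
    memset(buf, 0, sizeof(buf));

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return errno;

    for (i = 0; i < max_chunks; i++) {
        /* A short write is not treated as an error here; if the file
         * system is full, the next call reports ENOSPC. */
        if (write(fd, buf, chunk_size) < 0) {
            err = errno;
            break;
        }
    }

    if (close(fd) < 0 && err == 0)
        err = errno;
    return err;
}
```

Calling it with a large cap on a nearly full test file system and checking
that the return value equals ENOSPC mirrors the behaviour described above.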

Can you keep me posted when your patch is included in your CVS please? 

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
> 
> 
> ---
>  fs/xfs/xfs_mount.c |   18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.c
> ===================================================================
> --- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.c	2006-08-18 15:29:28.000000000 +1000
> +++ 2.6.x-xfs-new/fs/xfs/xfs_mount.c	2006-08-28 17:11:18.496258662 +1000
> @@ -1257,10 +1257,11 @@ xfs_mod_sb(xfs_trans_t *tp, __int64_t fi
>   * all delayed extents need to be actually allocated. To get around
>   * this, we explicitly set aside a few blocks which will not be
>   * reserved in delayed allocation. Considering the minimum number of
> - * needed freelist blocks is 4 fsbs, a potential split of file's bmap
> - * btree requires 1 fsb, so we set the number of set-aside blocks to 8.
> -*/
> -#define SET_ASIDE_BLOCKS 8
> + * needed freelist blocks is 4 fsbs _per AG_, a potential split of file's bmap
> + * btree requires 1 fsb, so we set the number of set-aside blocks
> + * to 4 + 4*agcount.
> + */
> +#define XFS_SET_ASIDE_BLOCKS(mp)  (4 + ((mp)->m_sb.sb_agcount * 4))
>  
>  /*
>   * xfs_mod_incore_sb_unlocked() is a utility routine common used to apply
> @@ -1306,7 +1307,8 @@ xfs_mod_incore_sb_unlocked(xfs_mount_t *
>  		return 0;
>  	case XFS_SBS_FDBLOCKS:
>  
> -		lcounter = (long long)mp->m_sb.sb_fdblocks - SET_ASIDE_BLOCKS;
> +		lcounter = (long long)
> +			mp->m_sb.sb_fdblocks - XFS_SET_ASIDE_BLOCKS(mp);
>  		res_used = (long long)(mp->m_resblks - mp->m_resblks_avail);
>  
>  		if (delta > 0) {		/* Putting blocks back */
> @@ -1340,7 +1342,7 @@ xfs_mod_incore_sb_unlocked(xfs_mount_t *
>  			}
>  		}
>  
> -		mp->m_sb.sb_fdblocks = lcounter + SET_ASIDE_BLOCKS;
> +		mp->m_sb.sb_fdblocks = lcounter + XFS_SET_ASIDE_BLOCKS(mp);
>  		return 0;
>  	case XFS_SBS_FREXTENTS:
>  		lcounter = (long long)mp->m_sb.sb_frextents;
> @@ -2108,11 +2110,11 @@ again:
>  	case XFS_SBS_FDBLOCKS:
>  		BUG_ON((mp->m_resblks - mp->m_resblks_avail) != 0);
>  
> -		lcounter = icsbp->icsb_fdblocks;
> +		lcounter = icsbp->icsb_fdblocks - XFS_SET_ASIDE_BLOCKS(mp);
>  		lcounter += delta;
>  		if (unlikely(lcounter < 0))
>  			goto slow_path;
> -		icsbp->icsb_fdblocks = lcounter;
> +		icsbp->icsb_fdblocks = lcounter + XFS_SET_ASIDE_BLOCKS(mp);
>  		break;
>  	default:
>  		BUG();
-- 
Luciano Chavez <lnx1138@us.ibm.com>
IBM


* Re: Infinite loop in xfssyncd on full file system
  2006-08-28 19:40           ` Luciano Chavez
@ 2006-08-29 13:25             ` Stephane Doyon
  0 siblings, 0 replies; 9+ messages in thread
From: Stephane Doyon @ 2006-08-29 13:25 UTC (permalink / raw)
  To: David Chinner, Luciano Chavez; +Cc: linux-xfs


On Mon, 28 Aug 2006, Luciano Chavez wrote:

> On Mon, 2006-08-28 at 17:23 +1000, David Chinner wrote:
>> On Thu, Aug 24, 2006 at 09:14:29AM +1000, David Chinner wrote:
>>> On Wed, Aug 23, 2006 at 11:00:43AM -0400, Stephane Doyon wrote:
>>>> On Wed, 23 Aug 2006, David Chinner wrote:
>>>>
>>>>> On Wed, Aug 23, 2006 at 02:02:18PM +1000, David Chinner wrote:
>>>>>> On Tue, Aug 22, 2006 at 04:01:10PM -0400, Stephane Doyon wrote:
>>>>>>> I'm seeing what appears to be an infinite loop in xfssyncd. It is
>>>>>>> triggered when writing to a file system that is full or nearly full. I
>>>>>>> have pinpointed the change that introduced this problem: it's
>>>>>>>
>>>>>>>    "TAKE 947395 - Fixing potential deadlock in space allocation and
>>>>>>>    freeing due to ENOSPC"
>>>>>>>
>>>>>>> git commit d210a28cd851082cec9b282443f8cc0e6fc09830.
>>>
>>> .....
>>>
>>>>>> Now we know what patch introduces the problem, we know where to look.
>>>>>> Stay tuned...
>>>>>
>>>>> I've had a quick look at the above commit. I'm not yet certain that
>>>>> everything is correct in terms of the semantics laid down in the
>>>>> change or that enough blocks are reserved for btree splits, but I
>>>>
>>>> I actually tried, naively, to bump up SET_ASIDE_BLOCKS from 8 to 32. I
>>>> won't claim to understand half of what's going on but I wondered whether
>>>> that might make the problem noticeably harder to reproduce at least, but
>>>> it had no effect ;-).
>>>
>>> That was going to be my next question. ;)
>>>
>>> At least that rules out a small error in the block reservation decision,
>>> so I'm going to have to analyse all the code paths the mod introduced
>>> and work out what is going wrong.
>>
>> You know, if you had bumped it up just a bit higher, the problem might
>> have gone away. With a filesystem that only has 8 AGs in it, if you bumped
>> it to 33, then the problem would have disappeared....
>>
>> What we have here is a small error in the block reservation code. Basically,
>> all the logic is correct except for one critical detail - while we need to
>> reserve 4 blocks for the AG freelist so a minimum allocation can succeed,
>> we need to reserve 4 blocks in _every AG_ so that when every AG is empty
>> we will fail with ENOSPC instead of trying to allocate a block when we
>> have an AG with less than 4 free blocks in it.
>>
>> So, it's not 4 blocks filesystem wide we need to reserve, it's 4 blocks per AG
>> we need to reserve.
>>
>> Stephane and Luciano, can you try the patch attached below - it fixes the
>> 100% repeatable test case (while [ 1 ]; dd to enospc; done) on my test
>> machine.
>>
>
> Dave,
>
> The latest patch seems to work for me running bonnie++ on a small 2GB
> XFS filesystem. bonnie++ gets an ENOSPC on a write() and exits, and I
> no longer see the soft lockup watchdog dump the kernel stack or xfssyncd
> looping. Thanks!

Works here too. Tried on three different file system configurations (sizes 
and numbers of AGs). I ran only simple tests, but at least it fixes 
the obvious test case. Ideally I'd like to run some stress test involving 
NFS service, but I won't be able to make time for that in the very near 
future.

Thanks!

