largeio mount and performance impact

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* largeio mount and performance impact
@ 2006-09-23 13:27 Sebastian Brings
  2006-09-23 16:34 ` Eric Sandeen
  2006-09-25  8:38 ` Data type overflow in xfs_trans_unreserve_and_mod_sb Shailendra Tripathi
  0 siblings, 2 replies; 10+ messages in thread
From: Sebastian Brings @ 2006-09-23 13:27 UTC (permalink / raw)
  To: xfs

In a different thread I read about the largeio mount option for XFS. Now
I wonder if the problems I recently ran in have been caused by this.

After a system uprgade from sles9 sp2 to sp3 one app started
misbehaving. Before the upgrade it used 15% CPU, after the upgrade it
was 90+% and the performance dropped by about 50%. The app is writing a
wave audio file, and for every 3840 bytes of audio samples it appends,
it updates the RIFF header of the file. All of this was done using the
buffered fopen/fwrite/... C library functions. 
An strace showed that seeking to the beginning of the file also
triggered a 12MiB read(2) call, and seeking to the end, for example to
13MiB, translated to a seek(2) to offset 1mib and a read(2) of 12 MiB.

Initiall I assumed something very strange had happened to the C lib
defaults. Otoh 12MiB is the swidth of the filesystem, so I assume that
the C lib got the 12 MiB optimal IO size from XFS and therefore behaved
as described above. Could that be?

Regards

Sebastian 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: largeio mount and performance impact
  2006-09-23 13:27 largeio mount and performance impact Sebastian Brings
@ 2006-09-23 16:34 ` Eric Sandeen
  2006-09-24 13:41   ` Sebastian Brings
  2006-09-25  8:38 ` Data type overflow in xfs_trans_unreserve_and_mod_sb Shailendra Tripathi
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Sandeen @ 2006-09-23 16:34 UTC (permalink / raw)
  To: Sebastian Brings; +Cc: xfs

Sebastian Brings wrote:
> In a different thread I read about the largeio mount option for XFS. Now
> I wonder if the problems I recently ran in have been caused by this.
> 
> After a system uprgade from sles9 sp2 to sp3 one app started
> misbehaving. Before the upgrade it used 15% CPU, after the upgrade it
> was 90+% and the performance dropped by about 50%. The app is writing a
> wave audio file, and for every 3840 bytes of audio samples it appends,
> it updates the RIFF header of the file. All of this was done using the
> buffered fopen/fwrite/... C library functions. 
> An strace showed that seeking to the beginning of the file also
> triggered a 12MiB read(2) call, and seeking to the end, for example to
> 13MiB, translated to a seek(2) to offset 1mib and a read(2) of 12 MiB.
> 
> Initiall I assumed something very strange had happened to the C lib
> defaults. Otoh 12MiB is the swidth of the filesystem, so I assume that
> the C lib got the 12 MiB optimal IO size from XFS and therefore behaved
> as described above. Could that be?

Unless you specify the largeio mount option, I don't -think- any of this is exposed.

What does stat -c %o <file> say?

Did this strace behavior change from sp2 to sp3?

-Eric

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: largeio mount and performance impact
  2006-09-23 16:34 ` Eric Sandeen
@ 2006-09-24 13:41   ` Sebastian Brings
  2006-09-24 13:47     ` Sebastian Brings
  0 siblings, 1 reply; 10+ messages in thread
From: Sebastian Brings @ 2006-09-24 13:41 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

> -----Original Message-----
> From: Eric Sandeen [mailto:sandeen@sandeen.net]
> Sent: Samstag, 23. September 2006 18:34
> To: Sebastian Brings
> Cc: xfs@oss.sgi.com
> Subject: Re: largeio mount and performance impact
> 
> Sebastian Brings wrote:
> > In a different thread I read about the largeio mount option for XFS.
Now
> > I wonder if the problems I recently ran in have been caused by this.
> >
> > After a system uprgade from sles9 sp2 to sp3 one app started
> > misbehaving. Before the upgrade it used 15% CPU, after the upgrade
it
> > was 90+% and the performance dropped by about 50%. The app is
writing a
> > wave audio file, and for every 3840 bytes of audio samples it
appends,
> > it updates the RIFF header of the file. All of this was done using
the
> > buffered fopen/fwrite/... C library functions.
> > An strace showed that seeking to the beginning of the file also
> > triggered a 12MiB read(2) call, and seeking to the end, for example
to
> > 13MiB, translated to a seek(2) to offset 1mib and a read(2) of 12
MiB.
> >
> > Initiall I assumed something very strange had happened to the C lib
> > defaults. Otoh 12MiB is the swidth of the filesystem, so I assume
that
> > the C lib got the 12 MiB optimal IO size from XFS and therefore
behaved
> > as described above. Could that be?
> 
> Unless you specify the largeio mount option, I don't -think- any of
this
> is exposed.
> 
> What does stat -c %o <file> say?
> 
> Did this strace behavior change from sp2 to sp3?
> 
> -Eric


Stat seems to report 12MB:

# stat -c %o sebastian.tgz
12582912

Mount does not have largeio set explicitly. But this is a CXFS, so this
may change tings:

/dev/cxvm/fs1 on /shares/fs1 type xfs
(rw,noatime,client_timeout=1s,retry=0,server_list=(meta01,
meta02),filestreams,quota,mtpt=/shares/fs1)


I don't have strace data from the app running under sp2. But changing
the default library buffer size to a more reasonable size after the
fopen(), the app behaved normal again.


Sebastian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: largeio mount and performance impact
  2006-09-24 13:41   ` Sebastian Brings
@ 2006-09-24 13:47     ` Sebastian Brings
  0 siblings, 0 replies; 10+ messages in thread
From: Sebastian Brings @ 2006-09-24 13:47 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs



> -----Original Message-----
> From: xfs-bounce@oss.sgi.com [mailto:xfs-bounce@oss.sgi.com] On Behalf
Of
> Sebastian Brings
> Sent: Sonntag, 24. September 2006 15:42
> To: Eric Sandeen
> Cc: xfs@oss.sgi.com
> Subject: RE: largeio mount and performance impact
> 
> > -----Original Message-----
> > From: Eric Sandeen [mailto:sandeen@sandeen.net]
> > Sent: Samstag, 23. September 2006 18:34
> > To: Sebastian Brings
> > Cc: xfs@oss.sgi.com
> > Subject: Re: largeio mount and performance impact
> >
> > Sebastian Brings wrote:
> > > In a different thread I read about the largeio mount option for
XFS.
> Now
> > > I wonder if the problems I recently ran in have been caused by
this.
> > >
> > > After a system uprgade from sles9 sp2 to sp3 one app started
> > > misbehaving. Before the upgrade it used 15% CPU, after the upgrade
> it
> > > was 90+% and the performance dropped by about 50%. The app is
> writing a
> > > wave audio file, and for every 3840 bytes of audio samples it
> appends,
> > > it updates the RIFF header of the file. All of this was done using
> the
> > > buffered fopen/fwrite/... C library functions.
> > > An strace showed that seeking to the beginning of the file also
> > > triggered a 12MiB read(2) call, and seeking to the end, for
example
> to
> > > 13MiB, translated to a seek(2) to offset 1mib and a read(2) of 12
> MiB.
> > >
> > > Initiall I assumed something very strange had happened to the C
lib
> > > defaults. Otoh 12MiB is the swidth of the filesystem, so I assume
> that
> > > the C lib got the 12 MiB optimal IO size from XFS and therefore
> behaved
> > > as described above. Could that be?
> >
> > Unless you specify the largeio mount option, I don't -think- any of
> this
> > is exposed.
> >
> > What does stat -c %o <file> say?
> >
> > Did this strace behavior change from sp2 to sp3?
> >
> > -Eric
> 
> 
> Stat seems to report 12MB:
> 
> # stat -c %o sebastian.tgz
> 12582912
> 
> Mount does not have largeio set explicitly. But this is a CXFS, so
this
> may change tings:
> 
> /dev/cxvm/fs1 on /shares/fs1 type xfs
> (rw,noatime,client_timeout=1s,retry=0,server_list=(meta01,
> meta02),filestreams,quota,mtpt=/shares/fs1)
> 
> 
> I don't have strace data from the app running under sp2. But changing
> the default library buffer size to a more reasonable size after the
> fopen(), the app behaved normal again.
> 
> 
> Sebastian
> 

Now I remember. Filestreams was added during the update. Maybe that
makes the difference. Anyways, I just wanted to point out that largeio
mount option may have unexpected side effects

Sebastian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Data type overflow in xfs_trans_unreserve_and_mod_sb
  2006-09-23 13:27 largeio mount and performance impact Sebastian Brings
  2006-09-23 16:34 ` Eric Sandeen
@ 2006-09-25  8:38 ` Shailendra Tripathi
  2006-09-25 14:32   ` Eric Sandeen
  2006-10-11  5:25   ` David Chinner
  1 sibling, 2 replies; 10+ messages in thread
From: Shailendra Tripathi @ 2006-09-25  8:38 UTC (permalink / raw)
  To: David Chinner; +Cc: xfs, Timothy Shimmin

[-- Attachment #1: Type: text/plain, Size: 1743 bytes --]

Hi David,
           As part of fixing xfs_reserve_blocks issue, you might want to 
fix an issue in xfs_trans_unreserve_and_mod_sb as well. Since, I am on 
much older version, my patch is not applicable on newer trees. However, 
the patch is attached for your reference.

The problem is as below:

Superblock modifications required during transaction are stored in delta 
fields in transaction. These fields are applied to the superblock when 
transaction commits.

The in-core superblock changes are done in 
xfs_trans_unreserve_and_mod_sb. It calls xfs_mod_incore_sb_batch 
function to apply the changes. This function tries to apply the deltas 
and if it fails for any reason, it backs out all the changes. One 
typical modification done is like that:

         case XFS_SBS_DBLOCKS:
                 lcounter = (long long)mp->m_sb.sb_dblocks;
                 lcounter += delta;
                 if (lcounter < 0) {
                         ASSERT(0);
                         return (XFS_ERROR(EINVAL));
                 }
                 mp->m_sb.sb_dblocks = lcounter;
                 return (0);

So, when it returns EINVAL, the second part of the code backs out the 
changes made to superblock. However, the worst part is that 
xfs_trans_unreserve_and_mod_sb does not return any error value. The 
transaction appears to be committed peacefully without returning the 
error. You don't notice this unless you do I/O on the filesystem. Later, 
it hits some sort of in-memory corruption or other errors.

We hit this issue in our testing we tried to grow the filesystem from 
from 100GB to 10000GB. This is beyond the interger (31 bits) limit and, 
hence, for dblocks and fdblocks,  xfs_mod_sb struct does not pass in 
correct data.




[-- Attachment #2: xfs-sb-mod-growfs.patch --]
[-- Type: text/x-patch, Size: 3709 bytes --]

diff -urNp -X kernel-2.6.7-dontdiff kernel-2.6.7-orig/fs/xfs/xfs_mount.c kernel-2.6.7-modif/fs/xfs/xfs_mount.c
--- kernel-2.6.7-orig/fs/xfs/xfs_mount.c
+++ kernel-2.6.7-modif/fs/xfs/xfs_mount.c
@@ -1286,7 +1286,7 @@ xfs_mod_sb(xfs_trans_t *tp, __int64_t fi
  */
 STATIC int
 xfs_mod_incore_sb_unlocked(xfs_mount_t *mp, xfs_sb_field_t field,
-			int delta, int rsvd)
+			long delta, int rsvd)
 {
 	int		scounter;	/* short counter for 32 bit fields */
 	long long	lcounter;	/* long counter for 64 bit fields */
diff -urNp -X kernel-2.6.7-dontdiff kernel-2.6.7-orig/fs/xfs/xfs_mount.h kernel-2.6.7-modif/fs/xfs/xfs_mount.h
--- kernel-2.6.7-orig/fs/xfs/xfs_mount.h
+++ kernel-2.6.7-modif/fs/xfs/xfs_mount.h
@@ -551,10 +551,11 @@ static inline xfs_agblock_t XFS_DADDR_TO
 
 /*
  * This structure is for use by the xfs_mod_incore_sb_batch() routine.
+ * xfs_growfs can specify a few fields which are more than int limit
  */
 typedef struct xfs_mod_sb {
 	xfs_sb_field_t	msb_field;	/* Field to modify, see below */
-	int		msb_delta;	/* Change to make to specified field */
+	long		msb_delta;	/* Change to make to specified field */
 } xfs_mod_sb_t;
 
 #define	XFS_MOUNT_ILOCK(mp)	mutex_lock(&((mp)->m_ilock), PINOD)
diff -urNp -X kernel-2.6.7-dontdiff kernel-2.6.7-orig/fs/xfs/xfs_trans.c kernel-2.6.7-modif/fs/xfs/xfs_trans.c
--- kernel-2.6.7-orig/fs/xfs/xfs_trans.c
+++ kernel-2.6.7-modif/fs/xfs/xfs_trans.c
@@ -590,62 +590,62 @@ xfs_trans_unreserve_and_mod_sb(
 	if (tp->t_flags & XFS_TRANS_SB_DIRTY) {
 		if (tp->t_icount_delta != 0) {
 			msbp->msb_field = XFS_SBS_ICOUNT;
-			msbp->msb_delta = (int)tp->t_icount_delta;
+			msbp->msb_delta = tp->t_icount_delta;
 			msbp++;
 		}
 		if (tp->t_ifree_delta != 0) {
 			msbp->msb_field = XFS_SBS_IFREE;
-			msbp->msb_delta = (int)tp->t_ifree_delta;
+			msbp->msb_delta = tp->t_ifree_delta;
 			msbp++;
 		}
 		if (tp->t_fdblocks_delta != 0) {
 			msbp->msb_field = XFS_SBS_FDBLOCKS;
-			msbp->msb_delta = (int)tp->t_fdblocks_delta;
+			msbp->msb_delta = tp->t_fdblocks_delta;
 			msbp++;
 		}
 		if (tp->t_frextents_delta != 0) {
 			msbp->msb_field = XFS_SBS_FREXTENTS;
-			msbp->msb_delta = (int)tp->t_frextents_delta;
+			msbp->msb_delta = tp->t_frextents_delta;
 			msbp++;
 		}
 		if (tp->t_dblocks_delta != 0) {
 			msbp->msb_field = XFS_SBS_DBLOCKS;
-			msbp->msb_delta = (int)tp->t_dblocks_delta;
+			msbp->msb_delta = tp->t_dblocks_delta;
 			msbp++;
 		}
 		if (tp->t_agcount_delta != 0) {
 			msbp->msb_field = XFS_SBS_AGCOUNT;
-			msbp->msb_delta = (int)tp->t_agcount_delta;
+			msbp->msb_delta = tp->t_agcount_delta;
 			msbp++;
 		}
 		if (tp->t_imaxpct_delta != 0) {
 			msbp->msb_field = XFS_SBS_IMAX_PCT;
-			msbp->msb_delta = (int)tp->t_imaxpct_delta;
+			msbp->msb_delta = tp->t_imaxpct_delta;
 			msbp++;
 		}
 		if (tp->t_rextsize_delta != 0) {
 			msbp->msb_field = XFS_SBS_REXTSIZE;
-			msbp->msb_delta = (int)tp->t_rextsize_delta;
+			msbp->msb_delta = tp->t_rextsize_delta;
 			msbp++;
 		}
 		if (tp->t_rbmblocks_delta != 0) {
 			msbp->msb_field = XFS_SBS_RBMBLOCKS;
-			msbp->msb_delta = (int)tp->t_rbmblocks_delta;
+			msbp->msb_delta = tp->t_rbmblocks_delta;
 			msbp++;
 		}
 		if (tp->t_rblocks_delta != 0) {
 			msbp->msb_field = XFS_SBS_RBLOCKS;
-			msbp->msb_delta = (int)tp->t_rblocks_delta;
+			msbp->msb_delta = tp->t_rblocks_delta;
 			msbp++;
 		}
 		if (tp->t_rextents_delta != 0) {
 			msbp->msb_field = XFS_SBS_REXTENTS;
-			msbp->msb_delta = (int)tp->t_rextents_delta;
+			msbp->msb_delta = tp->t_rextents_delta;
 			msbp++;
 		}
 		if (tp->t_rextslog_delta != 0) {
 			msbp->msb_field = XFS_SBS_REXTSLOG;
-			msbp->msb_delta = (int)tp->t_rextslog_delta;
+			msbp->msb_delta = tp->t_rextslog_delta;
 			msbp++;
 		}
 	}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Data type overflow in xfs_trans_unreserve_and_mod_sb
  2006-09-25  8:38 ` Data type overflow in xfs_trans_unreserve_and_mod_sb Shailendra Tripathi
@ 2006-09-25 14:32   ` Eric Sandeen
  2006-09-25 14:52     ` Shailendra Tripathi
  2006-10-11  5:25   ` David Chinner
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Sandeen @ 2006-09-25 14:32 UTC (permalink / raw)
  To: Shailendra Tripathi; +Cc: David Chinner, xfs, Timothy Shimmin

Shailendra Tripathi wrote:
> Hi David,
>           As part of fixing xfs_reserve_blocks issue, you might want to 
> fix an issue in xfs_trans_unreserve_and_mod_sb as well. Since, I am on 
> much older version, my patch is not applicable on newer trees. However, 
> the patch is attached for your reference.
> 
> The problem is as below:
> 
> Superblock modifications required during transaction are stored in delta 
> fields in transaction. These fields are applied to the superblock when 
> transaction commits.
> 
> The in-core superblock changes are done in 
> xfs_trans_unreserve_and_mod_sb. It calls xfs_mod_incore_sb_batch 
> function to apply the changes. This function tries to apply the deltas 
> and if it fails for any reason, it backs out all the changes. One 
> typical modification done is like that:
> 
>         case XFS_SBS_DBLOCKS:
>                 lcounter = (long long)mp->m_sb.sb_dblocks;
>                 lcounter += delta;
>                 if (lcounter < 0) {
>                         ASSERT(0);
>                         return (XFS_ERROR(EINVAL));
>                 }
>                 mp->m_sb.sb_dblocks = lcounter;
>                 return (0);
> 
> So, when it returns EINVAL, the second part of the code backs out the 
> changes made to superblock. However, the worst part is that 
> xfs_trans_unreserve_and_mod_sb does not return any error value. 

Hm, yep, just ASSERT(error == 0);

I suppose this is the trickiness of canceling a transaction at some points...

> The 
> transaction appears to be committed peacefully without returning the 
> error. You don't notice this unless you do I/O on the filesystem. Later, 
> it hits some sort of in-memory corruption or other errors.
> 
> We hit this issue in our testing we tried to grow the filesystem from 
> from 100GB to 10000GB. This is beyond the interger (31 bits) limit and, 
> hence, for dblocks and fdblocks,  xfs_mod_sb struct does not pass in 
> correct data.
> 
> 

First thoughts, "long" won't help on 32 bit machines, perhaps this should be an 
explicitly-sized 64-bit type?

-Eric

p.s. good to see agami's recently active participation on the list!

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Data type overflow in xfs_trans_unreserve_and_mod_sb
  2006-09-25 14:32   ` Eric Sandeen
@ 2006-09-25 14:52     ` Shailendra Tripathi
  0 siblings, 0 replies; 10+ messages in thread
From: Shailendra Tripathi @ 2006-09-25 14:52 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: David Chinner, xfs, Timothy Shimmin

Eric Sandeen wrote:
> Hm, yep, just ASSERT(error == 0);
> 
> I suppose this is the trickiness of canceling a transaction at some 
> points...

Yes, you are right. At the point where it is being applied, all the 
things can't be reverted back as XFS does not store the "before-image". 
However, typically in most of such cases, XFS goes ahead with shutting 
down the filesystem assuming these to be catastrophic or incorrigible 
errors requiring manual intervention.

> First thoughts, "long" won't help on 32 bit machines, perhaps this 
> should be an explicitly-sized 64-bit type

> -Eric
> 
> p.s. good to see agami's recently active participation on the list!
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Data type overflow in xfs_trans_unreserve_and_mod_sb
  2006-09-25  8:38 ` Data type overflow in xfs_trans_unreserve_and_mod_sb Shailendra Tripathi
  2006-09-25 14:32   ` Eric Sandeen
@ 2006-10-11  5:25   ` David Chinner
  2006-10-13  6:13     ` David Chinner
  1 sibling, 1 reply; 10+ messages in thread
From: David Chinner @ 2006-10-11  5:25 UTC (permalink / raw)
  To: Shailendra Tripathi; +Cc: sandeen, xfs, Timothy Shimmin

On Mon, Sep 25, 2006 at 02:08:11PM +0530, Shailendra Tripathi wrote:
> Hi David,
>           As part of fixing xfs_reserve_blocks issue, you might want to 
> fix an issue in xfs_trans_unreserve_and_mod_sb as well. Since, I am on 
> much older version, my patch is not applicable on newer trees. However, 
> the patch is attached for your reference.
> 
> The problem is as below:
> 
> Superblock modifications required during transaction are stored in delta 
> fields in transaction. These fields are applied to the superblock when 
> transaction commits.
> 
> The in-core superblock changes are done in 
> xfs_trans_unreserve_and_mod_sb. It calls xfs_mod_incore_sb_batch 
> function to apply the changes. This function tries to apply the deltas 
> and if it fails for any reason, it backs out all the changes. One 
> typical modification done is like that:
> 
>         case XFS_SBS_DBLOCKS:
>                 lcounter = (long long)mp->m_sb.sb_dblocks;
>                 lcounter += delta;
>                 if (lcounter < 0) {
>                         ASSERT(0);
>                         return (XFS_ERROR(EINVAL));
>                 }
>                 mp->m_sb.sb_dblocks = lcounter;
>                 return (0);
> 
> So, when it returns EINVAL, the second part of the code backs out the 
> changes made to superblock. However, the worst part is that 
> xfs_trans_unreserve_and_mod_sb does not return any error value.

That's because the error checking is supposed to occur before you
commit the transaction e.g. during xfs_trans_mod_sb() that
calculates the deltas. In which case:

        case XFS_TRANS_SB_DBLOCKS:
                ASSERT(delta > 0);
                tp->t_dblocks_delta += delta;
                break;

You should assert fail there is the delta is less than zero.  To me, it is
obvious these ASSERTS were placed long ago to be a landmine for someone to trip
over when thea deltas start to overflow.  Far-sighted, self documenting code -
just the way it should be ;)

So, looking a little deeper:

void
xfs_trans_mod_sb(
        xfs_trans_t     *tp,
        uint            field,
        long            delta)

This function can't take more than 31 bits of delta on a 32 bit machine
so your patch only fixed the problem on 64 bit platforms. Given that we can
support 16TB filesystems on 32 bit platforms, they need to be fixed in
some way here as well.

Also, the transaction delta fields are all longs - they overflow in the same
manner.

Eric, you suggested specific 64 bit types - I think that's really the
way to fix this, but it's a much bigger change...

Cheers,

Dave. 
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Data type overflow in xfs_trans_unreserve_and_mod_sb
  2006-10-11  5:25   ` David Chinner
@ 2006-10-13  6:13     ` David Chinner
  2006-10-13  8:31       ` Shailendra Tripathi
  0 siblings, 1 reply; 10+ messages in thread
From: David Chinner @ 2006-10-13  6:13 UTC (permalink / raw)
  To: Shailendra Tripathi; +Cc: sandeen, xfs, Timothy Shimmin

On Wed, Oct 11, 2006 at 03:25:57PM +1000, David Chinner wrote:
> On Mon, Sep 25, 2006 at 02:08:11PM +0530, Shailendra Tripathi wrote:
> > Hi David,
> >           As part of fixing xfs_reserve_blocks issue, you might want to 
> > fix an issue in xfs_trans_unreserve_and_mod_sb as well. Since, I am on 
> > much older version, my patch is not applicable on newer trees. However, 
> > the patch is attached for your reference.
> > 
> > The problem is as below:
> > 
> > Superblock modifications required during transaction are stored in delta 
> > fields in transaction. These fields are applied to the superblock when 
> > transaction commits.

.....

> So, looking a little deeper:
> 
> void
> xfs_trans_mod_sb(
>         xfs_trans_t     *tp,
>         uint            field,
>         long            delta)
> 
> This function can't take more than 31 bits of delta on a 32 bit machine
> so your patch only fixed the problem on 64 bit platforms. Given that we can
> support 16TB filesystems on 32 bit platforms, they need to be fixed in
> some way here as well.
> 
> Also, the transaction delta fields are all longs - they overflow in the same
> manner.
> 
> Eric, you suggested specific 64 bit types - I think that's really the
> way to fix this, but it's a much bigger change...

Shailendra, here's a patch that passes XFSQA that changes this all to 64 bit
types.  I've had to fix various type abuses that weren't obvious because gcc
fails to warn when you pass a uint into a function parameter that is declared
as int64_t.....

I haven't tested the >2TB grow case yet, but it should work now 
on both 32bit and 64 bit platforms with this patch.

Is there anything I missed here in the conversion?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


---
 fs/xfs/xfs_bmap.c  |   26 +++++++++++++-------------
 fs/xfs/xfs_mount.c |   15 +++++++--------
 fs/xfs/xfs_mount.h |    7 ++++---
 fs/xfs/xfs_trans.c |   32 ++++++++++++++++----------------
 fs/xfs/xfs_trans.h |   42 +++++++++++++++++++++---------------------
 5 files changed, 61 insertions(+), 61 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.c	2006-10-13 12:07:41.337353137 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_mount.c	2006-10-13 16:11:54.646069045 +1000
@@ -55,9 +55,9 @@ STATIC void	xfs_icsb_destroy_counters(xf
 STATIC void	xfs_icsb_balance_counter(xfs_mount_t *, xfs_sb_field_t, int);
 STATIC void	xfs_icsb_sync_counters(xfs_mount_t *);
 STATIC int	xfs_icsb_modify_counters(xfs_mount_t *, xfs_sb_field_t,
-						int, int);
+						int64_t, int);
 STATIC int	xfs_icsb_modify_counters_locked(xfs_mount_t *, xfs_sb_field_t,
-						int, int);
+						int64_t, int);
 STATIC int	xfs_icsb_disable_counter(xfs_mount_t *, xfs_sb_field_t);
 
 #else
@@ -1251,7 +1251,7 @@ xfs_mod_sb(xfs_trans_t *tp, __int64_t fi
  */
 int
 xfs_mod_incore_sb_unlocked(xfs_mount_t *mp, xfs_sb_field_t field,
-			int delta, int rsvd)
+			int64_t delta, int rsvd)
 {
 	int		scounter;	/* short counter for 32 bit fields */
 	long long	lcounter;	/* long counter for 64 bit fields */
@@ -1283,7 +1283,6 @@ xfs_mod_incore_sb_unlocked(xfs_mount_t *
 		mp->m_sb.sb_ifree = lcounter;
 		return 0;
 	case XFS_SBS_FDBLOCKS:
-
 		lcounter = (long long)
 			mp->m_sb.sb_fdblocks - XFS_ALLOC_SET_ASIDE(mp);
 		res_used = (long long)(mp->m_resblks - mp->m_resblks_avail);
@@ -1414,7 +1413,7 @@ xfs_mod_incore_sb_unlocked(xfs_mount_t *
  * routine to do the work.
  */
 int
-xfs_mod_incore_sb(xfs_mount_t *mp, xfs_sb_field_t field, int delta, int rsvd)
+xfs_mod_incore_sb(xfs_mount_t *mp, xfs_sb_field_t field, int64_t delta, int rsvd)
 {
 	unsigned long	s;
 	int	status;
@@ -2052,7 +2051,7 @@ STATIC int
 xfs_icsb_modify_counters_int(
 	xfs_mount_t	*mp,
 	xfs_sb_field_t	field,
-	int		delta,
+	int64_t		delta,
 	int		rsvd,
 	int		flags)
 {
@@ -2149,7 +2148,7 @@ STATIC int
 xfs_icsb_modify_counters(
 	xfs_mount_t	*mp,
 	xfs_sb_field_t	field,
-	int		delta,
+	int64_t		delta,
 	int		rsvd)
 {
 	return xfs_icsb_modify_counters_int(mp, field, delta, rsvd, 0);
@@ -2162,7 +2161,7 @@ STATIC int
 xfs_icsb_modify_counters_locked(
 	xfs_mount_t	*mp,
 	xfs_sb_field_t	field,
-	int		delta,
+	int64_t		delta,
 	int		rsvd)
 {
 	return xfs_icsb_modify_counters_int(mp, field, delta,
Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.h
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.h	2006-10-13 12:07:41.405344379 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_mount.h	2006-10-13 12:13:43.478699896 +1000
@@ -576,10 +576,11 @@ xfs_daddr_to_agbno(struct xfs_mount *mp,
 
 /*
  * This structure is for use by the xfs_mod_incore_sb_batch() routine.
+ * xfs_growfs can specify a few fields which are more than int limit
  */
 typedef struct xfs_mod_sb {
 	xfs_sb_field_t	msb_field;	/* Field to modify, see below */
-	int		msb_delta;	/* Change to make to specified field */
+	int64_t		msb_delta;	/* Change to make to specified field */
 } xfs_mod_sb_t;
 
 #define	XFS_MOUNT_ILOCK(mp)	mutex_lock(&((mp)->m_ilock))
@@ -597,9 +598,9 @@ extern int	xfs_unmountfs(xfs_mount_t *, 
 extern void	xfs_unmountfs_close(xfs_mount_t *, struct cred *);
 extern int	xfs_unmountfs_writesb(xfs_mount_t *);
 extern int	xfs_unmount_flush(xfs_mount_t *, int);
-extern int	xfs_mod_incore_sb(xfs_mount_t *, xfs_sb_field_t, int, int);
+extern int	xfs_mod_incore_sb(xfs_mount_t *, xfs_sb_field_t, int64_t, int);
 extern int	xfs_mod_incore_sb_unlocked(xfs_mount_t *, xfs_sb_field_t,
-			int, int);
+			int64_t, int);
 extern int	xfs_mod_incore_sb_batch(xfs_mount_t *, xfs_mod_sb_t *,
 			uint, int);
 extern struct xfs_buf *xfs_getsb(xfs_mount_t *, int);
Index: 2.6.x-xfs-new/fs/xfs/xfs_trans.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans.c	2006-10-13 12:07:41.413343349 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_trans.c	2006-10-13 12:13:43.478699896 +1000
@@ -339,7 +339,7 @@ xfs_trans_reserve(
 	 */
 	if (blocks > 0) {
 		error = xfs_mod_incore_sb(tp->t_mountp, XFS_SBS_FDBLOCKS,
-					  -blocks, rsvd);
+					  -((int64_t)blocks), rsvd);
 		if (error != 0) {
 			current_restore_flags_nested(&tp->t_pflags, PF_FSTRANS);
 			return (XFS_ERROR(ENOSPC));
@@ -380,7 +380,7 @@ xfs_trans_reserve(
 	 */
 	if (rtextents > 0) {
 		error = xfs_mod_incore_sb(tp->t_mountp, XFS_SBS_FREXTENTS,
-					  -rtextents, rsvd);
+					  -((int64_t)rtextents), rsvd);
 		if (error) {
 			error = XFS_ERROR(ENOSPC);
 			goto undo_log;
@@ -410,7 +410,7 @@ undo_log:
 undo_blocks:
 	if (blocks > 0) {
 		(void) xfs_mod_incore_sb(tp->t_mountp, XFS_SBS_FDBLOCKS,
-					 blocks, rsvd);
+					 (int64_t)blocks, rsvd);
 		tp->t_blk_res = 0;
 	}
 
@@ -432,7 +432,7 @@ void
 xfs_trans_mod_sb(
 	xfs_trans_t	*tp,
 	uint		field,
-	long		delta)
+	int64_t		delta)
 {
 
 	switch (field) {
@@ -663,62 +663,62 @@ xfs_trans_unreserve_and_mod_sb(
 	if (tp->t_flags & XFS_TRANS_SB_DIRTY) {
 		if (tp->t_icount_delta != 0) {
 			msbp->msb_field = XFS_SBS_ICOUNT;
-			msbp->msb_delta = (int)tp->t_icount_delta;
+			msbp->msb_delta = tp->t_icount_delta;
 			msbp++;
 		}
 		if (tp->t_ifree_delta != 0) {
 			msbp->msb_field = XFS_SBS_IFREE;
-			msbp->msb_delta = (int)tp->t_ifree_delta;
+			msbp->msb_delta = tp->t_ifree_delta;
 			msbp++;
 		}
 		if (tp->t_fdblocks_delta != 0) {
 			msbp->msb_field = XFS_SBS_FDBLOCKS;
-			msbp->msb_delta = (int)tp->t_fdblocks_delta;
+			msbp->msb_delta = tp->t_fdblocks_delta;
 			msbp++;
 		}
 		if (tp->t_frextents_delta != 0) {
 			msbp->msb_field = XFS_SBS_FREXTENTS;
-			msbp->msb_delta = (int)tp->t_frextents_delta;
+			msbp->msb_delta = tp->t_frextents_delta;
 			msbp++;
 		}
 		if (tp->t_dblocks_delta != 0) {
 			msbp->msb_field = XFS_SBS_DBLOCKS;
-			msbp->msb_delta = (int)tp->t_dblocks_delta;
+			msbp->msb_delta = tp->t_dblocks_delta;
 			msbp++;
 		}
 		if (tp->t_agcount_delta != 0) {
 			msbp->msb_field = XFS_SBS_AGCOUNT;
-			msbp->msb_delta = (int)tp->t_agcount_delta;
+			msbp->msb_delta = tp->t_agcount_delta;
 			msbp++;
 		}
 		if (tp->t_imaxpct_delta != 0) {
 			msbp->msb_field = XFS_SBS_IMAX_PCT;
-			msbp->msb_delta = (int)tp->t_imaxpct_delta;
+			msbp->msb_delta = tp->t_imaxpct_delta;
 			msbp++;
 		}
 		if (tp->t_rextsize_delta != 0) {
 			msbp->msb_field = XFS_SBS_REXTSIZE;
-			msbp->msb_delta = (int)tp->t_rextsize_delta;
+			msbp->msb_delta = tp->t_rextsize_delta;
 			msbp++;
 		}
 		if (tp->t_rbmblocks_delta != 0) {
 			msbp->msb_field = XFS_SBS_RBMBLOCKS;
-			msbp->msb_delta = (int)tp->t_rbmblocks_delta;
+			msbp->msb_delta = tp->t_rbmblocks_delta;
 			msbp++;
 		}
 		if (tp->t_rblocks_delta != 0) {
 			msbp->msb_field = XFS_SBS_RBLOCKS;
-			msbp->msb_delta = (int)tp->t_rblocks_delta;
+			msbp->msb_delta = tp->t_rblocks_delta;
 			msbp++;
 		}
 		if (tp->t_rextents_delta != 0) {
 			msbp->msb_field = XFS_SBS_REXTENTS;
-			msbp->msb_delta = (int)tp->t_rextents_delta;
+			msbp->msb_delta = tp->t_rextents_delta;
 			msbp++;
 		}
 		if (tp->t_rextslog_delta != 0) {
 			msbp->msb_field = XFS_SBS_REXTSLOG;
-			msbp->msb_delta = (int)tp->t_rextslog_delta;
+			msbp->msb_delta = tp->t_rextslog_delta;
 			msbp++;
 		}
 	}
Index: 2.6.x-xfs-new/fs/xfs/xfs_trans.h
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans.h	2006-10-13 12:07:41.413343349 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_trans.h	2006-10-13 12:13:43.478699896 +1000
@@ -350,25 +350,25 @@ typedef struct xfs_trans {
 	xfs_trans_callback_t	t_callback;	/* transaction callback */
 	void			*t_callarg;	/* callback arg */
 	unsigned int		t_flags;	/* misc flags */
-	long			t_icount_delta;	/* superblock icount change */
-	long			t_ifree_delta;	/* superblock ifree change */
-	long			t_fdblocks_delta; /* superblock fdblocks chg */
-	long			t_res_fdblocks_delta; /* on-disk only chg */
-	long			t_frextents_delta;/* superblock freextents chg*/
-	long			t_res_frextents_delta; /* on-disk only chg */
+	int64_t			t_icount_delta;	/* superblock icount change */
+	int64_t			t_ifree_delta;	/* superblock ifree change */
+	int64_t			t_fdblocks_delta; /* superblock fdblocks chg */
+	int64_t			t_res_fdblocks_delta; /* on-disk only chg */
+	int64_t			t_frextents_delta;/* superblock freextents chg*/
+	int64_t			t_res_frextents_delta; /* on-disk only chg */
 #ifdef DEBUG
-	long			t_ag_freeblks_delta; /* debugging counter */
-	long			t_ag_flist_delta; /* debugging counter */
-	long			t_ag_btree_delta; /* debugging counter */
+	int64_t			t_ag_freeblks_delta; /* debugging counter */
+	int64_t			t_ag_flist_delta; /* debugging counter */
+	int64_t			t_ag_btree_delta; /* debugging counter */
 #endif
-	long			t_dblocks_delta;/* superblock dblocks change */
-	long			t_agcount_delta;/* superblock agcount change */
-	long			t_imaxpct_delta;/* superblock imaxpct change */
-	long			t_rextsize_delta;/* superblock rextsize chg */
-	long			t_rbmblocks_delta;/* superblock rbmblocks chg */
-	long			t_rblocks_delta;/* superblock rblocks change */
-	long			t_rextents_delta;/* superblocks rextents chg */
-	long			t_rextslog_delta;/* superblocks rextslog chg */
+	int64_t			t_dblocks_delta;/* superblock dblocks change */
+	int64_t			t_agcount_delta;/* superblock agcount change */
+	int64_t			t_imaxpct_delta;/* superblock imaxpct change */
+	int64_t			t_rextsize_delta;/* superblock rextsize chg */
+	int64_t			t_rbmblocks_delta;/* superblock rbmblocks chg */
+	int64_t			t_rblocks_delta;/* superblock rblocks change */
+	int64_t			t_rextents_delta;/* superblocks rextents chg */
+	int64_t			t_rextslog_delta;/* superblocks rextslog chg */
 	unsigned int		t_items_free;	/* log item descs free */
 	xfs_log_item_chunk_t	t_items;	/* first log item desc chunk */
 	xfs_trans_header_t	t_header;	/* header for in-log trans */
@@ -932,9 +932,9 @@ typedef struct xfs_trans {
 #define	xfs_trans_set_sync(tp)		((tp)->t_flags |= XFS_TRANS_SYNC)
 
 #ifdef DEBUG
-#define	xfs_trans_agblocks_delta(tp, d)	((tp)->t_ag_freeblks_delta += (long)d)
-#define	xfs_trans_agflist_delta(tp, d)	((tp)->t_ag_flist_delta += (long)d)
-#define	xfs_trans_agbtree_delta(tp, d)	((tp)->t_ag_btree_delta += (long)d)
+#define	xfs_trans_agblocks_delta(tp, d)	((tp)->t_ag_freeblks_delta += (int64_t)d)
+#define	xfs_trans_agflist_delta(tp, d)	((tp)->t_ag_flist_delta += (int64_t)d)
+#define	xfs_trans_agbtree_delta(tp, d)	((tp)->t_ag_btree_delta += (int64_t)d)
 #else
 #define	xfs_trans_agblocks_delta(tp, d)
 #define	xfs_trans_agflist_delta(tp, d)
@@ -950,7 +950,7 @@ xfs_trans_t	*_xfs_trans_alloc(struct xfs
 xfs_trans_t	*xfs_trans_dup(xfs_trans_t *);
 int		xfs_trans_reserve(xfs_trans_t *, uint, uint, uint,
 				  uint, uint);
-void		xfs_trans_mod_sb(xfs_trans_t *, uint, long);
+void		xfs_trans_mod_sb(xfs_trans_t *, uint, int64_t);
 struct xfs_buf	*xfs_trans_get_buf(xfs_trans_t *, struct xfs_buftarg *, xfs_daddr_t,
 				   int, uint);
 int		xfs_trans_read_buf(struct xfs_mount *, xfs_trans_t *,
Index: 2.6.x-xfs-new/fs/xfs/xfs_bmap.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_bmap.c	2006-10-13 11:47:18.195899267 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_bmap.c	2006-10-13 12:13:43.498697319 +1000
@@ -684,7 +684,7 @@ xfs_bmap_add_extent(
 		ASSERT(nblks <= da_old);
 		if (nblks < da_old)
 			xfs_mod_incore_sb(ip->i_mount, XFS_SBS_FDBLOCKS,
-				(int)(da_old - nblks), rsvd);
+				(int64_t)(da_old - nblks), rsvd);
 	}
 	/*
 	 * Clear out the allocated field, done with it now in any case.
@@ -1207,7 +1207,7 @@ xfs_bmap_add_extent_delay_real(
 		diff = (int)(temp + temp2 - STARTBLOCKVAL(PREV.br_startblock) -
 			(cur ? cur->bc_private.b.allocated : 0));
 		if (diff > 0 &&
-		    xfs_mod_incore_sb(ip->i_mount, XFS_SBS_FDBLOCKS, -diff, rsvd)) {
+		    xfs_mod_incore_sb(ip->i_mount, XFS_SBS_FDBLOCKS, -((int64_t)diff), rsvd)) {
 			/*
 			 * Ick gross gag me with a spoon.
 			 */
@@ -1218,7 +1218,7 @@ xfs_bmap_add_extent_delay_real(
 					diff--;
 					if (!diff ||
 					    !xfs_mod_incore_sb(ip->i_mount,
-						    XFS_SBS_FDBLOCKS, -diff, rsvd))
+						    XFS_SBS_FDBLOCKS, -((int64_t)diff), rsvd))
 						break;
 				}
 				if (temp2) {
@@ -1226,7 +1226,7 @@ xfs_bmap_add_extent_delay_real(
 					diff--;
 					if (!diff ||
 					    !xfs_mod_incore_sb(ip->i_mount,
-						    XFS_SBS_FDBLOCKS, -diff, rsvd))
+						    XFS_SBS_FDBLOCKS, -((int64_t)diff), rsvd))
 						break;
 				}
 			}
@@ -2013,7 +2013,7 @@ xfs_bmap_add_extent_hole_delay(
 	if (oldlen != newlen) {
 		ASSERT(oldlen > newlen);
 		xfs_mod_incore_sb(ip->i_mount, XFS_SBS_FDBLOCKS,
-			(int)(oldlen - newlen), rsvd);
+			(int64_t)(oldlen - newlen), rsvd);
 		/*
 		 * Nothing to do for disk quota accounting here.
 		 */
@@ -3357,7 +3357,7 @@ xfs_bmap_del_extent(
 	 */
 	ASSERT(da_old >= da_new);
 	if (da_old > da_new)
-		xfs_mod_incore_sb(mp, XFS_SBS_FDBLOCKS, (int)(da_old - da_new),
+		xfs_mod_incore_sb(mp, XFS_SBS_FDBLOCKS, (int64_t)(da_old - da_new),
 			rsvd);
 	if (delta) {
 		/* DELTA: report the original extent. */
@@ -4927,28 +4927,28 @@ xfs_bmapi(
 				if (rt) {
 					error = xfs_mod_incore_sb(mp,
 							XFS_SBS_FREXTENTS,
-							-(extsz), (flags &
+							-((int64_t)extsz), (flags &
 							XFS_BMAPI_RSVBLOCKS));
 				} else {
 					error = xfs_mod_incore_sb(mp,
 							XFS_SBS_FDBLOCKS,
-							-(alen), (flags &
+							-((int64_t)alen), (flags &
 							XFS_BMAPI_RSVBLOCKS));
 				}
 				if (!error) {
 					error = xfs_mod_incore_sb(mp,
 							XFS_SBS_FDBLOCKS,
-							-(indlen), (flags &
+							-((int64_t)indlen), (flags &
 							XFS_BMAPI_RSVBLOCKS));
 					if (error && rt)
 						xfs_mod_incore_sb(mp,
 							XFS_SBS_FREXTENTS,
-							extsz, (flags &
+							(int64_t)extsz, (flags &
 							XFS_BMAPI_RSVBLOCKS));
 					else if (error)
 						xfs_mod_incore_sb(mp,
 							XFS_SBS_FDBLOCKS,
-							alen, (flags &
+							(int64_t)alen, (flags &
 							XFS_BMAPI_RSVBLOCKS));
 				}
 
@@ -5614,13 +5614,13 @@ xfs_bunmapi(
 				rtexts = XFS_FSB_TO_B(mp, del.br_blockcount);
 				do_div(rtexts, mp->m_sb.sb_rextsize);
 				xfs_mod_incore_sb(mp, XFS_SBS_FREXTENTS,
-						(int)rtexts, rsvd);
+						(int64_t)rtexts, rsvd);
 				(void)XFS_TRANS_RESERVE_QUOTA_NBLKS(mp,
 					NULL, ip, -((long)del.br_blockcount), 0,
 					XFS_QMOPT_RES_RTBLKS);
 			} else {
 				xfs_mod_incore_sb(mp, XFS_SBS_FDBLOCKS,
-						(int)del.br_blockcount, rsvd);
+						(int64_t)del.br_blockcount, rsvd);
 				(void)XFS_TRANS_RESERVE_QUOTA_NBLKS(mp,
 					NULL, ip, -((long)del.br_blockcount), 0,
 					XFS_QMOPT_RES_REGBLKS);

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Data type overflow in xfs_trans_unreserve_and_mod_sb
  2006-10-13  6:13     ` David Chinner
@ 2006-10-13  8:31       ` Shailendra Tripathi
  0 siblings, 0 replies; 10+ messages in thread
From: Shailendra Tripathi @ 2006-10-13  8:31 UTC (permalink / raw)
  To: David Chinner; +Cc: sandeen, xfs, Timothy Shimmin



David Chinner wrote:
>>Eric, you suggested specific 64 bit types - I think that's really the
>>way to fix this, but it's a much bigger change...
> 
> 
> Shailendra, here's a patch that passes XFSQA that changes this all to 64 bit
> types.  I've had to fix various type abuses that weren't obvious because gcc
> fails to warn when you pass a uint into a function parameter that is declared
> as int64_t.....
> 
> I haven't tested the >2TB grow case yet, but it should work now 
> on both 32bit and 64 bit platforms with this patch.
> 
> Is there anything I missed here in the conversion?
> 
> Cheers,
> 
> Dave.

Looks ok to me, Dave.
Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2006-10-13  8:35 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-09-23 13:27 largeio mount and performance impact Sebastian Brings
2006-09-23 16:34 ` Eric Sandeen
2006-09-24 13:41   ` Sebastian Brings
2006-09-24 13:47     ` Sebastian Brings
2006-09-25  8:38 ` Data type overflow in xfs_trans_unreserve_and_mod_sb Shailendra Tripathi
2006-09-25 14:32   ` Eric Sandeen
2006-09-25 14:52     ` Shailendra Tripathi
2006-10-11  5:25   ` David Chinner
2006-10-13  6:13     ` David Chinner
2006-10-13  8:31       ` Shailendra Tripathi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox