public inbox for linux-xfs@vger.kernel.org
* review: allocate bmapi args
@ 2007-04-19  7:25 David Chinner
  2007-04-19  7:51 ` Nathan Scott
  0 siblings, 1 reply; 5+ messages in thread
From: David Chinner @ 2007-04-19  7:25 UTC (permalink / raw)
  To: xfs-dev; +Cc: xfs-oss


Save some stack space (64 bytes on 32bit systems, 80 bytes on 64bit
systems) in a critical path by allocating the xfs_bmalloca_t
structure rather than putting it on the stack.
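The transformation follows a common stack-saving pattern: a large argument block that used to live in the caller's frame becomes a heap allocation, so the frame holds only a pointer. A user-space sketch of the pattern (hypothetical struct and field names, not the real xfs_bmalloca_t layout or the kernel kmem interfaces):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a large argument block like xfs_bmalloca_t. */
typedef struct bmalloca_args {
	long tp, ip, prevp, gotp;	/* placeholder fields */
	long total, userdata;
	long eof, firstblock, alen, off;
} bmalloca_args_t;

/* Before: the struct consumes sizeof(bma) bytes of the caller's stack. */
static long fill_on_stack(void)
{
	bmalloca_args_t bma;
	memset(&bma, 0, sizeof(bma));
	bma.alen = 42;
	return bma.alen;
}

/* After: the struct is heap-allocated; the frame only holds a pointer.
 * The cost is that every exit path must now free the allocation. */
static long fill_on_heap(void)
{
	bmalloca_args_t *bma = calloc(1, sizeof(*bma));
	if (!bma)
		return -1;
	bma->alen = 42;
	long ret = bma->alen;
	free(bma);
	return ret;
}
```

The single-exit `error0:` label in xfs_bmapi is what makes the matching free manageable in the patch below.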

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


---
 fs/xfs/xfs_bmap.c |   62 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 33 insertions(+), 29 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs_bmap.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_bmap.c	2007-04-19 13:26:49.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_bmap.c	2007-04-19 13:47:03.161553644 +1000
@@ -4710,7 +4710,7 @@ xfs_bmapi(
 	xfs_fsblock_t	abno;		/* allocated block number */
 	xfs_extlen_t	alen;		/* allocated extent length */
 	xfs_fileoff_t	aoff;		/* allocated file offset */
-	xfs_bmalloca_t	bma;		/* args for xfs_bmap_alloc */
+	xfs_bmalloca_t	*bma;		/* args for xfs_bmap_alloc */
 	xfs_btree_cur_t	*cur;		/* bmap btree cursor */
 	xfs_fileoff_t	end;		/* end of mapped file region */
 	int		eof;		/* we've hit the end of extents */
@@ -4763,6 +4763,9 @@ xfs_bmapi(
 	}
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return XFS_ERROR(EIO);
+	bma = kmem_zalloc(sizeof(xfs_bmalloca_t), KM_SLEEP);
+	if (!bma)
+		return XFS_ERROR(ENOMEM);
 	rt = (whichfork == XFS_DATA_FORK) && XFS_IS_REALTIME_INODE(ip);
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	ASSERT(ifp->if_ext_max ==
@@ -4816,7 +4819,7 @@ xfs_bmapi(
 	n = 0;
 	end = bno + len;
 	obno = bno;
-	bma.ip = NULL;
+	bma->ip = NULL;
 	if (delta) {
 		delta->xed_startoff = NULLFILEOFF;
 		delta->xed_blockcount = 0;
@@ -4960,34 +4963,34 @@ xfs_bmapi(
 				 * If first time, allocate and fill in
 				 * once-only bma fields.
 				 */
-				if (bma.ip == NULL) {
-					bma.tp = tp;
-					bma.ip = ip;
-					bma.prevp = &prev;
-					bma.gotp = &got;
-					bma.total = total;
-					bma.userdata = 0;
+				if (bma->ip == NULL) {
+					bma->tp = tp;
+					bma->ip = ip;
+					bma->prevp = &prev;
+					bma->gotp = &got;
+					bma->total = total;
+					bma->userdata = 0;
 				}
 				/* Indicate if this is the first user data
 				 * in the file, or just any user data.
 				 */
 				if (!(flags & XFS_BMAPI_METADATA)) {
-					bma.userdata = (aoff == 0) ?
+					bma->userdata = (aoff == 0) ?
 						XFS_ALLOC_INITIAL_USER_DATA :
 						XFS_ALLOC_USERDATA;
 				}
 				/*
 				 * Fill in changeable bma fields.
 				 */
-				bma.eof = eof;
-				bma.firstblock = *firstblock;
-				bma.alen = alen;
-				bma.off = aoff;
-				bma.conv = !!(flags & XFS_BMAPI_CONVERT);
-				bma.wasdel = wasdelay;
-				bma.minlen = minlen;
-				bma.low = flist->xbf_low;
-				bma.minleft = minleft;
+				bma->eof = eof;
+				bma->firstblock = *firstblock;
+				bma->alen = alen;
+				bma->off = aoff;
+				bma->conv = !!(flags & XFS_BMAPI_CONVERT);
+				bma->wasdel = wasdelay;
+				bma->minlen = minlen;
+				bma->low = flist->xbf_low;
+				bma->minleft = minleft;
 				/*
 				 * Only want to do the alignment at the
 				 * eof if it is userdata and allocation length
@@ -4997,30 +5000,30 @@ xfs_bmapi(
 				    (!(flags & XFS_BMAPI_METADATA)) &&
 				    (whichfork == XFS_DATA_FORK)) {
 					if ((error = xfs_bmap_isaeof(ip, aoff,
-							whichfork, &bma.aeof)))
+							whichfork, &bma->aeof)))
 						goto error0;
 				} else
-					bma.aeof = 0;
+					bma->aeof = 0;
 				/*
 				 * Call allocator.
 				 */
-				if ((error = xfs_bmap_alloc(&bma)))
+				if ((error = xfs_bmap_alloc(bma)))
 					goto error0;
 				/*
 				 * Copy out result fields.
 				 */
-				abno = bma.rval;
-				if ((flist->xbf_low = bma.low))
+				abno = bma->rval;
+				if ((flist->xbf_low = bma->low))
 					minleft = 0;
-				alen = bma.alen;
-				aoff = bma.off;
+				alen = bma->alen;
+				aoff = bma->off;
 				ASSERT(*firstblock == NULLFSBLOCK ||
 				       XFS_FSB_TO_AGNO(mp, *firstblock) ==
-				       XFS_FSB_TO_AGNO(mp, bma.firstblock) ||
+				       XFS_FSB_TO_AGNO(mp, bma->firstblock) ||
 				       (flist->xbf_low &&
 					XFS_FSB_TO_AGNO(mp, *firstblock) <
-					XFS_FSB_TO_AGNO(mp, bma.firstblock)));
-				*firstblock = bma.firstblock;
+					XFS_FSB_TO_AGNO(mp, bma->firstblock)));
+				*firstblock = bma->firstblock;
 				if (cur)
 					cur->bc_private.b.firstblock =
 						*firstblock;
@@ -5290,6 +5293,7 @@ error0:
 	if (!error)
 		xfs_bmap_validate_ret(orig_bno, orig_len, orig_flags, orig_mval,
 			orig_nmap, *nmap);
+	kmem_free(bma, sizeof(xfs_bmalloca_t));
 	return error;
 }
 


* Re: review: allocate bmapi args
  2007-04-19  7:25 review: allocate bmapi args David Chinner
@ 2007-04-19  7:51 ` Nathan Scott
  2007-04-19  8:23   ` David Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Nathan Scott @ 2007-04-19  7:51 UTC (permalink / raw)
  To: David Chinner; +Cc: xfs-dev, xfs-oss

On Thu, 2007-04-19 at 17:25 +1000, David Chinner wrote:
> 
> +       bma = kmem_zalloc(sizeof(xfs_bmalloca_t), KM_SLEEP);
> +       if (!bma)
> +               return XFS_ERROR(ENOMEM); 

I guess you meant KM_NOSLEEP?  Are you sure this is legit though?
(are all callers going to be able to handle this?)  I'm thinking
of the writeout paths where we're doing space allocation (unwritten
extent conversion comes through here too) in order to free up some
page cache so other memory allocs elsewhere can proceed.  I don't
see any other memory allocations in this area of the code, so I
guess I'd be treading really carefully here..

(Oh, and why the _zalloc?  Could just do an _alloc, since previous
code was using non-zeroed memory - so, should have been filling in
all fields).
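The KM_SLEEP/KM_NOSLEEP distinction at issue, modeled in user space (names modeled on the XFS kmem wrappers; this is an illustrative sketch, not kernel code): a sleeping allocation blocks and retries until it succeeds and so never returns NULL, while a non-sleeping one makes a single attempt and may hand back NULL for the caller to handle.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the two allocation modes under discussion. */
enum km_flags { KM_SLEEP, KM_NOSLEEP };

static void *kmem_zalloc_model(size_t size, enum km_flags flags)
{
	void *p = calloc(1, size);
	if (flags == KM_SLEEP) {
		while (!p)		/* a sleeping alloc retries until it succeeds */
			p = calloc(1, size);
	}
	return p;			/* KM_NOSLEEP may return NULL */
}
```

Under this model, a NULL check after a KM_SLEEP allocation (as in the posted patch) is dead code, which is what prompts the question above.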

cheers.

-- 
Nathan


* Re: review: allocate bmapi args
  2007-04-19  7:51 ` Nathan Scott
@ 2007-04-19  8:23   ` David Chinner
  2007-04-20  4:41     ` Nathan Scott
  0 siblings, 1 reply; 5+ messages in thread
From: David Chinner @ 2007-04-19  8:23 UTC (permalink / raw)
  To: Nathan Scott; +Cc: David Chinner, xfs-dev, xfs-oss

On Thu, Apr 19, 2007 at 05:51:02PM +1000, Nathan Scott wrote:
> On Thu, 2007-04-19 at 17:25 +1000, David Chinner wrote:
> > 
> > +       bma = kmem_zalloc(sizeof(xfs_bmalloca_t), KM_SLEEP);
> > +       if (!bma)
> > +               return XFS_ERROR(ENOMEM); 
> 
> I guess you meant KM_NOSLEEP? 

No, I meant a sleeping allocation. I guess that means I don't
need the error handling....

> Are you sure this is legit though?

It *must* be. We already rely on being able to do substantial
amounts of allocation in this path....

> (are all callers going to be able to handle this?)  I'm thinking
> of the writeout paths where we're doing space allocation (unwritten
> extent conversion comes through here too) in order to free up some
> page cache so other memory allocs elsewhere can proceed.  I don't
> see any other memory allocations in this area of the code, so I
> guess I'd be treading really carefully here..

We modify the incore extent list as it grows and shrinks in this
path. It is critical that we are able to allocate at least small
amounts of memory in these writeback paths, and we currently do that
with kmem_alloc(KM_SLEEP). A quick search of the xfs_iext_*
functions shows lots of allocations are done in manipulating
the incore extents....

Then there's needing new pages in the page cache and xfs_buf_t's
if we trigger a btree split during the allocation, and so on.

IOWS, there's plenty of far larger allocations down through this
path already, and if any one of them doesn't succeed we are
pretty much fscked. Given that we haven't had any reports of
writeback deadlocks since the new incore extent handling went
in, I think small allocations like this are not a problem.

FWIW, I have done low memory testing and I wasn't able to trigger
any problems.....

> (Oh, and why the _zalloc?  Could just do an _alloc, since previous
> code was using non-zeroed memory - so, should have been filling in
> all fields).

Habit. And it doesn't hurt performance at all - we've got to take
that cache miss somewhere along the line....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: review: allocate bmapi args
  2007-04-19  8:23   ` David Chinner
@ 2007-04-20  4:41     ` Nathan Scott
  2007-04-20  5:34       ` David Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Nathan Scott @ 2007-04-20  4:41 UTC (permalink / raw)
  To: David Chinner; +Cc: xfs-dev, xfs-oss

On Thu, 2007-04-19 at 18:23 +1000, David Chinner wrote:
> ...
> > Are you sure this is legit though?
> 
> It *must* be. We already rely on being able to do substantial
> amounts of allocation in this path....

Not necessarily, we only sometimes (for some values of BMAPI flags
I mean) do memory allocations.

> ...
> We modify the incore extent list as it grows and shrinks in this
> path. It is critical that we are able to allocate at least small 

Well, not always.  For cases where we modify the extents we must
call in here with Steve's funky I'm-in-a-transaction process-flag
set, which is the secret handshake for the memory allocator to not
use GFP_FS.  For cases where we are only reading the extent list,
we would not be doing allocations before and we'd not be protected
by that extra magic.  So now those paths can start getting ENOMEM
in places where that wouldn't have happened before.. I guess that
filesystem shutdowns could result, possibly.

> FWIW, I have done low memory testing and I wasn't able to trigger
> any problems.....

It would be a once-in-a-blue-moon type problem though, unfortunately;
one of the really-really-hard to trigger types of problem.

> > (Oh, and why the _zalloc?  Could just do an _alloc, since previous
> > code was using non-zeroed memory - so, should have been filling in
> > all fields).
> 
> Habit. And it doesn't hurt performance at all - we've got to take

Hrmmm... is there any point in having a non-zeroing interface at
all then?  I thought the non-zeroing version was all about using
the fact that you know you're going to overwrite all the fields
anyway shortly, so there's no point zeroing initially...

cheers.

-- 
Nathan


* Re: review: allocate bmapi args
  2007-04-20  4:41     ` Nathan Scott
@ 2007-04-20  5:34       ` David Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: David Chinner @ 2007-04-20  5:34 UTC (permalink / raw)
  To: Nathan Scott; +Cc: David Chinner, xfs-dev, xfs-oss

On Fri, Apr 20, 2007 at 02:41:57PM +1000, Nathan Scott wrote:
> On Thu, 2007-04-19 at 18:23 +1000, David Chinner wrote:
> > ...
> > > Are you sure this is legit though?
> > 
> > It *must* be. We already rely on being able to do substantial
> > amounts of allocation in this path....
> 
> Not necessarily, we only sometimes (for some values of BMAPI flags
> I mean) do memory allocations.
> 
> > ...
> > We modify the incore extent list as it grows and shrinks in this
> > path. It is critical that we are able to allocate at least small 
> 
> Well, not always.  For cases where we modify the extents we must
> call in here with Steve's funky I'm-in-a-transaction process-flag
> set, which is the secret handshake for the memory allocator to not
> use GFP_FS.

*nod*

> For cases where we are only reading the extent list,
> we would not be doing allocations before and we'd not be protected
> by that extra magic.  So now those paths can start getting ENOMEM
> in places where that wouldn't have happened before.. I guess that
> filesystem shutdowns could result, possibly.

Well, with a sleeping allocation we'll get hangs, not shutdowns.
Quite frankly, hangs are far easier to debug than shutdowns. If it
really does become an issue, we could use mempools here - we
can guarantee that we will return the object to the pool if we
don't hang on some other allocation and hence we'd always be
able to make progress.
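The mempool idea in miniature, as a user-space sketch (illustrative only; the kernel's mempool_t is more general than this): hold one preallocated object in reserve so the path can always make progress even when the normal allocator fails, on the guarantee that the object comes back to the pool.

```c
#include <assert.h>
#include <stdlib.h>

/* A one-object reserve pool, the simplest possible mempool. */
struct mempool {
	void  *reserve;			/* preallocated fallback object */
	size_t objsize;
};

static int mempool_init(struct mempool *mp, size_t objsize)
{
	mp->objsize = objsize;
	mp->reserve = malloc(objsize);	/* reserve taken up front */
	return mp->reserve ? 0 : -1;
}

static void *mempool_get(struct mempool *mp)
{
	void *p = malloc(mp->objsize);	/* try the normal allocator first */
	if (!p) {			/* fall back to the reserve */
		p = mp->reserve;
		mp->reserve = NULL;
	}
	return p;
}

static void mempool_put(struct mempool *mp, void *p)
{
	if (!mp->reserve)
		mp->reserve = p;	/* refill the reserve before freeing */
	else
		free(p);
}
```

Forward progress holds only because every `mempool_get` is paired with a `mempool_put`, which is exactly the caveat about not hanging on some other allocation first.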

However, I'm not sure I want to go that far without having actually
seen a normal allocation cause a problem here and right now I think
that saving ~250 bytes of stack (~10-15% of XFS's stack usage on
ia32!) through the paths we know blow the stack is substantial.

> > FWIW, I have done low memory testing and I wasn't able to trigger
> > any problems.....
> 
> It would be a once-in-a-blue-moon type problem though, unfortunately;
> one of the really-really-hard to trigger types of problem.

Yes, that it is. But a sysrq-t will point out the problem
immediately as we'll see processes hung trying to do memory
allocation.

> > > (Oh, and why the _zalloc?  Could just do an _alloc, since previous
> > > code was using non-zeroed memory - so, should have been filling in
> > > all fields).
> > 
> > Habit. And it doesn't hurt performance at all - we've got to take
> 
> Hrmmm... is there any point in having a non-zeroing interface at
> all then?

Sorry - I should have said that "for small allocations like this it
doesn't hurt performance" - the cycles consumed by the allocation
and lost in the initial cacheline fetch are far, far greater than
those spent in zeroing part of a cacheline once it's accessible.

> I thought the non-zeroing version was all about using
> the fact that you know you're going to overwrite all the fields
> anyway shortly, so there's no point zeroing initially...

Zeroing causes sequential access to cachelines and hence hardware
prefetchers can operate quickly and reduce the number of CPU stalls
compared to filling out the structure in random order. And some CPUs
can zero-fill cachelines without having fetched them from memory
(PPC can do this IIRC) so you don't even stall the CPU....

But there's still plenty of cases where you don't want to touch
some or all of the allocated region (e.g. you're about to memcpy()
something into it) so we still need the non-zeroing version of
the allocator...
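The same split exists in the C library, which makes a convenient user-space illustration of the trade-off (hypothetical helper names): a zeroing allocator when fields may be read before they are all set, a plain one when the buffer is about to be overwritten wholesale.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Zeroing allocation: any slot is safe to read immediately,
 * like kmem_zalloc for a struct whose fields are filled piecemeal. */
static int *make_counters(size_t n)
{
	return calloc(n, sizeof(int));
}

/* Non-zeroing allocation: the memcpy overwrites every byte anyway,
 * so zeroing first would be pure waste - the case for kmem_alloc. */
static char *copy_blob(const char *src, size_t len)
{
	char *dst = malloc(len);
	if (dst)
		memcpy(dst, src, len);
	return dst;
}
```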

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
