cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed
* [Cluster-devel] GFS2: Use kmalloc when possible for ->readdir()
@ 2010-07-28 11:15 Steven Whitehouse
  2010-07-28 15:39 ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Whitehouse @ 2010-07-28 11:15 UTC (permalink / raw)
  To: cluster-devel.redhat.com


If we don't need a huge amount of memory in ->readdir() then
we can use kmalloc rather than vmalloc to allocate it. This
should cut down on the greater overheads associated with
vmalloc for smaller directories.

We may be able to eliminate vmalloc entirely at some stage,
but this is easy to do right away.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 401deaf..80d9dfb 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -1248,12 +1248,14 @@ static int gfs2_dir_read_leaf(struct inode *inode, u64 *offset, void *opaque,
 	struct gfs2_leaf *lf;
 	unsigned entries = 0, entries2 = 0;
 	unsigned leaves = 0;
+	unsigned alloc_size;
 	const struct gfs2_dirent **darr, *dent;
 	struct dirent_gather g;
-	struct buffer_head **larr;
+	struct buffer_head **larr = NULL;
 	int leaf = 0;
 	int error, i;
 	u64 lfn = leaf_no;
+	int do_vfree = 0;
 
 	do {
 		error = get_leaf(ip, lfn, &bh);
@@ -1278,9 +1280,15 @@ static int gfs2_dir_read_leaf(struct inode *inode, u64 *offset, void *opaque,
 	 * 99 is the maximum number of entries that can fit in a single
 	 * leaf block.
 	 */
-	larr = vmalloc((leaves + entries + 99) * sizeof(void *));
-	if (!larr)
-		goto out;
+	alloc_size = (leaves + entries + 99) * sizeof(void *);
+	if (alloc_size < KMALLOC_MAX_SIZE)
+		larr = kmalloc(alloc_size, GFP_NOFS);
+	if (!larr) {
+		larr = vmalloc(alloc_size);
+		if (!larr)
+			goto out;
+		do_vfree = 1;
+	}
 	darr = (const struct gfs2_dirent **)(larr + leaves);
 	g.pdent = darr;
 	g.offset = 0;
@@ -1289,7 +1297,7 @@ static int gfs2_dir_read_leaf(struct inode *inode, u64 *offset, void *opaque,
 	do {
 		error = get_leaf(ip, lfn, &bh);
 		if (error)
-			goto out_kfree;
+			goto out_free;
 		lf = (struct gfs2_leaf *)bh->b_data;
 		lfn = be64_to_cpu(lf->lf_next);
 		if (lf->lf_entries) {
@@ -1298,7 +1306,7 @@ static int gfs2_dir_read_leaf(struct inode *inode, u64 *offset, void *opaque,
 						gfs2_dirent_gather, NULL, &g);
 			error = PTR_ERR(dent);
 			if (IS_ERR(dent))
-				goto out_kfree;
+				goto out_free;
 			if (entries2 != g.offset) {
 				fs_warn(sdp, "Number of entries corrupt in dir "
 						"leaf %llu, entries2 (%u) != "
@@ -1307,7 +1315,7 @@ static int gfs2_dir_read_leaf(struct inode *inode, u64 *offset, void *opaque,
 					entries2, g.offset);
 					
 				error = -EIO;
-				goto out_kfree;
+				goto out_free;
 			}
 			error = 0;
 			larr[leaf++] = bh;
@@ -1319,10 +1327,13 @@ static int gfs2_dir_read_leaf(struct inode *inode, u64 *offset, void *opaque,
 	BUG_ON(entries2 != entries);
 	error = do_filldir_main(ip, offset, opaque, filldir, darr,
 				entries, copied);
-out_kfree:
+out_free:
 	for(i = 0; i < leaf; i++)
 		brelse(larr[i]);
-	vfree(larr);
+	if (do_vfree)
+		vfree(larr);
+	else
+		kfree(larr);
 out:
 	return error;
 }




^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [Cluster-devel] GFS2: Use kmalloc when possible for ->readdir()
  2010-07-28 11:15 [Cluster-devel] GFS2: Use kmalloc when possible for ->readdir() Steven Whitehouse
@ 2010-07-28 15:39 ` Linus Torvalds
  2010-07-28 16:52   ` Steven Whitehouse
  2010-07-28 16:56   ` [Cluster-devel] GFS2: Use kmalloc when possible for ->readdir() (try #2) Steven Whitehouse
  0 siblings, 2 replies; 7+ messages in thread
From: Linus Torvalds @ 2010-07-28 15:39 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, Jul 28, 2010 at 4:15 AM, Steven Whitehouse <swhiteho@redhat.com> wrote:
>
> We may be able to eliminate vmalloc entirely at some stage,
> but this is easy to do right away.

Quite frankly, I'd much rather see this abstracted out a bit. Why not just do a

  void *memalloc(unsigned int size)
  {
      if (size < KMALLOC_MAX_SIZE) {
         void *ptr = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
         if (ptr)
            return ptr;
      }
      return vmalloc(size);
   }

   void memfree(void *ptr)
   {
      unsigned long addr = (unsigned long) ptr;

      if (is_vmalloc_addr(addr)) {
         vfree(ptr);
         return;
      }

      kfree(ptr);
   }

wouldn't that be much nicer? No need for that explicit flag, and you
don't mess up an already way-too-ugly function even more.

Also, I do notice that you used GFP_NOFS, but you didn't use that for
the vmalloc() thing. If there really are lock reentrancy reasons, you
_could_ use __vmalloc(size, GFP_NOFS, PAGE_KERNEL).  But since you've
been using vmalloc() for a long time, I suspect GFP_KERNEL works fine.
Yes/no?

                      Linus



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Cluster-devel] GFS2: Use kmalloc when possible for ->readdir()
  2010-07-28 15:39 ` Linus Torvalds
@ 2010-07-28 16:52   ` Steven Whitehouse
  2010-07-28 16:56   ` [Cluster-devel] GFS2: Use kmalloc when possible for ->readdir() (try #2) Steven Whitehouse
  1 sibling, 0 replies; 7+ messages in thread
From: Steven Whitehouse @ 2010-07-28 16:52 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

On Wed, 2010-07-28 at 08:39 -0700, Linus Torvalds wrote:
> On Wed, Jul 28, 2010 at 4:15 AM, Steven Whitehouse <swhiteho@redhat.com> wrote:
> >
> > We may be able to eliminate vmalloc entirely at some stage,
> > but this is easy to do right away.
> 
> Quite frankly, I'd much rather see this abstracted out a bit. Why not just do a
> 
>   void *memalloc(unsigned int size)
>   {
>       if (size < KMALLOC_MAX_SIZE) {
>          void *ptr = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
>          if (ptr)
>             return ptr;
>       }
>       return vmalloc(size);
>    }
> 
>    void memfree(void *ptr)
>    {
>       unsigned long addr = (unsigned long) ptr;
> 
>       if (is_vmalloc_addr(addr)) {
>          vfree(ptr);
>          return;
>       }
> 
>       kfree(ptr);
>    }
> 
> wouldn't that be much nicer? No need for that explicit flag, and you
> don't mess up an already way-too-ugly function even more.
> 
> Also, I do notice that you used GFP_NOFS, but you didn't use that for
> the vmalloc() thing. If there really are lock reentrancy reasons, you
> _could_ use __vmalloc(size, GFP_NOFS, PAGE_KERNEL).  But since you've
> been using vmalloc() for a long time, I suspect GFP_KERNEL works fine.
> Yes/no?
> 
>                       Linus

Well it had been working ok, but I don't really trust it since we are
holding a glock on the directory in shared mode at this point. That
means that if we (as a result of dropping dcache) need to unlink an
inode, we might land up taking glocks in the wrong order.

We use GFP_NOFS everywhere else under glocks for that reason, so I think
its safer to use GFP_NOFS here.

An updated patch based on your comments follows in the next email,

Steve.




^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Cluster-devel] GFS2: Use kmalloc when possible for ->readdir() (try #2)
  2010-07-28 15:39 ` Linus Torvalds
  2010-07-28 16:52   ` Steven Whitehouse
@ 2010-07-28 16:56   ` Steven Whitehouse
  2010-07-28 17:13     ` Andrew Morton
  1 sibling, 1 reply; 7+ messages in thread
From: Steven Whitehouse @ 2010-07-28 16:56 UTC (permalink / raw)
  To: cluster-devel.redhat.com


If we don't need a huge amount of memory in ->readdir() then
we can use kmalloc rather than vmalloc to allocate it. This
should cut down on the greater overheads associated with
vmalloc for smaller directories.

We may be able to eliminate vmalloc entirely at some stage,
but this is easy to do right away.

Also using GFP_NOFS to avoid any issues wrt to deleting inodes
while under a glock, and suggestion from Linus to factor out
the alloc/dealloc.

I've given this a test with a variety of different sized
directories and it seems to work ok.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 401deaf..b9dd88a 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -1238,6 +1238,25 @@ static int do_filldir_main(struct gfs2_inode *dip, u64 *offset,
 	return 0;
 }
 
+static void *gfs2_alloc_sort_buffer(unsigned size)
+{
+	void *ptr = NULL;
+
+	if (size < KMALLOC_MAX_SIZE)
+		ptr = kmalloc(size, GFP_NOFS | __GFP_NOWARN);
+	if (!ptr)
+		ptr = __vmalloc(size, GFP_NOFS, PAGE_KERNEL);
+	return ptr;
+}
+
+static void gfs2_free_sort_buffer(void *ptr)
+{
+	if (is_vmalloc_addr(ptr))
+		vfree(ptr);
+	else
+		kfree(ptr);
+}
+
 static int gfs2_dir_read_leaf(struct inode *inode, u64 *offset, void *opaque,
 			      filldir_t filldir, int *copied, unsigned *depth,
 			      u64 leaf_no)
@@ -1278,7 +1297,7 @@ static int gfs2_dir_read_leaf(struct inode *inode, u64 *offset, void *opaque,
 	 * 99 is the maximum number of entries that can fit in a single
 	 * leaf block.
 	 */
-	larr = vmalloc((leaves + entries + 99) * sizeof(void *));
+	larr = gfs2_alloc_sort_buffer((leaves + entries + 99) * sizeof(void *));
 	if (!larr)
 		goto out;
 	darr = (const struct gfs2_dirent **)(larr + leaves);
@@ -1289,7 +1308,7 @@ static int gfs2_dir_read_leaf(struct inode *inode, u64 *offset, void *opaque,
 	do {
 		error = get_leaf(ip, lfn, &bh);
 		if (error)
-			goto out_kfree;
+			goto out_free;
 		lf = (struct gfs2_leaf *)bh->b_data;
 		lfn = be64_to_cpu(lf->lf_next);
 		if (lf->lf_entries) {
@@ -1298,7 +1317,7 @@ static int gfs2_dir_read_leaf(struct inode *inode, u64 *offset, void *opaque,
 						gfs2_dirent_gather, NULL, &g);
 			error = PTR_ERR(dent);
 			if (IS_ERR(dent))
-				goto out_kfree;
+				goto out_free;
 			if (entries2 != g.offset) {
 				fs_warn(sdp, "Number of entries corrupt in dir "
 						"leaf %llu, entries2 (%u) != "
@@ -1307,7 +1326,7 @@ static int gfs2_dir_read_leaf(struct inode *inode, u64 *offset, void *opaque,
 					entries2, g.offset);
 					
 				error = -EIO;
-				goto out_kfree;
+				goto out_free;
 			}
 			error = 0;
 			larr[leaf++] = bh;
@@ -1319,10 +1338,10 @@ static int gfs2_dir_read_leaf(struct inode *inode, u64 *offset, void *opaque,
 	BUG_ON(entries2 != entries);
 	error = do_filldir_main(ip, offset, opaque, filldir, darr,
 				entries, copied);
-out_kfree:
+out_free:
 	for(i = 0; i < leaf; i++)
 		brelse(larr[i]);
-	vfree(larr);
+	gfs2_free_sort_buffer(larr);
 out:
 	return error;
 }




^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [Cluster-devel] GFS2: Use kmalloc when possible for ->readdir() (try #2)
  2010-07-28 16:56   ` [Cluster-devel] GFS2: Use kmalloc when possible for ->readdir() (try #2) Steven Whitehouse
@ 2010-07-28 17:13     ` Andrew Morton
  2010-07-28 17:51       ` Steven Whitehouse
  2010-07-29 17:58       ` Joel Becker
  0 siblings, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2010-07-28 17:13 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, 28 Jul 2010 17:56:23 +0100 Steven Whitehouse <swhiteho@redhat.com> wrote:

> +static void *gfs2_alloc_sort_buffer(unsigned size)
> +{
> +	void *ptr = NULL;
> +
> +	if (size < KMALLOC_MAX_SIZE)
> +		ptr = kmalloc(size, GFP_NOFS | __GFP_NOWARN);
> +	if (!ptr)
> +		ptr = __vmalloc(size, GFP_NOFS, PAGE_KERNEL);
> +	return ptr;
> +}
> +
> +static void gfs2_free_sort_buffer(void *ptr)
> +{
> +	if (is_vmalloc_addr(ptr))
> +		vfree(ptr);
> +	else
> +		kfree(ptr);
> +}

This got kicked around a bit in May (Subject: mm: generic adaptive
large memory allocation APIs).  That patch tried kmalloc(), then
alloc_pages(), then vmalloc().  Nothing got merged though.

I wasn't terribly excited about it because of vague fears that it would
just incite people to spend even less effort thinking about how large
their individual allocations are.

apparmor has a private kvfree/kvmalloc.  Probably there are other
versions floating around the tree as well.



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Cluster-devel] GFS2: Use kmalloc when possible for ->readdir() (try #2)
  2010-07-28 17:13     ` Andrew Morton
@ 2010-07-28 17:51       ` Steven Whitehouse
  2010-07-29 17:58       ` Joel Becker
  1 sibling, 0 replies; 7+ messages in thread
From: Steven Whitehouse @ 2010-07-28 17:51 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

On Wed, 2010-07-28 at 10:13 -0700, Andrew Morton wrote:
> On Wed, 28 Jul 2010 17:56:23 +0100 Steven Whitehouse <swhiteho@redhat.com> wrote:
> 
> > +static void *gfs2_alloc_sort_buffer(unsigned size)
> > +{
> > +	void *ptr = NULL;
> > +
> > +	if (size < KMALLOC_MAX_SIZE)
> > +		ptr = kmalloc(size, GFP_NOFS | __GFP_NOWARN);
> > +	if (!ptr)
> > +		ptr = __vmalloc(size, GFP_NOFS, PAGE_KERNEL);
> > +	return ptr;
> > +}
> > +
> > +static void gfs2_free_sort_buffer(void *ptr)
> > +{
> > +	if (is_vmalloc_addr(ptr))
> > +		vfree(ptr);
> > +	else
> > +		kfree(ptr);
> > +}
> 
> This got kicked around a bit in May (Subject: mm: generic adaptive
> large memory allocation APIs).  That patch tried kmalloc(), then
> alloc_pages(), then vmalloc().  Nothing got merged though.
> 
> I wasn't terribly excited about it because of vague fears that it would
> just incite people to spend even less effort thinking about how large
> their individual allocations are.
> 
> apparmor has a private kvfree/kvmalloc.  Probably there are other
> versions floating around the tree as well.
> 

I did wonder about that too, but bearing in mind that the longer term
plan is to eliminate the vmalloc call, I wasn't sure it was worth doing,

Steve.




^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Cluster-devel] GFS2: Use kmalloc when possible for ->readdir() (try #2)
  2010-07-28 17:13     ` Andrew Morton
  2010-07-28 17:51       ` Steven Whitehouse
@ 2010-07-29 17:58       ` Joel Becker
  1 sibling, 0 replies; 7+ messages in thread
From: Joel Becker @ 2010-07-29 17:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, Jul 28, 2010 at 10:13:27AM -0700, Andrew Morton wrote:
> On Wed, 28 Jul 2010 17:56:23 +0100 Steven Whitehouse <swhiteho@redhat.com> wrote:
> > +static void gfs2_free_sort_buffer(void *ptr)
> > +{
> > +	if (is_vmalloc_addr(ptr))
> > +		vfree(ptr);
> > +	else
> > +		kfree(ptr);
> > +}
> 
> This got kicked around a bit in May (Subject: mm: generic adaptive
> large memory allocation APIs).  That patch tried kmalloc(), then
> alloc_pages(), then vmalloc().  Nothing got merged though.
> 
> I wasn't terribly excited about it because of vague fears that it would
> just incite people to spend even less effort thinking about how large
> their individual allocations are.

	I think these sort of things belong closer to the using code.
Like gfs2's declaration that "this is how we do our sort buffers," it is
a decision best left to the caller.

Joel

-- 

"If you are ever in doubt as to whether or not to kiss a pretty girl, 
 give her the benefit of the doubt"
                                        -Thomas Carlyle

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-07-29 17:58 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-07-28 11:15 [Cluster-devel] GFS2: Use kmalloc when possible for ->readdir() Steven Whitehouse
2010-07-28 15:39 ` Linus Torvalds
2010-07-28 16:52   ` Steven Whitehouse
2010-07-28 16:56   ` [Cluster-devel] GFS2: Use kmalloc when possible for ->readdir() (try #2) Steven Whitehouse
2010-07-28 17:13     ` Andrew Morton
2010-07-28 17:51       ` Steven Whitehouse
2010-07-29 17:58       ` Joel Becker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).