From: "Dilger, Andreas" <andreas.dilger@intel.com>
To: "Simmons, James A." <simmonsja@ornl.gov>,
"'Julia Lawall'" <julia.lawall@lip6.fr>
Cc: "devel@driverdev.osuosl.org" <devel@driverdev.osuosl.org>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
"kernel-janitors@vger.kernel.org"
<kernel-janitors@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Drokin, Oleg" <oleg.drokin@intel.com>,
"'Dan Carpenter'" <dan.carpenter@oracle.com>,
"lustre-devel@lists.lustre.org" <lustre-devel@lists.lustre.org>
Subject: Re: [lustre-devel] LIBCFS_ALLOC
Date: Fri, 3 Jul 2015 11:52:06 +0000 [thread overview]
Message-ID: <D1BBD1A4.FB638%andreas.dilger@intel.com>
In-Reply-To: <9cb85b423527448db721927a35974317@EXCHCS32.ornl.gov>
On 2015/07/02, 4:25 PM, "Simmons, James A." <simmonsja@ornl.gov> wrote:
>
>>> > Yeah. You're right. Doing a vmalloc() when kmalloc() doesn't have
>>> > even a tiny sliver of RAM isn't going to work. It's easier to use
>>> > libcfs_kvzalloc() everywhere, but it's probably the wrong thing.
>>>
>>> The original reason we have the vmalloc water mark wasn't so much the
>>> issue of memory exhaustion, but to handle the case of memory
>>> fragmentation. Some sites had, after an extended period of time,
>>> started to see failures allocating even 32K using kmalloc. In our
>>> latest development branch we moved away from using a water mark to
>>> always trying kmalloc first and, if that fails, trying vmalloc. At
>>> ORNL we ran into severe performance issues when we entered vmalloc
>>> territory. It has been discussed before what might replace the
>>> vmalloc handling when kmalloc fails, but no solution has been worked
>>> out.
>>
>> OK, but if a structure contains only 4 words, would it be better to
>> just use kzalloc? Or does it not matter? It would only save trying
>> vmalloc in a case where it is guaranteed to fail, but if a structure
>> with 4 words can't be allocated, the system has other problems.
>> Another argument is that kzalloc is a well-known function that people
>> and bug-finding tools understand, so it is better to use it whenever
>> possible.
>>
>> Some of the other structures contain a lot more fields, as well as
>> small arrays. They are probably acceptable for kzalloc too, but I
>> wouldn't know the exact dividing line.
>
> The reason I bring this up is to discuss sorting this out. Long ago we
> had just LIBCFS_ALLOC. For some reason, before my time, OBD_ALLOC got
> spawned off of that. Currently LIBCFS_ALLOC is used just by the
> libcfs/LNet layer.

That is because there was (is?) interest from Cray and others to use LNet
independently from Lustre (Zest and DVS, for example) so LNet should be
self-contained and not depend on anything from Lustre.

> Now OBD_ALLOC in our development branch has moved to trying kmalloc
> first and, if that fails, trying vmalloc for any size of memory
> allocation. LIBCFS_ALLOC still does the original approach. So we have
> two possible solutions, depending on whether libcfs/LNet ever needs to
> do a vmalloc.
>
> One solution, if libcfs/LNet never needs a vmalloc, is to remove
> LIBCFS_ALLOC and replace it with kzalloc everywhere. We can then move
> libcfs_kvzalloc to the lustre layer and port the
> try-kmalloc-then-vmalloc change from the development branch over here.
>
> The other approach, if libcfs/LNet does in some cases need to use
> vmalloc, is to update LIBCFS_ALLOC to first try kmalloc and then
> vmalloc. Once that is implemented we can nuke the OBD_ALLOC system.

I don't agree. I think there are a few places where vmalloc() makes sense
to try (if the allocation may be large), but in most places LIBCFS_ALLOC()
should only use kmalloc().
Unfortunately, there wasn't a separate LIBCFS_ALLOC_LARGE() like there was
for OBD_ALLOC_LARGE() that made it clear which callsites are (potentially)
large and which are small. The macro approach allowed the compile-time
optimization of the small callsites, but that needs to be done by hand now.
> Either way, I'd like to see it consolidated down to one system.

Given the proliferation of foo_kvmalloc() and foo_kvzalloc() helpers
(ext4_, kvm_, dm_, apparmor, ceph_, __aa_), maybe it is time to move
these into common kernel code instead of introducing yet another new one?
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
Thread overview: 31+ messages
2015-06-20 16:58 [PATCH 00/12] Use !x to check for kzalloc failure Julia Lawall
2015-06-20 16:58 ` [PATCH 01/12] staging: lustre: fid: " Julia Lawall
2015-06-23 8:25 ` Dilger, Andreas
2015-06-23 9:23 ` Dan Carpenter
2015-06-23 9:35 ` Julia Lawall
2015-06-23 9:57 ` Dan Carpenter
2015-06-23 10:51 ` Julia Lawall
2015-06-24 20:14 ` [lustre-devel] " Simmons, James A.
2015-06-23 22:03 ` Joe Perches
2015-06-23 22:11 ` Joe Perches
2015-06-28 6:52 ` LIBCFS_ALLOC Julia Lawall
2015-06-28 21:54 ` LIBCFS_ALLOC Dan Carpenter
2015-06-30 14:56 ` LIBCFS_ALLOC Simmons, James A.
2015-06-30 15:01 ` LIBCFS_ALLOC Julia Lawall
2015-07-02 22:25 ` [lustre-devel] LIBCFS_ALLOC Simmons, James A.
2015-07-03 11:52 ` Dilger, Andreas [this message]
2015-06-30 17:38 ` LIBCFS_ALLOC Dan Carpenter
2015-06-30 21:26 ` [lustre-devel] LIBCFS_ALLOC Dilger, Andreas
2015-06-20 16:59 ` [PATCH 02/12] staging: lustre: fld: Use !x to check for kzalloc failure Julia Lawall
2015-06-20 16:59 ` [PATCH 03/12] staging: lustre: lclient: " Julia Lawall
2015-06-20 16:59 ` [PATCH 04/12] staging: lustre: ldlm: " Julia Lawall
2015-06-20 16:59 ` [PATCH 05/12] staging: lustre: lmv: " Julia Lawall
2015-06-20 16:59 ` [PATCH 06/12] staging: lustre: lov: " Julia Lawall
2015-06-20 16:59 ` [PATCH 07/12] staging: lustre: mdc: " Julia Lawall
2015-06-20 16:59 ` [PATCH 08/12] staging: lustre: mgc: " Julia Lawall
2015-06-20 16:59 ` [PATCH 09/12] staging: lustre: obdclass: " Julia Lawall
2015-06-21 10:02 ` walter harms
2015-06-21 10:29 ` Julia Lawall
2015-06-20 16:59 ` [PATCH 10/12] staging: lustre: obdecho: " Julia Lawall
2015-06-20 16:59 ` [PATCH 11/12] staging: lustre: osc: " Julia Lawall
2015-06-20 16:59 ` [PATCH 12/12] staging: lustre: ptlrpc: " Julia Lawall