From: "Dilger, Andreas" <andreas.dilger@intel.com>
To: "Simmons, James A." <simmonsja@ornl.gov>,
"'Julia Lawall'" <julia.lawall@lip6.fr>
Cc: "devel@driverdev.osuosl.org" <devel@driverdev.osuosl.org>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
"kernel-janitors@vger.kernel.org"
<kernel-janitors@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Drokin, Oleg" <oleg.drokin@intel.com>,
"'Dan Carpenter'" <dan.carpenter@oracle.com>,
"lustre-devel@lists.lustre.org" <lustre-devel@lists.lustre.org>
Subject: Re: [lustre-devel] LIBCFS_ALLOC
Date: Fri, 3 Jul 2015 11:52:06 +0000 [thread overview]
Message-ID: <D1BBD1A4.FB638%andreas.dilger@intel.com>
In-Reply-To: <9cb85b423527448db721927a35974317@EXCHCS32.ornl.gov>
On 2015/07/02, 4:25 PM, "Simmons, James A." <simmonsja@ornl.gov> wrote:
>
>>> > Yeah. You're right. Doing a vmalloc() when kmalloc() doesn't have
>>> > even a tiny sliver of RAM isn't going to work. It's easier to use
>>> > libcfs_kvzalloc() everywhere, but it's probably the wrong thing.
>>>
>>> The original reason we have the vmalloc water mark wasn't so much the
>>> issue of memory exhaustion, but to handle the case of memory
>>> fragmentation. Some sites had, after an extended period of time,
>>> started to see failures allocating even 32K using kmalloc. In our
>>> latest development branch we moved away from using a water mark to
>>> always trying kmalloc first and, if that fails, trying vmalloc. At
>>> ORNL we ran into severe performance issues when we entered vmalloc
>>> territory. It has been discussed before what might replace the
>>> vmalloc handling when kmalloc fails, but no solution has been worked
>>> out.
>>
>> OK, but if a structure contains only 4 words, would it be better to
>> just use kzalloc? Or does it not matter? It would only save trying
>> vmalloc in a case where it is guaranteed to fail, but if a structure
>> with 4 words can't be allocated, the system has other problems.
>> Another argument is that kzalloc is a well-known function that people
>> and bug-finding tools understand, so it is better to use it whenever
>> possible.
>>
>> Some of the other structures contain a lot more fields, as well as
>> small arrays. They are probably acceptable for kzalloc too, but I
>> wouldn't know the exact dividing line.
>
> The reason I bring this up is to discuss sorting this out. Long ago we
> had just LIBCFS_ALLOC. For some reason, before my time, OBD_ALLOC got
> spawned off of that. Currently LIBCFS_ALLOC is used just by the
> libcfs/LNet layer.

That is because there was (is?) interest from Cray and others to use LNet
independently from Lustre (Zest and DVS, for example) so LNet should be
self-contained and not depend on anything from Lustre.

> Now OBD_ALLOC in our development branch has moved to trying kmalloc
> first and, if that fails, trying vmalloc for any size of memory
> allocation. LIBCFS_ALLOC still does the original approach. So we have
> two possible solutions, depending on whether libcfs/LNet ever needs to
> do a vmalloc.
>
> One solution, if libcfs/LNet never needs a vmalloc, is to remove
> LIBCFS_ALLOC and replace it with kzalloc everywhere. We can then move
> libcfs_kvzalloc to the lustre layer and port the
> try-kmalloc-then-vmalloc change from the development branch over here.
>
> The other approach, if libcfs/LNet does in some cases need to use
> vmalloc, is to update LIBCFS_ALLOC to first try kmalloc and then
> vmalloc. Once that is implemented we can nuke the OBD_ALLOC system.

I don't agree. I think there are a few places where vmalloc() makes sense
to try (if the allocation may be large), but in most places LIBCFS_ALLOC()
should only use kmalloc().
Unfortunately, there wasn't a separate LIBCFS_ALLOC_LARGE() like there was
for OBD_ALLOC_LARGE() that made it clear which callsites are (potentially)
large and which are small. The macro approach allowed the compile-time
optimization of the small callsites, but that needs to be done by hand now.
> Either way, I'd like to see it consolidated down to one system.

Given the proliferation of foo_kvmalloc() and foo_kvzalloc() helpers
(ext4_, kvm_, dm_, apparmor, ceph_, __aa_), maybe it is time to move
these into common kernel code instead of introducing yet another new one?
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
Thread overview: 31+ messages
2015-06-20 16:58 [PATCH 00/12] Use !x to check for kzalloc failure Julia Lawall
2015-06-20 16:58 ` [PATCH 01/12] staging: lustre: fid: " Julia Lawall
2015-06-23 8:25 ` Dilger, Andreas
2015-06-23 9:23 ` Dan Carpenter
2015-06-23 9:35 ` Julia Lawall
2015-06-23 9:57 ` Dan Carpenter
2015-06-23 10:51 ` Julia Lawall
2015-06-24 20:14 ` [lustre-devel] " Simmons, James A.
2015-06-23 22:03 ` Joe Perches
2015-06-23 22:11 ` Joe Perches
2015-06-28 6:52 ` LIBCFS_ALLOC Julia Lawall
2015-06-28 21:54 ` LIBCFS_ALLOC Dan Carpenter
2015-06-30 14:56 ` LIBCFS_ALLOC Simmons, James A.
2015-06-30 15:01 ` LIBCFS_ALLOC Julia Lawall
2015-07-02 22:25 ` [lustre-devel] LIBCFS_ALLOC Simmons, James A.
2015-07-03 11:52 ` Dilger, Andreas [this message]
2015-06-30 17:38 ` LIBCFS_ALLOC Dan Carpenter
2015-06-30 21:26 ` [lustre-devel] LIBCFS_ALLOC Dilger, Andreas
2015-06-20 16:59 ` [PATCH 02/12] staging: lustre: fld: Use !x to check for kzalloc failure Julia Lawall
2015-06-20 16:59 ` [PATCH 03/12] staging: lustre: lclient: " Julia Lawall
2015-06-20 16:59 ` [PATCH 04/12] staging: lustre: ldlm: " Julia Lawall
2015-06-20 16:59 ` [PATCH 05/12] staging: lustre: lmv: " Julia Lawall
2015-06-20 16:59 ` [PATCH 06/12] staging: lustre: lov: " Julia Lawall
2015-06-20 16:59 ` [PATCH 07/12] staging: lustre: mdc: " Julia Lawall
2015-06-20 16:59 ` [PATCH 08/12] staging: lustre: mgc: " Julia Lawall
2015-06-20 16:59 ` [PATCH 09/12] staging: lustre: obdclass: " Julia Lawall
2015-06-21 10:02 ` walter harms
2015-06-21 10:29 ` Julia Lawall
2015-06-20 16:59 ` [PATCH 10/12] staging: lustre: obdecho: " Julia Lawall
2015-06-20 16:59 ` [PATCH 11/12] staging: lustre: osc: " Julia Lawall
2015-06-20 16:59 ` [PATCH 12/12] staging: lustre: ptlrpc: " Julia Lawall