All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dilger, Andreas" <andreas.dilger@intel.com>
To: "Simmons, James A." <simmonsja@ornl.gov>,
	'Julia Lawall' <julia.lawall@lip6.fr>
Cc: "devel@driverdev.osuosl.org" <devel@driverdev.osuosl.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"kernel-janitors@vger.kernel.org"
	<kernel-janitors@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Drokin, Oleg" <oleg.drokin@intel.com>,
	'Dan Carpenter' <dan.carpenter@oracle.com>,
	"lustre-devel@lists.lustre.org" <lustre-devel@lists.lustre.org>
Subject: Re: [lustre-devel] LIBCFS_ALLOC
Date: Fri, 03 Jul 2015 11:52:06 +0000	[thread overview]
Message-ID: <D1BBD1A4.FB638%andreas.dilger@intel.com> (raw)
In-Reply-To: <9cb85b423527448db721927a35974317@EXCHCS32.ornl.gov>

On 2015/07/02, 4:25 PM, "Simmons, James A." <simmonsja@ornl.gov> wrote:

>
>>> >Yeah.  You're right.  Doing a vmalloc() when kmalloc() doesn't have
>>>even
>>> >a tiny sliver of RAM isn't going to work.  It's easier to use
>>> >libcfs_kvzalloc() everywhere, but it's probably the wrong thing.
>>>
>>> The original  reason we have the vmalloc water mark wasn't so much the
>>> issue of memory exhaustion but to handle the case of memory
>>>fragmentation.
>>> Some sites had after a extended period of time started to see failures
>>>of
>>> allocating even 32K using kmalloc.  In our latest development branch
>>>we moved
>>> away from using a water mark to always try kmalloc first and if it
>>>fails then we
>>> try vmalloc. At ORNL we ran into severe performance issues when we
>>>entered
>>> vmalloc territory. It has been discussed before on what might replace
>>>vmalloc
>>> handling in the case of kmalloc fails but no solution has been worked
>>>out.
>>
>>OK, but if a structure contains only 4 words, would it be better to just
>>use kzalloc?  Or does it not matter?  It would only save trying vmalloc
>>in
>>a case that it is guaranteed to fail, but if a structure with 4 words
>>can't be allocated, the system has other problems.  Another argument is
>>that kzalloc is a well known function that people and bug-finding tools
>>understand, so it is better to use it whenever possible.
>>
>>Some of the other structures contain a lot more fields, as well as small
>>arrays.  They are probably acceptable for kzalloc too, but I wouldn't
>>know
>>the exact dividing line.
>
>The reason I bring this up is to discuss sorting this out. Once long ago
>we had just LIBCFS_ALLOC. For some reason before my time OBD_ALLOC got
> spawned off of that.  Currently LIBCFS_ALLOC is used just by the
>libcfs/LNet
> layer.

That is because there was (is?) interest from Cray and others to use LNet
independently from Lustre (Zest and DVS, for example) so LNet should be
self-contained and not depend on anything from Lustre.

> Now OBD_ALLOC in our development branch has moved to a try kmalloc first
>and
>if it fails try vmalloc for any size memory allocation.  LIBCFS_ALLOC
>still
> does the original approach.   So we have two possible solutions
>depending on if libcfs/LNet needs to ever do a vmalloc.
>
>One solution if libcfs/LNet never needs a vmalloc is remove LIBCFS_ALLOC
>and replace it with kzalloc everywhere. We can then move libcfs_kzvalloc
>to
> the lustre layer and port the change we did in the development branch to
> here of the try kmalloc then vmalloc approach.
>
>The other approach is if libcfs/LNet does in some case need to use vmalloc
> we could then update LIBCFS_ALLOC to first try kmalloc then vmalloc. Once
> this is implemented we can nuke the OBD_ALLOC system.

I don't agree.  I think there are a few places where vmalloc() makes sense
to try (if the allocation may be large), but in most places LIBCFS_ALLOC()
should only use kmalloc().

Unfortunately, there wasn't a separate LIBCFS_ALLOC_LARGE() like there was
for OBD_ALLOC_LARGE() that made it clear which callsites are (potentially)
large and which are small.  The macro approach allowed the compile-time
optimization of the small callsites, but that needs to be done by hand now.

>Either way I like to see it consolidated down to one system.

Given the proliferation of foo_kvmalloc() and foo_kvzalloc() helpers
(ext4_, kvm_, dm_, apparmor, ceph_, __aa_), maybe it it is time to move
these to common kernel code instead of introducing yet another new one?

Cheers, Andreas
-- 
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division



WARNING: multiple messages have this Message-ID (diff)
From: Dilger, Andreas <andreas.dilger@intel.com>
To: lustre-devel@lists.lustre.org
Subject: [lustre-devel] LIBCFS_ALLOC
Date: Fri, 3 Jul 2015 11:52:06 +0000	[thread overview]
Message-ID: <D1BBD1A4.FB638%andreas.dilger@intel.com> (raw)
In-Reply-To: <9cb85b423527448db721927a35974317@EXCHCS32.ornl.gov>

On 2015/07/02, 4:25 PM, "Simmons, James A." <simmonsja@ornl.gov> wrote:

>
>>> >Yeah.  You're right.  Doing a vmalloc() when kmalloc() doesn't have
>>>even
>>> >a tiny sliver of RAM isn't going to work.  It's easier to use
>>> >libcfs_kvzalloc() everywhere, but it's probably the wrong thing.
>>>
>>> The original  reason we have the vmalloc water mark wasn't so much the
>>> issue of memory exhaustion but to handle the case of memory
>>>fragmentation.
>>> Some sites had after a extended period of time started to see failures
>>>of
>>> allocating even 32K using kmalloc.  In our latest development branch
>>>we moved
>>> away from using a water mark to always try kmalloc first and if it
>>>fails then we
>>> try vmalloc. At ORNL we ran into severe performance issues when we
>>>entered
>>> vmalloc territory. It has been discussed before on what might replace
>>>vmalloc
>>> handling in the case of kmalloc fails but no solution has been worked
>>>out.
>>
>>OK, but if a structure contains only 4 words, would it be better to just
>>use kzalloc?  Or does it not matter?  It would only save trying vmalloc
>>in
>>a case that it is guaranteed to fail, but if a structure with 4 words
>>can't be allocated, the system has other problems.  Another argument is
>>that kzalloc is a well known function that people and bug-finding tools
>>understand, so it is better to use it whenever possible.
>>
>>Some of the other structures contain a lot more fields, as well as small
>>arrays.  They are probably acceptable for kzalloc too, but I wouldn't
>>know
>>the exact dividing line.
>
>The reason I bring this up is to discuss sorting this out. Once long ago
>we had just LIBCFS_ALLOC. For some reason before my time OBD_ALLOC got
> spawned off of that.  Currently LIBCFS_ALLOC is used just by the
>libcfs/LNet
> layer.

That is because there was (is?) interest from Cray and others to use LNet
independently from Lustre (Zest and DVS, for example) so LNet should be
self-contained and not depend on anything from Lustre.

> Now OBD_ALLOC in our development branch has moved to a try kmalloc first
>and
>if it fails try vmalloc for any size memory allocation.  LIBCFS_ALLOC
>still
> does the original approach.   So we have two possible solutions
>depending on if libcfs/LNet needs to ever do a vmalloc.
>
>One solution if libcfs/LNet never needs a vmalloc is remove LIBCFS_ALLOC
>and replace it with kzalloc everywhere. We can then move libcfs_kzvalloc
>to
> the lustre layer and port the change we did in the development branch to
> here of the try kmalloc then vmalloc approach.
>
>The other approach is if libcfs/LNet does in some case need to use vmalloc
> we could then update LIBCFS_ALLOC to first try kmalloc then vmalloc. Once
> this is implemented we can nuke the OBD_ALLOC system.

I don't agree.  I think there are a few places where vmalloc() makes sense
to try (if the allocation may be large), but in most places LIBCFS_ALLOC()
should only use kmalloc().

Unfortunately, there wasn't a separate LIBCFS_ALLOC_LARGE() like there was
for OBD_ALLOC_LARGE() that made it clear which callsites are (potentially)
large and which are small.  The macro approach allowed the compile-time
optimization of the small callsites, but that needs to be done by hand now.

>Either way I like to see it consolidated down to one system.

Given the proliferation of foo_kvmalloc() and foo_kvzalloc() helpers
(ext4_, kvm_, dm_, apparmor, ceph_, __aa_), maybe it it is time to move
these to common kernel code instead of introducing yet another new one?

Cheers, Andreas
-- 
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division

WARNING: multiple messages have this Message-ID (diff)
From: "Dilger, Andreas" <andreas.dilger@intel.com>
To: "Simmons, James A." <simmonsja@ornl.gov>,
	"'Julia Lawall'" <julia.lawall@lip6.fr>
Cc: "devel@driverdev.osuosl.org" <devel@driverdev.osuosl.org>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"kernel-janitors@vger.kernel.org"
	<kernel-janitors@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Drokin, Oleg" <oleg.drokin@intel.com>,
	"'Dan Carpenter'" <dan.carpenter@oracle.com>,
	"lustre-devel@lists.lustre.org" <lustre-devel@lists.lustre.org>
Subject: Re: [lustre-devel] LIBCFS_ALLOC
Date: Fri, 3 Jul 2015 11:52:06 +0000	[thread overview]
Message-ID: <D1BBD1A4.FB638%andreas.dilger@intel.com> (raw)
In-Reply-To: <9cb85b423527448db721927a35974317@EXCHCS32.ornl.gov>

On 2015/07/02, 4:25 PM, "Simmons, James A." <simmonsja@ornl.gov> wrote:

>
>>> >Yeah.  You're right.  Doing a vmalloc() when kmalloc() doesn't have
>>>even
>>> >a tiny sliver of RAM isn't going to work.  It's easier to use
>>> >libcfs_kvzalloc() everywhere, but it's probably the wrong thing.
>>>
>>> The original  reason we have the vmalloc water mark wasn't so much the
>>> issue of memory exhaustion but to handle the case of memory
>>>fragmentation.
>>> Some sites had after a extended period of time started to see failures
>>>of
>>> allocating even 32K using kmalloc.  In our latest development branch
>>>we moved
>>> away from using a water mark to always try kmalloc first and if it
>>>fails then we
>>> try vmalloc. At ORNL we ran into severe performance issues when we
>>>entered
>>> vmalloc territory. It has been discussed before on what might replace
>>>vmalloc
>>> handling in the case of kmalloc fails but no solution has been worked
>>>out.
>>
>>OK, but if a structure contains only 4 words, would it be better to just
>>use kzalloc?  Or does it not matter?  It would only save trying vmalloc
>>in
>>a case that it is guaranteed to fail, but if a structure with 4 words
>>can't be allocated, the system has other problems.  Another argument is
>>that kzalloc is a well known function that people and bug-finding tools
>>understand, so it is better to use it whenever possible.
>>
>>Some of the other structures contain a lot more fields, as well as small
>>arrays.  They are probably acceptable for kzalloc too, but I wouldn't
>>know
>>the exact dividing line.
>
>The reason I bring this up is to discuss sorting this out. Once long ago
>we had just LIBCFS_ALLOC. For some reason before my time OBD_ALLOC got
> spawned off of that.  Currently LIBCFS_ALLOC is used just by the
>libcfs/LNet
> layer.

That is because there was (is?) interest from Cray and others to use LNet
independently from Lustre (Zest and DVS, for example) so LNet should be
self-contained and not depend on anything from Lustre.

> Now OBD_ALLOC in our development branch has moved to a try kmalloc first
>and
>if it fails try vmalloc for any size memory allocation.  LIBCFS_ALLOC
>still
> does the original approach.   So we have two possible solutions
>depending on if libcfs/LNet needs to ever do a vmalloc.
>
>One solution if libcfs/LNet never needs a vmalloc is remove LIBCFS_ALLOC
>and replace it with kzalloc everywhere. We can then move libcfs_kzvalloc
>to
> the lustre layer and port the change we did in the development branch to
> here of the try kmalloc then vmalloc approach.
>
>The other approach is if libcfs/LNet does in some case need to use vmalloc
> we could then update LIBCFS_ALLOC to first try kmalloc then vmalloc. Once
> this is implemented we can nuke the OBD_ALLOC system.

I don't agree.  I think there are a few places where vmalloc() makes sense
to try (if the allocation may be large), but in most places LIBCFS_ALLOC()
should only use kmalloc().

Unfortunately, there wasn't a separate LIBCFS_ALLOC_LARGE() like there was
for OBD_ALLOC_LARGE() that made it clear which callsites are (potentially)
large and which are small.  The macro approach allowed the compile-time
optimization of the small callsites, but that needs to be done by hand now.

>Either way I like to see it consolidated down to one system.

Given the proliferation of foo_kvmalloc() and foo_kvzalloc() helpers
(ext4_, kvm_, dm_, apparmor, ceph_, __aa_), maybe it it is time to move
these to common kernel code instead of introducing yet another new one?

Cheers, Andreas
-- 
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division



  reply	other threads:[~2015-07-03 11:52 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-20 16:58 [PATCH 00/12] Use !x to check for kzalloc failure Julia Lawall
2015-06-20 16:58 ` Julia Lawall
2015-06-20 16:58 ` [PATCH 01/12] staging: lustre: fid: " Julia Lawall
2015-06-20 16:58   ` Julia Lawall
2015-06-22 15:46   ` [lustre-devel] Fwd: [HPDD-discuss] " Patrick Farrell
2015-06-22 17:18     ` Simmons, James A.
2015-06-23  8:05     ` Dilger, Andreas
2015-06-23  8:25   ` Dilger, Andreas
2015-06-23  8:25     ` Dilger, Andreas
2015-06-23  8:25     ` [lustre-devel] " Dilger, Andreas
2015-06-23  9:23     ` Dan Carpenter
2015-06-23  9:23       ` Dan Carpenter
2015-06-23  9:23       ` [lustre-devel] " Dan Carpenter
2015-06-23  9:35       ` Julia Lawall
2015-06-23  9:35         ` Julia Lawall
2015-06-23  9:35         ` [lustre-devel] " Julia Lawall
2015-06-23  9:57         ` Dan Carpenter
2015-06-23  9:57           ` Dan Carpenter
2015-06-23  9:57           ` [lustre-devel] " Dan Carpenter
2015-06-23 10:51           ` Julia Lawall
2015-06-23 10:51             ` Julia Lawall
2015-06-23 10:51             ` [lustre-devel] " Julia Lawall
2015-06-24 20:14             ` Simmons, James A.
2015-06-24 20:14               ` Simmons, James A.
2015-06-24 20:14               ` Simmons, James A.
2015-06-23 22:03           ` Joe Perches
2015-06-23 22:03             ` Joe Perches
2015-06-23 22:03             ` [lustre-devel] " Joe Perches
2015-06-23 22:11       ` Joe Perches
2015-06-23 22:11         ` Joe Perches
2015-06-23 22:11         ` [lustre-devel] " Joe Perches
2015-06-28  6:52     ` LIBCFS_ALLOC Julia Lawall
2015-06-28  6:52       ` LIBCFS_ALLOC Julia Lawall
2015-06-28  6:52       ` [lustre-devel] LIBCFS_ALLOC Julia Lawall
2015-06-28 21:54       ` LIBCFS_ALLOC Dan Carpenter
2015-06-28 21:54         ` LIBCFS_ALLOC Dan Carpenter
2015-06-28 21:54         ` [lustre-devel] LIBCFS_ALLOC Dan Carpenter
2015-06-30 14:56         ` LIBCFS_ALLOC Simmons, James A.
2015-06-30 14:56           ` [lustre-devel] LIBCFS_ALLOC Simmons, James A.
2015-06-30 15:01           ` LIBCFS_ALLOC Julia Lawall
2015-06-30 15:01             ` LIBCFS_ALLOC Julia Lawall
2015-06-30 15:01             ` [lustre-devel] LIBCFS_ALLOC Julia Lawall
2015-07-02 22:25             ` Simmons, James A.
2015-07-02 22:25               ` Simmons, James A.
2015-07-02 22:25               ` Simmons, James A.
2015-07-03 11:52               ` Dilger, Andreas [this message]
2015-07-03 11:52                 ` Dilger, Andreas
2015-07-03 11:52                 ` Dilger, Andreas
2015-06-30 17:38           ` LIBCFS_ALLOC Dan Carpenter
2015-06-30 17:38             ` LIBCFS_ALLOC Dan Carpenter
2015-06-30 17:38             ` [lustre-devel] LIBCFS_ALLOC Dan Carpenter
2015-06-30 21:26       ` Dilger, Andreas
2015-06-30 21:26         ` Dilger, Andreas
2015-06-20 16:59 ` [PATCH 02/12] staging: lustre: fld: Use !x to check for kzalloc failure Julia Lawall
2015-06-20 16:59   ` Julia Lawall
2015-06-20 16:59 ` [PATCH 03/12] staging: lustre: lclient: " Julia Lawall
2015-06-20 16:59   ` Julia Lawall
2015-06-20 16:59 ` [PATCH 04/12] staging: lustre: ldlm: " Julia Lawall
2015-06-20 16:59   ` Julia Lawall
2015-06-20 16:59 ` [PATCH 05/12] staging: lustre: lmv: " Julia Lawall
2015-06-20 16:59   ` Julia Lawall
2015-06-20 16:59 ` [PATCH 06/12] staging: lustre: lov: " Julia Lawall
2015-06-20 16:59   ` Julia Lawall
2015-06-20 16:59 ` [PATCH 07/12] staging: lustre: mdc: " Julia Lawall
2015-06-20 16:59   ` Julia Lawall
2015-06-20 16:59 ` [PATCH 08/12] staging: lustre: mgc: " Julia Lawall
2015-06-20 16:59   ` Julia Lawall
2015-06-20 16:59 ` [PATCH 09/12] staging: lustre: obdclass: " Julia Lawall
2015-06-20 16:59   ` Julia Lawall
2015-06-21 10:02   ` walter harms
2015-06-21 10:02     ` walter harms
2015-06-21 10:29     ` Julia Lawall
2015-06-21 10:29       ` Julia Lawall
2015-06-21 11:58   ` walter harms
2015-06-20 16:59 ` [PATCH 10/12] staging: lustre: obdecho: " Julia Lawall
2015-06-20 16:59   ` Julia Lawall
2015-06-20 16:59 ` [PATCH 11/12] staging: lustre: osc: " Julia Lawall
2015-06-20 16:59   ` Julia Lawall
2015-06-20 16:59 ` [PATCH 12/12] staging: lustre: ptlrpc: " Julia Lawall
2015-06-20 16:59   ` Julia Lawall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D1BBD1A4.FB638%andreas.dilger@intel.com \
    --to=andreas.dilger@intel.com \
    --cc=dan.carpenter@oracle.com \
    --cc=devel@driverdev.osuosl.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=julia.lawall@lip6.fr \
    --cc=kernel-janitors@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lustre-devel@lists.lustre.org \
    --cc=oleg.drokin@intel.com \
    --cc=simmonsja@ornl.gov \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.