From: Aaron Lu <aaron.lu@intel.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Vasily Averin <vvs@virtuozzo.com>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Huang Ying <ying.huang@intel.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm: use kvzalloc for swap_info_struct allocation
Date: Mon, 5 Nov 2018 22:27:27 +0800
Message-ID: <20181105142727.GB6203@intel.com>
In-Reply-To: <20181105141156.GB10132@dhcp22.suse.cz>
On Mon, Nov 05, 2018 at 03:11:56PM +0100, Michal Hocko wrote:
> On Mon 05-11-18 14:17:01, Vasily Averin wrote:
> > commit a2468cc9bfdf ("swap: choose swap device according to numa node")
> > changed 'avail_lists' field of 'struct swap_info_struct' to an array.
> > In popular Linux distros it increased the size of swap_info_struct up to
> > 40 Kbytes, and now swap_info_struct allocation requires an order-4 page.
> > Switching to kvzalloc() avoids unexpected allocation failures.
>
> While this fixes the most visible issue, is this a good long-term
> solution? Aren't we wasting memory without a good reason? IIRC our limit

That's right, we need a better way of handling this in the long term.

> for swap files/devices is much smaller than the potential number of NUMA
> nodes, so we can safely expect there would be only a few NUMA-affine nodes.
> I am not really familiar with the rework which added NUMA node awareness,
> but I would assume that we should either go with one global table with
> a linked list of possible swap_info structures per NUMA node, or use a
> sparse array.

There is a per-NUMA-node plist of available swap devices, so every swap
device needs an entry on each of those per-node plists.
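
A minimal userspace sketch of that arrangement, using hypothetical types (`struct sketch_dev`, `node_head[]`) rather than the kernel's `struct swap_info_struct` and plist API: each device embeds one link and one priority per NUMA node, so it can sit on every node's list at once, and each node's list stays sorted by that node's priority. It only illustrates why per-device storage grows with the number of nodes; it is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_NODES 4  /* hypothetical node count, for illustration only */

/* Hypothetical stand-in for a swap device: one link and one priority per
 * NUMA node, playing the role of the avail_lists[] entries. */
struct sketch_dev {
	int id;
	int prio[SKETCH_NR_NODES];
	struct sketch_dev *next[SKETCH_NR_NODES];
};

/* One list head per node; each list is sorted by descending priority. */
static struct sketch_dev *node_head[SKETCH_NR_NODES];

/* Put @d on every node's list. Equal priorities keep insertion order,
 * similar in spirit to how plist_add() orders equal-priority nodes. */
static void add_to_all_nodes(struct sketch_dev *d)
{
	for (int n = 0; n < SKETCH_NR_NODES; n++) {
		struct sketch_dev **pp = &node_head[n];

		while (*pp && (*pp)->prio[n] >= d->prio[n])
			pp = &(*pp)->next[n];
		d->next[n] = *pp;
		*pp = d;
	}
}

/* The device a CPU on @node would try first: the head of that node's list. */
static struct sketch_dev *pick_dev(int node)
{
	assert(node >= 0 && node < SKETCH_NR_NODES);
	return node_head[node];
}
```

With two devices whose per-node priorities favour different nodes, pick_dev(0) and pick_dev(1) return different devices, while nodes with equal priorities fall back to insertion order.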

I think we can convert avail_lists from an array to a pointer and use
vzalloc() to allocate the needed memory. MAX_NUMNODES can be used for a
simple implementation; alternatively we can use the precise number of
online nodes, but then we will need to handle node online/offline events.
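
A rough userspace sketch of that conversion, under stated assumptions: `struct sketch_plist_node` mimics the layout of the kernel's `struct plist_node` (an int priority plus two list heads), `SKETCH_MAX_NUMNODES` is 1024 on the assumption of a CONFIG_NODES_SHIFT=10 distro config, and `calloc()` stands in for vzalloc(). All names are hypothetical. On a 64-bit build the embedded-array struct comes out at roughly 40 KB, while the pointer version stays small and moves the per-node entries into a separate allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace mirrors of the kernel's list_head and plist_node. */
struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};

struct sketch_plist_node {
	int prio;
	struct sketch_list_head prio_list;
	struct sketch_list_head node_list;
};

/* Assumption: MAX_NUMNODES = 1 << 10, as with CONFIG_NODES_SHIFT=10. */
#define SKETCH_MAX_NUMNODES 1024

/* Current shape: one plist node embedded per possible NUMA node. */
struct swap_info_fixed {
	unsigned long flags;
	struct sketch_plist_node avail_lists[SKETCH_MAX_NUMNODES];
};

/* Proposed shape: avail_lists points at a separately allocated array. */
struct swap_info_ptr {
	unsigned long flags;
	struct sketch_plist_node *avail_lists;
};

/* Allocate the small struct and nr_nodes per-node entries separately.
 * calloc() stands in for vzalloc(); nr_nodes could be the compile-time
 * maximum or a precise online-node count. */
static struct swap_info_ptr *alloc_swap_info_ptr(size_t nr_nodes)
{
	struct swap_info_ptr *p;

	assert(nr_nodes > 0);
	p = calloc(1, sizeof(*p));
	if (!p)
		return NULL;
	p->avail_lists = calloc(nr_nodes, sizeof(*p->avail_lists));
	if (!p->avail_lists) {
		free(p);
		return NULL;
	}
	return p;
}

static void free_swap_info_ptr(struct swap_info_ptr *p)
{
	if (!p)
		return;
	free(p->avail_lists);
	free(p);
}
```

With a precise online-node count instead of the compile-time maximum, a node coming online later would need its entry allocated and linked in, which is the hotplug handling referred to above.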

A sparse array sounds promising; I'll take a look. Thanks for the pointer.

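One possible reading of "sparse array" here, sketched in userspace: a two-level table whose per-node slots are allocated in fixed-size chunks only when a node is first touched. The chunk size, the 1024-node bound and every name are hypothetical; in the kernel the natural candidate would presumably be an existing structure such as the XArray rather than a hand-rolled table.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAX_NODES   1024  /* hypothetical upper bound on node ids */
#define SKETCH_CHUNK_SHIFT 6     /* 64 slots per lazily allocated chunk */
#define SKETCH_CHUNK_SIZE  (1 << SKETCH_CHUNK_SHIFT)
#define SKETCH_NR_CHUNKS   (SKETCH_MAX_NODES / SKETCH_CHUNK_SIZE)

struct sketch_entry {
	int prio;
};

/* Top level is just SKETCH_NR_CHUNKS pointers; chunks stay NULL until used. */
struct sketch_sparse {
	struct sketch_entry *chunk[SKETCH_NR_CHUNKS];
};

/* Return the slot for @node. If its chunk is missing, allocate it when
 * @create is nonzero, otherwise return NULL. */
static struct sketch_entry *sparse_get(struct sketch_sparse *s, int node,
				       int create)
{
	struct sketch_entry **c;

	assert(node >= 0 && node < SKETCH_MAX_NODES);
	c = &s->chunk[node >> SKETCH_CHUNK_SHIFT];
	if (!*c) {
		if (!create)
			return NULL;
		*c = calloc(SKETCH_CHUNK_SIZE, sizeof(**c));
		if (!*c)
			return NULL;
	}
	return &(*c)[node & (SKETCH_CHUNK_SIZE - 1)];
}

/* Count how many chunks have actually been allocated. */
static int sparse_chunks_in_use(const struct sketch_sparse *s)
{
	int used = 0;

	for (int i = 0; i < SKETCH_NR_CHUNKS; i++)
		if (s->chunk[i])
			used++;
	return used;
}

/* Release every chunk and reset the table to empty. */
static void sparse_free(struct sketch_sparse *s)
{
	for (int i = 0; i < SKETCH_NR_CHUNKS; i++) {
		free(s->chunk[i]);
		s->chunk[i] = NULL;
	}
}
```

Touching only nodes 5 and 1000 allocates two 64-slot chunks instead of all 1024 slots; the saving disappears if most node ids are actually in use.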
> That being said, I am not really objecting to this patch, as it is simple
> and backportable to older (stable) kernels.
>
> I would even dare to add
> Fixes: a2468cc9bfdf ("swap: choose swap device according to numa node")
>
> because not being able to add a swap space on a fragmented system looks
> like a regression to me.

Agreed, especially since it used to work.

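The order-4 figure in the commit message follows from page-order arithmetic. A userspace sketch, assuming 4 KiB pages as on x86-64 and re-implementing the idea behind the kernel's get_order() under a hypothetical name (`order_for_size`): a roughly 40 KB object exceeds the 32 KiB of an order-3 block, so a large kzalloc() of that size needs 16 physically contiguous free pages, which is exactly what a fragmented system may not have.

```c
#include <assert.h>

/* Assumption: 4 KiB pages (PAGE_SHIFT == 12), as on x86-64. */
#define SKETCH_PAGE_SHIFT 12

/* Smallest order such that (page size << order) >= size; a large
 * kzalloc() request of this size needs a physically contiguous block
 * of 1 << order pages. */
static int order_for_size(unsigned long size)
{
	int order = 0;

	assert(size > 0);
	while ((1UL << (SKETCH_PAGE_SHIFT + order)) < size)
		order++;
	return order;
}
```

With kvzalloc() the allocator can fall back to vmalloc() when no such contiguous block is available, which is also why the patch switches the matching frees to kvfree().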
Regards,
Aaron
> > Acked-by: Aaron Lu <aaron.lu@intel.com>
> > Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
>
> Acked-by: Michal Hocko <mhocko@suse.com>
> > ---
> > mm/swapfile.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 644f746e167a..8688ae65ef58 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -2813,7 +2813,7 @@ static struct swap_info_struct *alloc_swap_info(void)
> >  	unsigned int type;
> >  	int i;
> >
> > -	p = kzalloc(sizeof(*p), GFP_KERNEL);
> > +	p = kvzalloc(sizeof(*p), GFP_KERNEL);
> >  	if (!p)
> >  		return ERR_PTR(-ENOMEM);
> >
> > @@ -2824,7 +2824,7 @@ static struct swap_info_struct *alloc_swap_info(void)
> >  	}
> >  	if (type >= MAX_SWAPFILES) {
> >  		spin_unlock(&swap_lock);
> > -		kfree(p);
> > +		kvfree(p);
> >  		return ERR_PTR(-EPERM);
> >  	}
> >  	if (type >= nr_swapfiles) {
> > @@ -2838,7 +2838,7 @@ static struct swap_info_struct *alloc_swap_info(void)
> >  		smp_wmb();
> >  		nr_swapfiles++;
> >  	} else {
> > -		kfree(p);
> > +		kvfree(p);
> >  		p = swap_info[type];
> >  		/*
> >  		 * Do not memset this entry: a racing procfs swap_next()
> > --
> > 2.17.1
>
> --
> Michal Hocko
> SUSE Labs
>
Thread overview: 10+ messages
2018-11-04 22:13 [PATCH 1/2] mm: use kvzalloc for swap_info_struct allocation Vasily Averin
2018-11-05 0:50 ` Huang, Ying
2018-11-05 4:59 ` Vasily Averin
2018-11-05 5:16 ` Huang, Ying
2018-11-05 6:10 ` Aaron Lu
2018-11-05 11:17 ` [PATCH v2] " Vasily Averin
2018-11-05 14:11 ` Michal Hocko
2018-11-05 14:27 ` Aaron Lu [this message]