Re: [PATCH 2/2] mm/vmalloc: rework the drain logic

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Huang\, Ying" <ying.huang@intel.com>
To: Uladzislau Rezki <urezki@gmail.com>
Cc: huang ying <huang.ying.caritas@gmail.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 linux-mm@kvack.org,  LKML <linux-kernel@vger.kernel.org>,
	 Hillf Danton <hdanton@sina.com>,  Michal Hocko <mhocko@suse.com>,
	 Matthew Wilcox <willy@infradead.org>,
	 Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>,
	 Steven Rostedt <rostedt@goodmis.org>,
	 Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 2/2] mm/vmalloc: rework the drain logic
Date: Thu, 19 Nov 2020 09:40:29 +0800	[thread overview]
Message-ID: <87mtzeunsi.fsf@yhuang-dev.intel.com> (raw)
In-Reply-To: <20201118161623.GA21171@pc636> (Uladzislau Rezki's message of "Wed, 18 Nov 2020 17:16:23 +0100")

Uladzislau Rezki <urezki@gmail.com> writes:

> On Wed, Nov 18, 2020 at 10:44:13AM +0800, huang ying wrote:
>> On Tue, Nov 17, 2020 at 9:04 PM Uladzislau Rezki <urezki@gmail.com> wrote:
>> >
>> > On Tue, Nov 17, 2020 at 10:37:34AM +0800, huang ying wrote:
>> > > On Tue, Nov 17, 2020 at 6:00 AM Uladzislau Rezki (Sony)
>> > > <urezki@gmail.com> wrote:
>> > > >
>> > > > A current "lazy drain" model suffers from at least two issues.
>> > > >
>> > > > First one is related to the unsorted list of vmap areas, thus
>> > > > in order to identify the [min:max] range of areas to be drained,
>> > > > it requires a full list scan. What is a time consuming if the
>> > > > list is too long.
>> > > >
>> > > > Second one and as a next step is about merging all fragments
>> > > > with a free space. What is also a time consuming because it
>> > > > has to iterate over entire list which holds outstanding lazy
>> > > > areas.
>> > > >
>> > > > See below the "preemptirqsoff" tracer that illustrates a high
>> > > > latency. It is ~24 676us. Our workloads like audio and video
>> > > > are effected by such long latency:
>> > >
>> > > This seems like a real problem.  But I found there's long latency
>> > > avoidance mechanism in the loop in __purge_vmap_area_lazy() as
>> > > follows,
>> > >
>> > >         if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)
>> > >             cond_resched_lock(&free_vmap_area_lock);
>> > >
>> > I have added that "resched threshold" because of on my tests i could
>> > simply hit out of memory, due to the fact that a drain work is not up
>> > to speed to process such long outstanding list of vmap areas.
>> 
>> OK.  Now I think I understand the problem.  For free area purging,
>> there are multiple "producers" but one "consumer", and it lacks enough
>> mechanism to slow down the "producers" if "consumer" can not catch up.
>> And your patch tries to resolve the problem via accelerating the
>> "consumer".
>>
> Seems, correct. But just in case one more time:
>
> the cond_resched_lock was added once upon a time to get rid of long
> preemption off time. Due to dropping the lock, "producers" can start
> generate further vmap area, so "consumer" can not catch up. Seems

Yes.  And in theory there are vfree() storm, that is, thousands vfree()
can be called in short time.  But I don't think that's practical use
case.

> Later on, a resched threshold was added. It is just a simple protection
> threshold, passing which, a freeing is prioritized back over allocation,
> so we guarantee that we do not hit out of memory.

Yes.  That can accelerate freeing if necessary.

>>
>> That isn't perfect, but I think we may have quite some opportunities
>> to merge the free areas, so it should just work.
>> 
> Yes, merging opportunity should do the work. But of course there are
> exceptions.
>
>> And I found the long latency avoidance logic in
>> __purge_vmap_area_lazy() appears problematic,
>> 
>>          if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)
>>              cond_resched_lock(&free_vmap_area_lock);
>> 
>> Shouldn't it be something as follows?
>> 
>>          if (i >= BATCH && atomic_long_read(&vmap_lazy_nr) <
>> resched_threshold) {
>>              cond_resched_lock(&free_vmap_area_lock);
>>              i = 0;
>>          } else
>>              i++;
>> 
>> This will accelerate the purging via batching and slow down vmalloc()
>> via holding free_vmap_area_lock.  If it makes sense, can we try this?
>> 
> Probably we can switch to just using "batch" methodology:
>
> <snip>
>     if (!(i++ % batch_threshold))
>         cond_resched_lock(&free_vmap_area_lock);
> <snip>

That's the typical long latency avoidance method.

> The question is, which value we should use as a batch_threshold: 100, 1000, etc.

I think we can do some measurement to determine it?

> Apart of it and in regard to CONFIG_KASAN_VMALLOC, it seems that we are not
> allowed to drop the free_vmap_area_lock at all. Because any simultaneous
> allocations are not allowed within a drain region, so it should occur in
> disjoint regions. But i need to double check it.
>
>>
>> And, can we reduce lazy_max_pages() to control the length of the
>> purging list?  It could be > 8K if the vmalloc/vfree size is small.
>>
> We can adjust it for sure. But it will influence on number of global
> TLB flushes that must be performed.

Em...  For example, if we set it to 100, then the number of the TLB
flushes can be reduced to 1% of the un-optimized implementation
already.  Do you think so?

Best Regards,
Huang, Ying

next prev parent reply	other threads:[~2020-11-19  1:40 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-16 22:00 [PATCH 1/2] mm/vmalloc: use free_vm_area() if an allocation fails Uladzislau Rezki (Sony)
2020-11-16 22:00 ` [PATCH 2/2] mm/vmalloc: rework the drain logic Uladzislau Rezki (Sony)
2020-11-17  2:37   ` huang ying
2020-11-17 13:04     ` Uladzislau Rezki
2020-11-18  2:44       ` huang ying
2020-11-18 16:16         ` Uladzislau Rezki
2020-11-19  1:40           ` Huang, Ying [this message]
2020-11-19 17:36             ` Uladzislau Rezki
2020-11-20  2:34               ` Huang, Ying
2020-11-23 13:59                 ` Uladzislau Rezki
2020-11-24  2:25                   ` Huang, Ying
2020-11-24 16:40                     ` Uladzislau Rezki
2020-11-25  0:52                       ` Huang, Ying
2020-11-25 20:34                         ` Uladzislau Rezki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mtzeunsi.fsf@yhuang-dev.intel.com \
    --to=ying.huang@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=hch@lst.de \
    --cc=hdanton@sina.com \
    --cc=huang.ying.caritas@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=oleksiy.avramchenko@sonymobile.com \
    --cc=rostedt@goodmis.org \
    --cc=urezki@gmail.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.