From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ob0-f172.google.com (mail-ob0-f172.google.com [209.85.214.172])
    by kanga.kvack.org (Postfix) with ESMTP id 38F066B0035
    for ; Wed, 9 Jul 2014 12:53:48 -0400 (EDT)
Received: by mail-ob0-f172.google.com with SMTP id uy5so8486879obc.31
    for ; Wed, 09 Jul 2014 09:53:47 -0700 (PDT)
Received: from mail-ob0-x235.google.com (mail-ob0-x235.google.com [2607:f8b0:4003:c01::235])
    by mx.google.com with ESMTPS id pz3si64456388oec.16.2014.07.09.09.53.46
    for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
    Wed, 09 Jul 2014 09:53:47 -0700 (PDT)
Received: by mail-ob0-f181.google.com with SMTP id wp4so8535234obc.12
    for ; Wed, 09 Jul 2014 09:53:46 -0700 (PDT)
MIME-Version: 1.0
From: Eric Miao
Date: Wed, 9 Jul 2014 09:53:26 -0700
Message-ID:
Subject: Re: arm64 flushing 255GB of vmalloc space takes too long
Content-Type: text/plain; charset=UTF-8
Sender: owner-linux-mm@kvack.org
List-ID:
To: Laura Abbott
Cc: "linux-arm-kernel@lists.infradead.org" ,
    Linux Memory Management List ,
    Catalin Marinas ,
    Will Deacon ,
    Russell King

On Tue, Jul 8, 2014 at 6:43 PM, Laura Abbott wrote:
>
> Hi,
>
> I have an arm64 target which has been observed hanging in __purge_vmap_area_lazy
> in vmalloc.c The root cause of this 'hang' is that flush_tlb_kernel_range is
> attempting to flush 255GB of virtual address space. This takes ~2 seconds and
> preemption is disabled at this time thanks to the purge lock. Disabling
> preemption for that time is long enough to trigger a watchdog we have setup.
>
> Triggering this is fairly easy:
> 1) Early in bootup, vmalloc > lazy_max_pages. This gives an address near the
> start of the vmalloc range.
> 2) load a module
> 3) vfree the vmalloc region from step 1
> 4) unload the module
>
> The arm64 virtual address layout looks like
> vmalloc : 0xffffff8000000000 - 0xffffffbbffff0000 (245759 MB)
> vmemmap : 0xffffffbc02400000 - 0xffffffbc03600000 (    18 MB)
> modules : 0xffffffbffc000000 - 0xffffffc000000000 (    64 MB)
>
> and the algorithm in __purge_vmap_area_lazy flushes between the lowest address.
> Essentially, if we are using a reasonable amount of vmalloc space and a module
> unload triggers a vmalloc purge, we will end up triggering our watchdog.
>
> A couple of options I thought of:
> 1) Increase the timeout of our watchdog to allow the flush to occur. Nobody
> I suggested this to likes the idea as the watchdog firing generally catches
> behavior that results in poor system performance and disabling preemption
> for that long does seem like a problem.
> 2) Change __purge_vmap_area_lazy to do less work under a spinlock. This would
> certainly have a performance impact and I don't even know if it is plausible.
> 3) Allow module unloading to trigger a vmalloc purge beforehand to help avoid
> this case. This would still be racy if another vfree came in during the time
> between the purge and the vfree but it might be good enough.
> 4) Add 'if size > threshold flush entire tlb' (I haven't profiled this yet)

We have the same problem. I agree with points 2 and 4. Points 1 and 3 do
not actually fix this issue: purge_vmap_area_lazy() could be called in
other cases.

W.r.t. the threshold for flushing the entire TLB instead of flushing page
by page, that could differ from platform to platform. And considering the
cost of a TLB flush on x86, I wonder why this isn't an issue on x86.

The whole of __purge_vmap_area_lazy() is protected by a single spinlock.
I see no reason why a mutex cannot be used there; that would allow
preemption during this likely lengthy process.

The rbtree removal seems heavy too: the worst case would be calling
__free_vmap_area() lazy_max_pages times.
And they are all protected by a single spinlock for the whole traversal,
which is not necessary.

CC+ Russell, Catalin, Will.

We have a patch as below:

============================ >8 =========================