Date: Wed, 20 Apr 2016 16:44:33 +0200
From: Peter Zijlstra
To: Minchan Kim
Cc: Ingo Molnar, linux-kernel@vger.kernel.org
Subject: Re: preempt_count overflow in CONFIG_PREEMPT
Message-ID: <20160420144433.GG3430@twins.programming.kicks-ass.net>
In-Reply-To: <20160419065843.GB12910@bbox>
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 19, 2016 at 03:58:43PM +0900, Minchan Kim wrote:

> Migration tries to move objects from page A to page B.
> B is a newly allocated page, so it is empty.
>
> 1. freeze every object in page A:
>        for each object in the page
>            bit_spin_lock(object)
>
> 2. memcpy(B, A, PAGE_SIZE);
>
> 3. unfreeze every object in page A:
>        for each object in the page
>            bit_spin_unlock(object)
>
> 4. put_page(A);
>
> The logic is rather straightforward, I guess. :)
> Here, the problem is that, unlike object migration, page migration
> needs to block access to all objects in a page at once before step 2.
> So, if we are unlucky, we can increase preempt_count by 113 on every
> CPU, so preempt_count_add easily emits the spinlock count overflow
> warning in DEBUG_LOCKS_WARN_ON when we have multiple CPUs (my machine
> has 12 CPUs).
>
> I think there are several choices to fix it, but I'm not sure which is
> best, so I want to hear your opinion.
>
> 1. increase the preempt_count size?

Nope, 256 is way too many locks to be holding, especially spinlocks.
You get the most horrid latency spikes from that.

> 2. support bit_spin_lock_no_preempt/bit_spin_unlock_no_preempt?

Only if you really, really, really have to, but it would suck.

> 3. redesign the zsmalloc page migration locking granularity?
>
> I want to avoid 3 if possible because such a design will make the code
> very complicated and may hurt scalability and performance, I guess.

This really is your best option. You don't think O(nr_cpus) locking is
a scalability fail?

> I guess 8 bits for PREEMPT_BITS is too small considering the number of
> CPUs in recent computer systems?

Not really. Holding a lock (or even multiple, as you do) for each cpu
is a completely painful thing and doesn't scale.

> I hope I'm not alone in seeing this issue until now. :)

Very occasionally people run into this.. we try and convince them to
change their ways.