Date: Tue, 19 Apr 2016 15:58:43 +0900
From: Minchan Kim <minchan@kernel.org>
To: Ingo Molnar, Peter Zijlstra
Subject: preempt_count overflow in CONFIG_PREEMPT
Message-ID: <20160419065843.GB12910@bbox>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello Ingo, Peter.

I am implementing non-lru page migration and am preparing v4 for resending.

https://lkml.org/lkml/2016/3/30/56

Although the design has changed since v3, the issue I describe below is
still the same, so it should not be hard to understand the problem from v3
even though I haven't sent v4 yet. :)

My problem is in the zsmalloc part that supports page migration. zsmalloc
stores several compressed pages in a single page; let's call a compressed
page an 'object'. In the best case we can store 113 objects in a page
(because the minimum slot size is 36 bytes).

If a page has internal fragmentation, zsmalloc tries to migrate an object
from page A to page B. We call this 'object migration'. To prevent user
access during object migration, we use a spin lock in the atomic path; to
save memory, the lock is per object, so users can still access the other
objects in the page. (Strictly speaking, it is not a spin_lock but a
home-grown, weird bit spin-lock made of test_and_set_bit in a while loop.
I know it is a buggy mess, so I will replace it with a regular
bit_spin_lock, but the issue remains.) During object migration, the spin
lock is nested twice: once for the source object and once for the
destination object.

Let's return to the issue. This time it is not object migration but page
migration, and the steps are as follows (migrating page A to page B, where
B is a newly allocated, empty page):

1. freeze every object in page A
       for each object in the page
               bit_spin_lock(object)
2. memcpy(B, A, PAGE_SIZE);
3. unfreeze every object in page A
       for each object in the page
               bit_spin_unlock(object)
4. put_page(A);

The logic is rather straightforward, I guess. :)

The problem is that unlike object migration, page migration has to block
access to all objects in the page at once before step 2. So in the worst
case (a page full of minimum-size objects) preempt_count grows by 113 on
each CPU, and preempt_count_add easily emits the spinlock count overflow
DEBUG_LOCKS_WARN_ON with multiple CPUs (my machine has 12).

I think there are several ways to fix it, but I am not sure which is best,
so I would like to hear your opinion.

1. increase the preempt_count size?
2. support bit_spin_lock_no_preempt/bit_spin_unlock_no_preempt?
   (a rough sketch of what I mean is below)
3. redesign the locking granularity of zsmalloc page migration?
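To illustrate 2, here is a rough, untested sketch (modelled on
include/linux/bit_spinlock.h, with the SMP/debug #ifdefs and sparse
annotations dropped): lock/unlock variants that do not touch preempt_count
themselves, so the page migration path pays for disabling preemption once
per page instead of once per object. for_each_object(), OBJ_LOCK_BIT and
obj->lockword are made-up names just for this example, not real zsmalloc
code:

	/*
	 * Like bit_spin_lock()/bit_spin_unlock() but without the per-lock
	 * preempt_disable()/preempt_enable(); the caller must keep
	 * preemption disabled around the whole batch of locks.
	 */
	static inline void bit_spin_lock_no_preempt(int bitnum, unsigned long *addr)
	{
		while (unlikely(test_and_set_bit_lock(bitnum, addr))) {
			do {
				cpu_relax();
			} while (test_bit(bitnum, addr));
		}
	}

	static inline void bit_spin_unlock_no_preempt(int bitnum, unsigned long *addr)
	{
		clear_bit_unlock(bitnum, addr);
	}

	/* page migration path: one preempt_count bump for the whole page */
	preempt_disable();
	for_each_object(obj, A)			/* made-up iterator */
		bit_spin_lock_no_preempt(OBJ_LOCK_BIT, &obj->lockword);
	memcpy(page_address(B), page_address(A), PAGE_SIZE);
	for_each_object(obj, A)
		bit_spin_unlock_no_preempt(OBJ_LOCK_BIT, &obj->lockword);
	preempt_enable();
	put_page(A);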
I want to avoid 3 if possible because such a design would make the code
very complicated and, I guess, may hurt scalability and performance. And I
guess 8 bits for PREEMPT_BITS is too small considering the number of CPUs
in recent systems? (For reference, I pasted the layout and the check that
fires below, as far as I understand them.)

I hope I'm not the only one to have seen this issue until now. :)

Thanks.
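For reference, this is the preempt_count layout and the debug check I am
hitting, quoted roughly from memory (include/linux/preempt.h, and
preempt_count_add() in kernel/sched/core.c with CONFIG_DEBUG_PREEMPT), so
please correct me if I misread it:

	#define PREEMPT_BITS	8
	#define SOFTIRQ_BITS	8
	#define HARDIRQ_BITS	4
	#define NMI_BITS	1

	/*
	 * Spinlock count overflowing soon?
	 */
	DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
				PREEMPT_MASK - 10);

If I read it right, the preempt (spinlock) part of preempt_count can only
count up to 255 and the check already warns near PREEMPT_MASK - 10, while a
single zsmalloc page migration alone eats up to 113 of that.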