From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752162AbbGMV1T (ORCPT ); Mon, 13 Jul 2015 17:27:19 -0400 Received: from mx1.redhat.com ([209.132.183.28]:37816 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751507AbbGMV1S (ORCPT ); Mon, 13 Jul 2015 17:27:18 -0400 Date: Mon, 13 Jul 2015 23:25:36 +0200 From: Oleg Nesterov To: Al Viro , Jan Kara , Linus Torvalds , Paul McKenney , Peter Zijlstra Cc: Daniel Wagner , Davidlohr Bueso , Ingo Molnar , Tejun Heo , linux-kernel@vger.kernel.org Subject: [PATCH RFC 0/4] change sb_writers to use percpu_rw_semaphore Message-ID: <20150713212536.GA13855@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, Al, Jan, could you comment? I mean the intent, the patches are obviously not for inclusion yet. We can remove everything from struct sb_writers except frozen (which can become a boolean, it seems) and add the array of percpu_rw_semaphore's instead. __sb_start/end_write() can use percpu_down/up_read(), and freeze/thaw_super() can use percpu_down/up_write(). Why: - Firstly, __sb_start_write() looks simply buggy. I does __sb_end_write() if it sees ->frozen, but if it migrates to another CPU before percpu_counter_dec() sb_wait_write() can wrongly succeed if there is another task which holds the same "semaphore": sb_wait_write() can miss the result of the previous percpu_counter_inc() but see the result of this percpu_counter_dec(). - This code doesn't look simple. It would be better to rely on the generic locking code. - __sb_start_write() will be a little bit faster, but this is minor. Todo: - __sb_start_write(wait => false) always fail. Thivial, we already have percpu_down_read_trylock() just this patch wasn't merged yet. - sb_lockdep_release() and sb_lockdep_acquire() play with percpu_rw_semaphore's internals. Trivial, we need a couple of new helper in percpu-rwsem.c. - Fix get_super_thawed(), it will spin if MS_RDONLY... It is not clear to me what exactly should we do, but this doesn't look hard. Perhaps it can just return if MS_RDONLY. - Most probably I missed something else, and I do not need how to test. Finally. freeze_super() calls synchronize_sched_expedited() 3 times in a row. This is bad and just stupid. But if we change percpu_rw_semaphore to use rcu_sync (see https://lkml.org/lkml/2015/7/11/211) we can avoid this and do synchronize_sched() only once. Just we need some more simple changes in percpu-rwsem.c, so that all sb_writers->rw_sem[] semaphores could use the single sb_writers->rss. In this case destroy_super() needs some modifications too, percpu_free_rwsem() will be might_sleep(). But this looks simple too. Oleg. fs/super.c | 147 +++++++++++++++++++-------------------------------- include/linux/fs.h | 14 +---- 2 files changed, 58 insertions(+), 103 deletions(-)