From mboxrd@z Thu Jan 1 00:00:00 1970 From: Qu Wenruo Subject: Re: [PATCH RFC v6 6/9] vfs: Add sb_want_write() function to get vfsmount from a given sb. Date: Wed, 4 Feb 2015 16:22:23 +0800 Message-ID: <54D1D6BF.805@cn.fujitsu.com> References: <1423015855-27357-1-git-send-email-quwenruo@cn.fujitsu.com> <54D1D3B4.9060209@huawei.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: To: Miao Xie , Return-path: Received: from cn.fujitsu.com ([59.151.112.132]:47877 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S932350AbbBDIWZ convert rfc822-to-8bit (ORCPT ); Wed, 4 Feb 2015 03:22:25 -0500 In-Reply-To: <54D1D3B4.9060209@huawei.com> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: -------- Original Message -------- Subject: Re: [PATCH RFC v6 6/9] vfs: Add sb_want_write() function to ge= t=20 vfsmount from a given sb. =46rom: Miao Xie To: Qu Wenruo , Date: 2015=E5=B9=B402=E6=9C=8804=E6=97=A5 16:09 > On Wed, 04 Feb 2015 10:10:55 +0800, Qu Wenruo wrote: >> *** Please DON'T merge this patch, it's only for disscusion purpose = *** >> >> There are sysfs interfaces in some fs, only btrfs yet, which will mo= dify >> on-disk data. >> Unlike normal file operation routine we can use mnt_want_write_file(= ) to >> protect the operation, change through sysfs won't to be binded to an= y file >> in the filesystem. >> >> So introduce new sb_want_write() to do the protection agains a super >> block, which acts much like mnt_want_write() but will return success= if >> the super block is read-write. >> >> Since sysfs handler don't go through the normal vfsmount, so it won'= t >> increase the refcount of and even we have sb_want_write() waiting sb= to >> be unfrozen, the fs can still be unmounted without problem. >> Causing the modules unable to be removed and user can find out what'= s >> wrong until >> >> To solve such problem, we have different strategies to solve it. >> 1) Extra check on last instance umount of a sb >> This is the method the patch uses. >> This method seems valid enough, since we want to get write protectio= n on >> a sb, so it's OK for the sb if there is *ANY* mount instance. >> Problem 1.1) >> But lsof and other tools won't help if sb_want_write() on frozen fs = cause >> it unable to be unmounted. >> >> Problem 1.2) >> When get namespace involved, things will get more complicated. >> Like the following case: >> Alice | Bob >> Mount devA on /mnt1 in her ns | Mount devA on /mnt2/ in his ns >> freeze /mnt1 | >> sb_want_write() (waiting) | >> umount /mnt1 (success since there is | >> another mount instance) | >> | umount /mnt2 (fail since there >> | is sb_want_write() waiting) >> >> So Alice can't thaw the fs since there is no mount point for it now. >> >> 2) Don't allow any umount of the sb if there is sb_want_write(). >> More aggressive one, purpose by Miao Xie. >> Can't resolve problem 1.1) but will solve problem 1.2). > This is one of the two methods that I told you, but not the one I rec= ommended. > What I wanted to recommend is that thaw the fs at the beginning of th= e > sb kill process, and in sb_want_write(), we check if the sb is active= or > not after we pass sb_start_write, if the sb is not active, go back. > (This way also is not so good, but better than the above one) > >> Although introduced new problem like the following: >> Alice >> Mount devA on /mnt1 >> freeze /mnt1 >> sb_want_write() (waiting) >> mount devA on /mnt2 and /mnt3 >> >> /mnt[123] all can't be unmounted, but new mount can still be created= =2E >> >> 3) sb_want_write() doesn't make any sense and break VFS rules! >> Action which will change on-disk data should not be tunable through = sysfs, >> and sb_want_write() things which by-pass all the VFS check is just e= vil. >> And for btrfs, we already have the ioctl to set label, why bothering= new >> sysfs interface to do it again? >> >> Although I use method 1) to do it, I am still not certain about whic= h is >> method is the correct one. >> >> So any advise is welcomed. >> >> Thanks, >> Qu > [SNIP] > >> +/** >> + * sb_want_write - get write acess to a super block >> + * @sb: the superblock of the filesystem >> + * >> + * This tells the low-level filesystem that a write is about to be = performed to >> + * it, and makes sure that the writes are allowed (superblock is re= ad-write, >> + * filesystem is not frozen) before returning success. >> + * When the write operation is finished, sb_drop_write() must be ca= lled. >> + * This is much like mnt_want_write() as a refcount, but only needs >> + * the superblock to be read-write. >> + */ >> +int sb_want_write(struct super_block *sb) >> +{ >> + spin_lock(&sb->s_want_write_lock); >> + if (sb->s_want_write_block) { >> + spin_unlock(&sb->s_want_write_lock); >> + return -EBUSY; >> + } >> + sb->s_want_write_count++; >> + spin_unlock(&sb->s_want_write_lock); >> + >> + sb_start_write(sb); >> + if (sb->s_readonly_remount || sb->s_flags & MS_RDONLY) { > If someone remount the fs to R/O here(after the check), we should not= continue > to change label/features. I think we need add some check in remount f= unctions. Oh, you're right. Ro remount should also check s_want_write_count and block incoming=20 s_want_write_count until remount ro is done. Thanks, Qu > Thanks > Miao -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel= " in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html