linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* (no subject)
@ 2021-08-16  2:46 Kari Argillander
  2021-08-16  2:47 ` [RFC PATCH 1/4] fs/ntfs3: Use new api for mounting Kari Argillander
                   ` (5 more replies)
  0 siblings, 6 replies; 30+ messages in thread
From: Kari Argillander @ 2021-08-16  2:46 UTC (permalink / raw)
  To: Konstantin Komarov, Christoph Hellwig
  Cc: Kari Argillander, ntfs3, linux-kernel, linux-fsdevel,
	Pali Rohár, Matthew Wilcox

Date: Mon, 16 Aug 2021 04:08:46 +0300
Subject: [RFC PATCH 0/4] fs/ntfs3: Use new mount api and change some opts

This series modify ntfs3 to use new mount api as Christoph Hellwig wish
for.
https://lore.kernel.org/linux-fsdevel/20210810090234.GA23732@lst.de/

It also modify mount options noatime (not needed) and make new alias
for nls because kernel is changing to use it as described in here
https://lore.kernel.org/linux-fsdevel/20210808162453.1653-1-pali@kernel.org/

I would like really like to get fsparam_flag_no also for no_acs_rules
but then we have to make new name for it. Other possibility is to
modify mount api so it mount option can be no/no_. I think that would
maybe be good change. 

I did not quite like how I did nls table loading because now it always
first load default table and if user give option then default table is
dropped and if reconfigure is happening and this was same as before then
it is dropped. I try to make loading in fill_super and fs_reconfigure
but that just look ugly. This is quite readible so I leave it like this.
We also do not mount/remount so often that this probebly does not
matter. It seems that if new mount api had possibility to give default
value for mount option then there is not this kind of problem.

I would hope that these will added top of the now going ntfs3 patch
series. I do not have so many contributions to kernel yet and I would
like to get my name going there so that in future it would be easier to
contribute kernel.

Kari Argillander (4):
  fs/ntfs3: Use new api for mounting
  fs/ntfs3: Remove unnecesarry mount option noatime
  fs/ntfs3: Make mount option nohidden more universal
  fs/ntfs3: Add iocharset= mount option as alias for nls=

 Documentation/filesystems/ntfs3.rst |   4 -
 fs/ntfs3/super.c                    | 391 ++++++++++++++--------------
 2 files changed, 196 insertions(+), 199 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 30+ messages in thread
* (no subject)
@ 2021-08-21  8:59 Kari Argillander
  2021-08-22 13:13 ` your mail CGEL
  0 siblings, 1 reply; 30+ messages in thread
From: Kari Argillander @ 2021-08-21  8:59 UTC (permalink / raw)
  To: cgel.zte
  Cc: viro, christian.brauner, jamorris, gladkov.alexey, yang.yang29,
	tj, paul.gortmaker, linux-fsdevel, linux-kernel, Zeal Robot

Bcc:
Subject: Re: [PATCH] proc: prevent mount proc on same mountpoint in one pid
 namespace
Reply-To:
In-Reply-To: <20210821083105.30336-1-yang.yang29@zte.com.cn>

On Sat, Aug 21, 2021 at 01:31:05AM -0700, cgel.zte@gmail.com wrote:
> From: Yang Yang <yang.yang29@zte.com.cn>
> 
> Patch "proc: allow to mount many instances of proc in one pid namespace"
> aims to mount many instances of proc on different mountpoint, see
> tools/testing/selftests/proc/proc-multiple-procfs.c.
> 
> But there is a side-effects, user can mount many instances of proc on
> the same mountpoint in one pid namespace, which is not allowed before.
> This duplicate mount makes no sense but wastes memory and CPU, and user
> may be confused why kernel allows it.
> 
> The logic of this patch is: when try to mount proc on /mnt, check if
> there is a proc instance mount on /mnt in the same pid namespace. If
> answer is yes, return -EBUSY.
> 
> Since this check can't be done in proc_get_tree(), which call
> get_tree_nodev() and will create new super_block unconditionally.
> And other nodev fs may faces the same case, so add a new hook in
> fs_context_operations.
> 
> Reported-by: Zeal Robot <zealci@zte.com.cn>
> Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
> ---
>  fs/namespace.c             |  9 +++++++++
>  fs/proc/root.c             | 15 +++++++++++++++
>  include/linux/fs_context.h |  1 +
>  3 files changed, 25 insertions(+)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index f79d9471cb76..84da649a70c5 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2878,6 +2878,7 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
>  static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
>  			int mnt_flags, const char *name, void *data)
>  {
> +	int (*check_mntpoint)(struct fs_context *fc, struct path *path);
>  	struct file_system_type *type;
>  	struct fs_context *fc;
>  	const char *subtype = NULL;
> @@ -2906,6 +2907,13 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
>  	if (IS_ERR(fc))
>  		return PTR_ERR(fc);
>  
> +	/* check if there is a same super_block mount on path*/
> +	check_mntpoint = fc->ops->check_mntpoint;
> +	if (check_mntpoint)
> +		err = check_mntpoint(fc, path);
> +	if (err < 0)
> +		goto err_fc;
> +
>  	if (subtype)
>  		err = vfs_parse_fs_string(fc, "subtype",
>  					  subtype, strlen(subtype));
> @@ -2920,6 +2928,7 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
>  	if (!err)
>  		err = do_new_mount_fc(fc, path, mnt_flags);
>  
> +err_fc:
>  	put_fs_context(fc);
>  	return err;
>  }
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index c7e3b1350ef8..0971d6b0bec2 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -237,11 +237,26 @@ static void proc_fs_context_free(struct fs_context *fc)
>  	kfree(ctx);
>  }
>  
> +static int proc_check_mntpoint(struct fs_context *fc, struct path *path)
> +{
> +	struct super_block *mnt_sb = path->mnt->mnt_sb;
> +	struct proc_fs_info *fs_info;
> +
> +	if (strcmp(mnt_sb->s_type->name, "proc") == 0) {
> +		fs_info = mnt_sb->s_fs_info;
> +		if (fs_info->pid_ns == task_active_pid_ns(current) &&
> +		    path->mnt->mnt_root == path->dentry)
> +			return -EBUSY;
> +	}
> +	return 0;
> +}
> +
>  static const struct fs_context_operations proc_fs_context_ops = {
>  	.free		= proc_fs_context_free,
>  	.parse_param	= proc_parse_param,
>  	.get_tree	= proc_get_tree,
>  	.reconfigure	= proc_reconfigure,
> +	.check_mntpoint	= proc_check_mntpoint,
>  };
>  
>  static int proc_init_fs_context(struct fs_context *fc)
> diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h
> index 6b54982fc5f3..090a05fb2d7d 100644
> --- a/include/linux/fs_context.h
> +++ b/include/linux/fs_context.h
> @@ -119,6 +119,7 @@ struct fs_context_operations {
>  	int (*parse_monolithic)(struct fs_context *fc, void *data);
>  	int (*get_tree)(struct fs_context *fc);
>  	int (*reconfigure)(struct fs_context *fc);
> +	int (*check_mntpoint)(struct fs_context *fc, struct path *path);

Don't you think this should be it's own patch. It is after all internal
api change. This also needs documentation. It would be confusing if
someone convert to new mount api and there is one line which just
address some proc stuff but even commit message does not address does
every fs needs to add this. 

Documentation is very good shape right now and we are in face that
everyone is migrating to use new mount api so everyting should be well
documented.

>  };
>  
>  /*
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [RFC][PATCH] Re: [BUG] ext4: cannot unfreeze a filesystem due to a deadlock
@ 2011-05-03 11:01 Surbhi Palande
  2011-05-03 13:08 ` (unknown), Surbhi Palande
  0 siblings, 1 reply; 30+ messages in thread
From: Surbhi Palande @ 2011-05-03 11:01 UTC (permalink / raw)
  To: Toshiyuki Okajima
  Cc: Jan Kara, Ted Ts'o, Masayoshi MIZUMA, Andreas Dilger,
	linux-ext4, linux-fsdevel, sandeen

On 04/18/2011 12:05 PM, Toshiyuki Okajima wrote:
> Hi,
>
> (2011/04/16 2:13), Jan Kara wrote:
>> Hello,
>>
>> On Fri 15-04-11 22:39:07, Toshiyuki Okajima wrote:
>>>> For ext3 or ext4 without delayed allocation we block inside writepage()
>>>> function. But as I wrote to Dave Chinner, ->page_mkwrite() should
>>>> probably
>>>> get modified to block while minor-faulting the page on frozen fs
>>>> because
>>>> when blocks are already allocated we may skip starting a transaction
>>>> and so
>>>> we could possibly modify the filesystem.
>>> OK. I think ->page_mkwrite() should also block writing the
>>> minor-faulting pages.
>>>
>>> (minor-pagefault)
>>> -> do_wp_page()
>>> -> page_mkwrite(= ext4_mkwrite())
>>> => BLOCK!
>>>
>>> (major-pagefault)
>>> -> do_liner_fault()
>>> -> page_mkwrite(= ext4_mkwrite())
>>> => BLOCK!
>>>
>>>>
>>>>>>> Mizuma-san's reproducer also writes the data which maps to the
>>>>>>> file (mmap).
>>>>>>> The original problem happens after the fsfreeze operation is done.
>>>>>>> I understand the normal write operation (not mmap) can be blocked
>>>>>>> while
>>>>>>> fsfreezing. So, I guess we don't always block all the write
>>>>>>> operation
>>>>>>> while fsfreezing.
>>>>>> Technically speaking, we block all the transaction starts which
>>>>>> means we
>>>>>> end up blocking all the writes from going to disk. But that does
>>>>>> not mean
>>>>>> we block all the writes from going to in-memory cache - as you
>>>>>> properly
>>>>>> note the mmap case is one of such exceptions.
>>>>> Hm, I also think we can allow the writes to in-memory cache but we
>>>>> can't allow
>>>>> the writes to disk while fsfreezing. I am considering that mmap
>>>>> path can
>>>>> write to disk while fsfreezing because this deadlock problem
>>>>> happens after
>>>>> fsfreeze operation is done...
>>>> I'm sorry I don't understand now - are you speaking about the case
>>>> above
>>>> when writepage() does not wait for filesystem being frozen or something
>>>> else?
>>> Sorry, I didn't understand around the page fault path.
>>> So, I had read the kernel source code around it, then I maybe
>>> understand...
>>>
>>> I worry whether we can update the file data in mmap case while
>>> fsfreezing.
>>> Of course, I understand that we can write to in-memory cache, and it
>>> is not a
>>> problem. However, if we can write to disk while fsfreezing, it is a
>>> problem.
>>> So, I summarize the cases whether we can write to disk or not.
>>>
>>> --------------------------------------------------------------------------
>>>
>>> Cases (Whether we can write the data mmapped to the file on the disk
>>> while fsfreezing)
>>>
>>> [1] One of the page which has been mmapped is not bound. And
>>> the page is not allocated yet. (major fault?)
>>>
>>> (1) user dirtys a page
>>> (2) a page fault occurs (do_page_fault)
>>> (3) __do_falut is called.
>>> (4) ext4_page_mkwrite is called
>>> (5) ext4_write_begin is called
>>> (6) ext4_journal_start_sb => We can STOP!
>>>
>>> [2] One of the page which has been mmapped is not bound. But
>>> the page is already allocated, and the buffer_heads of the page
>>> are not mapped (BH_Mapped). (minor fault?)
>>>
>>> (1) user dirtys a page
>>> (2) a page fault occurs (do_page_fault)
>>> (3) do_wp_page is called.
>>> (4) ext4_page_mkwrite is called
>>> (5) ext4_write_begin is called
>>> (6) ext4_journal_start_sb => We can STOP!

What happens in the case as follows:

Task 1: Mmapped writes
t1)ext4_page_mkwrite()
   t2) ext4_write_begin() (FS is thawed so we proceed)
   t3) ext4_write_end() (journal is stopped now)
-----Pre-empted-----


Task 2: Freeze Task
t4) freezes the super block...
...(continues)....
tn) the page cache is clean and the F.S is frozen. Freeze has completed 
execution.

Task 1: Mmapped writes
tn+1) ext4_page_mkwrite() returns 0.
tn+2) __do_fault() gets control, code gets executed.
tn+3) _do_fault() marks the page dirty if the intent is to write to a 
file based page which faulted.

So you end up dirtying the page cache when the F.S is frozen? No?


Warm Regards,
Surbhi.







>>>
>>> [3] One of the page which has been mmapped is not bound. But
>>> the page is already allocated, and the buffer_heads of the page
>>> are mapped (BH_Mapped). (minor fault?)
>>>
>>> (1) user dirtys a page
>>> (2) a page fault occurs (do_page_fault)
>>> (3) do_wp_page is called.
>>> (4) ext4_page_mkwrite is called
>>> * Cannot block the dirty page to be written because all bh is mapped.
>>> (5) user munmaps the page (munmap)
>>> (6) zap_pte_range dirtys the page (struct page) which is pte_dirtyed.
>>> (7) writeback thread writes the page (struct page) to disk
>>> => We cannot STOP!
>>>
>>> [4] One of the page which has been mmapped is bound. And
>>> the page is already allocated.
>>>
>>> (1) user dirtys a page
>>> ( ) no page fault occurs
>>> (2) user munmaps the page (munmap)
>>> (3) zap_pte_range dirtys the page (struct page) which is pte_dirtyed.
>>> (4) writeback thread writes the page (struct page) to disk
>>> => We cannot STOP!
>>> --------------------------------------------------------------------------
>>>
>>>
>>> So, we can block the cases [1], [2].
>>> But I think we cannot block the cases [3], [4] now.
>>> If fixing the page_mkwrite, we can also block the case [3].
>>> But the case [4] is not blocked because no page fault occurs
>>> when we dirty the mmapped page.
>>>
>>> Therefore, to repair this problem, we need to fix the cases [3], [4].
>>> I think we must modify the writeback thread to fix the case [4].
>> The trick here is that when we write a page to disk, we write-protect
>> the page (you seem to call this that "the page is bound", I'm not sure
>> why).
> Hm, I want to understand how to write-protect the page under fsfreezing.
> But, anyway, I understand we don't need to consider the case [4].
>
>> So we are guaranteed to receive a minor fault (case [3]) if user tries to
>> modify a page after we finish writeback while freezing the filesystem.
>> So principially all we need to do is just wait in ext4_page_mkwrite().
> OK. I understand.
> Are there any concrete ideas to fix this?
> For ext4, we can rescue from the case [3] by modifying ext4_page_mkwrite().
> But for ext3 or other FSs, we must implement ->page_mkwrite() to prevent
> it?
>
> Thanks,
> Toshiyuki Okajima
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 30+ messages in thread
* (unknown), 
@ 2010-06-16 16:33 Jan Kara
  2010-06-16 22:15 ` your mail Dave Chinner
  2010-06-22  2:59 ` Wu Fengguang
  0 siblings, 2 replies; 30+ messages in thread
From: Jan Kara @ 2010-06-16 16:33 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-mm, Andrew Morton, npiggin

  Hello,

  here is the fourth version of the writeback livelock avoidance patches
for data integrity writes. To quickly summarize the idea: we tag dirty
pages at the beginning of write_cache_pages with a new TOWRITE tag and
then write only tagged pages to avoid parallel writers to livelock us.
See changelogs of the patches for more details.
  I have tested the patches with fsx and a test program I wrote which
checks that if we crash after fsync, the data is indeed on disk.
  If there are no more concerns, can these patches get merged?

								Honza

  Changes since last version:
- tagging function was changed to stop after given amount of pages to
  avoid keeping tree_lock and irqs disabled for too long
- changed names and updated comments as Andrew suggested
- measured memory impact and reported it in the changelog

  Things suggested but not changed (I want to avoid going in circles ;):
- use tagging also for WB_SYNC_NONE writeback - there's problem with an
  interaction with wbc->nr_to_write. If we tag all dirty pages, we can
  spend too much time tagging when we write only a few pages in the end
  because of nr_to_write. If we tag only say nr_to_write pages, we may
  not have enough pages tagged because some pages are written out by
  someone else and so we would have to restart and tagging would become
  essentially useless. So my option is - switch to tagging for WB_SYNC_NONE
  writeback if we can get rid of nr_to_write. But that's a story for
  a different patch set.
- implement function for clearing several tags (TOWRITE, DIRTY) at once
  - IMHO not worth it because we would save only conversion of page index
  to radix tree offsets. The rest would have to be separate anyways. And
  the interface would be incosistent as well...
- use __lookup_tag to implement radix_tree_range_tag_if_tagged - doesn't
  quite work because __lookup_tag returns only leaf nodes so we'd have to
  implement tree traversal anyways to tag also internal nodes.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2021-08-22 13:14 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-08-16  2:46 Kari Argillander
2021-08-16  2:47 ` [RFC PATCH 1/4] fs/ntfs3: Use new api for mounting Kari Argillander
2021-08-16  3:23   ` Kari Argillander
2021-08-16 12:17     ` Christoph Hellwig
2021-08-16 13:19       ` Kari Argillander
2021-08-16 12:36   ` Christoph Hellwig
2021-08-16 13:14     ` Kari Argillander
2021-08-16 13:24       ` Pali Rohár
2021-08-16 13:40   ` Christian Brauner
2021-08-16 13:59     ` Kari Argillander
2021-08-16 14:21       ` Christian Brauner
2021-08-16  2:47 ` [RFC PATCH 2/4] fs/ntfs3: Remove unnecesarry mount option noatime Kari Argillander
2021-08-16 12:18   ` Christoph Hellwig
2021-08-16 12:45     ` Kari Argillander
2021-08-16  2:47 ` [RFC PATCH 3/4] fs/ntfs3: Make mount option nohidden more universal Kari Argillander
2021-08-16  2:47 ` [RFC PATCH 4/4] fs/ntfs3: Add iocharset= mount option as alias for nls= Kari Argillander
2021-08-16  3:03 ` [RFC PATCH 0/4] fs/ntfs3: Use new mount api and change some opts Kari Argillander
2021-08-16 12:27 ` your mail Christoph Hellwig
2021-08-16 12:48   ` [RFC PATCH 0/4] fs/ntfs3: Use new mount api and change some opts Kari Argillander
  -- strict thread matches above, loose matches on Subject: below --
2021-08-21  8:59 Kari Argillander
2021-08-22 13:13 ` your mail CGEL
2011-05-03 11:01 [RFC][PATCH] Re: [BUG] ext4: cannot unfreeze a filesystem due to a deadlock Surbhi Palande
2011-05-03 13:08 ` (unknown), Surbhi Palande
2011-05-03 13:46   ` your mail Jan Kara
2011-05-03 13:56     ` Surbhi Palande
2011-05-03 15:26       ` Surbhi Palande
2011-05-03 15:36       ` Jan Kara
2011-05-03 15:43         ` Surbhi Palande
2011-05-04 19:24           ` Jan Kara
2010-06-16 16:33 (unknown), Jan Kara
2010-06-16 22:15 ` your mail Dave Chinner
2010-06-22  2:59 ` Wu Fengguang
2010-06-22 13:54   ` Jan Kara
2010-06-22 14:12     ` Wu Fengguang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).