* [PATCH v10 1/3] xfs: fix the calculation of length and end
2023-02-17 14:48 [PATCH v10 0/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind Shiyang Ruan
@ 2023-02-17 14:48 ` Shiyang Ruan
2023-02-17 14:48 ` [PATCH v10 2/3] fs: introduce super_drop_pagecache() Shiyang Ruan
2023-02-17 14:48 ` [PATCH v10 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind Shiyang Ruan
2 siblings, 0 replies; 16+ messages in thread
From: Shiyang Ruan @ 2023-02-17 14:48 UTC (permalink / raw)
To: linux-xfs, nvdimm, linux-fsdevel, linux-mm
Cc: djwong, david, dan.j.williams, hch, jane.chu, akpm, willy,
ruansy.fnst
The end should be start + length - 1. Also fix the calculation of the
length when seeking for intersection of notify range and device.
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/xfs_notify_failure.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
index c4078d0ec108..7d46a7e4980f 100644
--- a/fs/xfs/xfs_notify_failure.c
+++ b/fs/xfs/xfs_notify_failure.c
@@ -114,7 +114,7 @@ xfs_dax_notify_ddev_failure(
int error = 0;
xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, daddr);
xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, fsbno);
- xfs_fsblock_t end_fsbno = XFS_DADDR_TO_FSB(mp, daddr + bblen);
+ xfs_fsblock_t end_fsbno = XFS_DADDR_TO_FSB(mp, daddr + bblen - 1);
xfs_agnumber_t end_agno = XFS_FSB_TO_AGNO(mp, end_fsbno);
error = xfs_trans_alloc_empty(mp, &tp);
@@ -210,7 +210,7 @@ xfs_dax_notify_failure(
ddev_end = ddev_start + bdev_nr_bytes(mp->m_ddev_targp->bt_bdev) - 1;
/* Ignore the range out of filesystem area */
- if (offset + len < ddev_start)
+ if (offset + len - 1 < ddev_start)
return -ENXIO;
if (offset > ddev_end)
return -ENXIO;
@@ -222,8 +222,8 @@ xfs_dax_notify_failure(
len -= ddev_start - offset;
offset = 0;
}
- if (offset + len > ddev_end)
- len -= ddev_end - offset;
+ if (offset + len - 1 > ddev_end)
+ len = ddev_end - offset + 1;
return xfs_dax_notify_ddev_failure(mp, BTOBB(offset), BTOBB(len),
mf_flags);
--
2.39.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v10 2/3] fs: introduce super_drop_pagecache()
2023-02-17 14:48 [PATCH v10 0/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind Shiyang Ruan
2023-02-17 14:48 ` [PATCH v10 1/3] xfs: fix the calculation of length and end Shiyang Ruan
@ 2023-02-17 14:48 ` Shiyang Ruan
2023-02-17 16:14 ` Matthew Wilcox
2023-02-20 21:25 ` Dave Chinner
2023-02-17 14:48 ` [PATCH v10 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind Shiyang Ruan
2 siblings, 2 replies; 16+ messages in thread
From: Shiyang Ruan @ 2023-02-17 14:48 UTC (permalink / raw)
To: linux-xfs, nvdimm, linux-fsdevel, linux-mm
Cc: djwong, david, dan.j.williams, hch, jane.chu, akpm, willy,
ruansy.fnst
xfs_notify_failure.c requires a method to invalidate all dax mappings.
drop_pagecache_sb() can do this but it is a static function and only
build with CONFIG_SYSCTL. Now, move its implementation into super.c and
call it super_drop_pagecache(). Use its second argument as invalidator
so that we can choose which invalidate method to use.
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
fs/drop_caches.c | 29 +--------------------------
fs/super.c | 43 +++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 2 ++
include/linux/pagemap.h | 1 +
mm/truncate.c | 20 +++++++++++++++++--
5 files changed, 65 insertions(+), 30 deletions(-)
diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index e619c31b6bd9..f88ce339b635 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -17,34 +17,7 @@ int sysctl_drop_caches;
static void drop_pagecache_sb(struct super_block *sb, void *unused)
{
- struct inode *inode, *toput_inode = NULL;
-
- spin_lock(&sb->s_inode_list_lock);
- list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
- spin_lock(&inode->i_lock);
- /*
- * We must skip inodes in unusual state. We may also skip
- * inodes without pages but we deliberately won't in case
- * we need to reschedule to avoid softlockups.
- */
- if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
- (mapping_empty(inode->i_mapping) && !need_resched())) {
- spin_unlock(&inode->i_lock);
- continue;
- }
- __iget(inode);
- spin_unlock(&inode->i_lock);
- spin_unlock(&sb->s_inode_list_lock);
-
- invalidate_mapping_pages(inode->i_mapping, 0, -1);
- iput(toput_inode);
- toput_inode = inode;
-
- cond_resched();
- spin_lock(&sb->s_inode_list_lock);
- }
- spin_unlock(&sb->s_inode_list_lock);
- iput(toput_inode);
+ super_drop_pagecache(sb, invalidate_inode_pages);
}
int drop_caches_sysctl_handler(struct ctl_table *table, int write,
diff --git a/fs/super.c b/fs/super.c
index 12c08cb20405..a403243b5513 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -36,6 +36,7 @@
#include <linux/lockdep.h>
#include <linux/user_namespace.h>
#include <linux/fs_context.h>
+#include <linux/pagemap.h>
#include <uapi/linux/mount.h>
#include "internal.h"
@@ -678,6 +679,48 @@ void drop_super_exclusive(struct super_block *sb)
}
EXPORT_SYMBOL(drop_super_exclusive);
+/**
+ * super_drop_pagecache - drop all page caches of a filesystem
+ * @sb: superblock to invalidate
+ * @arg: invalidate method, such as invalidate_inode_pages(),
+ * invalidate_inode_pages2()
+ *
+ * Scans the inodes of a filesystem, drop all page caches.
+ */
+void super_drop_pagecache(struct super_block *sb,
+ int (*invalidator)(struct address_space *))
+{
+ struct inode *inode, *toput_inode = NULL;
+
+ spin_lock(&sb->s_inode_list_lock);
+ list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+ spin_lock(&inode->i_lock);
+ /*
+ * We must skip inodes in unusual state. We may also skip
+ * inodes without pages but we deliberately won't in case
+ * we need to reschedule to avoid softlockups.
+ */
+ if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
+ (mapping_empty(inode->i_mapping) && !need_resched())) {
+ spin_unlock(&inode->i_lock);
+ continue;
+ }
+ __iget(inode);
+ spin_unlock(&inode->i_lock);
+ spin_unlock(&sb->s_inode_list_lock);
+
+ invalidator(inode->i_mapping);
+ iput(toput_inode);
+ toput_inode = inode;
+
+ cond_resched();
+ spin_lock(&sb->s_inode_list_lock);
+ }
+ spin_unlock(&sb->s_inode_list_lock);
+ iput(toput_inode);
+}
+EXPORT_SYMBOL(super_drop_pagecache);
+
static void __iterate_supers(void (*f)(struct super_block *))
{
struct super_block *sb, *p = NULL;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c1769a2c5d70..fdcaa9bf85dd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3308,6 +3308,8 @@ extern struct super_block *get_super(struct block_device *);
extern struct super_block *get_active_super(struct block_device *bdev);
extern void drop_super(struct super_block *sb);
extern void drop_super_exclusive(struct super_block *sb);
+void super_drop_pagecache(struct super_block *sb,
+ int (*invalidator)(struct address_space *));
extern void iterate_supers(void (*)(struct super_block *, void *), void *);
extern void iterate_supers_type(struct file_system_type *,
void (*)(struct super_block *, void *), void *);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 29e1f9e76eb6..d0a180268baa 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -27,6 +27,7 @@ static inline void invalidate_remote_inode(struct inode *inode)
S_ISLNK(inode->i_mode))
invalidate_mapping_pages(inode->i_mapping, 0, -1);
}
+int invalidate_inode_pages(struct address_space *mapping);
int invalidate_inode_pages2(struct address_space *mapping);
int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
diff --git a/mm/truncate.c b/mm/truncate.c
index 7b4ea4c4a46b..131f2ab2d566 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -540,12 +540,13 @@ unsigned long invalidate_mapping_pagevec(struct address_space *mapping,
}
/**
- * invalidate_mapping_pages - Invalidate all clean, unlocked cache of one inode
+ * invalidate_mapping_pages - Invalidate range of clean, unlocked cache of one
+ * inode
* @mapping: the address_space which holds the cache to invalidate
* @start: the offset 'from' which to invalidate
* @end: the offset 'to' which to invalidate (inclusive)
*
- * This function removes pages that are clean, unmapped and unlocked,
+ * This function removes range of pages that are clean, unmapped and unlocked,
* as well as shadow entries. It will not block on IO activity.
*
* If you want to remove all the pages of one inode, regardless of
@@ -560,6 +561,21 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
}
EXPORT_SYMBOL(invalidate_mapping_pages);
+/**
+ * invalidate_inode_pages - Invalidate all clean, unlocked cache of one inode
+ * @mapping: the address_space which holds the cache to invalidate
+ *
+ * This function removes all pages that are clean, unmapped and unlocked,
+ * as well as shadow entries. It will not block on IO activity.
+ */
+int invalidate_inode_pages(struct address_space *mapping)
+{
+ invalidate_mapping_pages(mapping, 0, -1);
+
+ return 0;
+}
+EXPORT_SYMBOL(invalidate_inode_pages);
+
/*
* This is like invalidate_inode_page(), except it ignores the page's
* refcount. We do this because invalidate_inode_pages2() needs stronger
--
2.39.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v10 2/3] fs: introduce super_drop_pagecache()
2023-02-17 14:48 ` [PATCH v10 2/3] fs: introduce super_drop_pagecache() Shiyang Ruan
@ 2023-02-17 16:14 ` Matthew Wilcox
2023-02-18 1:16 ` Shiyang Ruan
2023-02-20 21:25 ` Dave Chinner
1 sibling, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2023-02-17 16:14 UTC (permalink / raw)
To: Shiyang Ruan
Cc: linux-xfs, nvdimm, linux-fsdevel, linux-mm, djwong, david,
dan.j.williams, hch, jane.chu, akpm
On Fri, Feb 17, 2023 at 02:48:31PM +0000, Shiyang Ruan wrote:
> - invalidate_mapping_pages(inode->i_mapping, 0, -1);
> - iput(toput_inode);
> - toput_inode = inode;
> -
> - cond_resched();
> - spin_lock(&sb->s_inode_list_lock);
> - }
> - spin_unlock(&sb->s_inode_list_lock);
> - iput(toput_inode);
> + super_drop_pagecache(sb, invalidate_inode_pages);
I thought I explained last time that you can do this with
invalidate_mapping_pages() / invalidate_inode_pages2_range() ?
Then you don't need to introduce invalidate_inode_pages().
> +void super_drop_pagecache(struct super_block *sb,
> + int (*invalidator)(struct address_space *))
void super_drop_pagecache(struct super_block *sb,
int (*invalidate)(struct address_space *, pgoff_t, pgoff_t))
> + invalidator(inode->i_mapping);
invalidate(inode->i_mapping, 0, -1)
... then all the changes to mm/truncate.c and filemap.h go away.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v10 2/3] fs: introduce super_drop_pagecache()
2023-02-17 16:14 ` Matthew Wilcox
@ 2023-02-18 1:16 ` Shiyang Ruan
2023-02-18 18:27 ` Matthew Wilcox
0 siblings, 1 reply; 16+ messages in thread
From: Shiyang Ruan @ 2023-02-18 1:16 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-xfs, nvdimm, linux-fsdevel, linux-mm, djwong, david,
dan.j.williams, hch, jane.chu, akpm
在 2023/2/18 0:14, Matthew Wilcox 写道:
> On Fri, Feb 17, 2023 at 02:48:31PM +0000, Shiyang Ruan wrote:
>> - invalidate_mapping_pages(inode->i_mapping, 0, -1);
>> - iput(toput_inode);
>> - toput_inode = inode;
>> -
>> - cond_resched();
>> - spin_lock(&sb->s_inode_list_lock);
>> - }
>> - spin_unlock(&sb->s_inode_list_lock);
>> - iput(toput_inode);
>> + super_drop_pagecache(sb, invalidate_inode_pages);
>
> I thought I explained last time that you can do this with
> invalidate_mapping_pages() / invalidate_inode_pages2_range() ?
> Then you don't need to introduce invalidate_inode_pages().
>
>> +void super_drop_pagecache(struct super_block *sb,
>> + int (*invalidator)(struct address_space *))
>
> void super_drop_pagecache(struct super_block *sb,
> int (*invalidate)(struct address_space *, pgoff_t, pgoff_t))
>
>> + invalidator(inode->i_mapping);
>
> invalidate(inode->i_mapping, 0, -1)
>
> ... then all the changes to mm/truncate.c and filemap.h go away.
Yes, I tried as you suggested, but I found that they don't have same
type of return value.
int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
unsigned long invalidate_mapping_pages(struct address_space *mapping,
pgoff_t start, pgoff_t end);
--
Thanks,
Ruan.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v10 2/3] fs: introduce super_drop_pagecache()
2023-02-18 1:16 ` Shiyang Ruan
@ 2023-02-18 18:27 ` Matthew Wilcox
2023-02-20 9:39 ` Shiyang Ruan
0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2023-02-18 18:27 UTC (permalink / raw)
To: Shiyang Ruan
Cc: linux-xfs, nvdimm, linux-fsdevel, linux-mm, djwong, david,
dan.j.williams, hch, jane.chu, akpm
On Sat, Feb 18, 2023 at 09:16:43AM +0800, Shiyang Ruan wrote:
> 在 2023/2/18 0:14, Matthew Wilcox 写道:
> > On Fri, Feb 17, 2023 at 02:48:31PM +0000, Shiyang Ruan wrote:
> > > - invalidate_mapping_pages(inode->i_mapping, 0, -1);
> > > - iput(toput_inode);
> > > - toput_inode = inode;
> > > -
> > > - cond_resched();
> > > - spin_lock(&sb->s_inode_list_lock);
> > > - }
> > > - spin_unlock(&sb->s_inode_list_lock);
> > > - iput(toput_inode);
> > > + super_drop_pagecache(sb, invalidate_inode_pages);
> >
> > I thought I explained last time that you can do this with
> > invalidate_mapping_pages() / invalidate_inode_pages2_range() ?
> > Then you don't need to introduce invalidate_inode_pages().
> >
> > > +void super_drop_pagecache(struct super_block *sb,
> > > + int (*invalidator)(struct address_space *))
> >
> > void super_drop_pagecache(struct super_block *sb,
> > int (*invalidate)(struct address_space *, pgoff_t, pgoff_t))
> >
> > > + invalidator(inode->i_mapping);
> >
> > invalidate(inode->i_mapping, 0, -1)
> >
> > ... then all the changes to mm/truncate.c and filemap.h go away.
>
> Yes, I tried as you suggested, but I found that they don't have same type of
> return value.
>
> int invalidate_inode_pages2_range(struct address_space *mapping,
> pgoff_t start, pgoff_t end);
>
> unsigned long invalidate_mapping_pages(struct address_space *mapping,
> pgoff_t start, pgoff_t end);
Oh, that's annoying. Particularly annoying is that the return value
for invalidate_mapping_pages() is used by fs/inode.c to account for
the number of pages invalidate, and the return value for
invalidate_inode_pages2_range() is used by several filesystems
to know whether an error occurred.
Hm. Shouldn't you be checking for an error from
invalidate_inode_pages2_range()? Seems like it can return -EBUSY for
DAX entries.
With that in mind, the wrapper you actually want to exist is
static int invalidate_inode_pages_range(struct address_space *mapping,
pgoff_t start, pgoff_t end)
{
invalidate_mapping_pages(mapping, start, end);
return 0;
}
Right?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v10 2/3] fs: introduce super_drop_pagecache()
2023-02-18 18:27 ` Matthew Wilcox
@ 2023-02-20 9:39 ` Shiyang Ruan
2023-02-20 9:45 ` Shiyang Ruan
0 siblings, 1 reply; 16+ messages in thread
From: Shiyang Ruan @ 2023-02-20 9:39 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-xfs, nvdimm, linux-fsdevel, linux-mm, djwong, david,
dan.j.williams, hch, jane.chu, akpm
在 2023/2/19 2:27, Matthew Wilcox 写道:
> On Sat, Feb 18, 2023 at 09:16:43AM +0800, Shiyang Ruan wrote:
>> 在 2023/2/18 0:14, Matthew Wilcox 写道:
>>> On Fri, Feb 17, 2023 at 02:48:31PM +0000, Shiyang Ruan wrote:
>>>> - invalidate_mapping_pages(inode->i_mapping, 0, -1);
>>>> - iput(toput_inode);
>>>> - toput_inode = inode;
>>>> -
>>>> - cond_resched();
>>>> - spin_lock(&sb->s_inode_list_lock);
>>>> - }
>>>> - spin_unlock(&sb->s_inode_list_lock);
>>>> - iput(toput_inode);
>>>> + super_drop_pagecache(sb, invalidate_inode_pages);
>>>
>>> I thought I explained last time that you can do this with
>>> invalidate_mapping_pages() / invalidate_inode_pages2_range() ?
>>> Then you don't need to introduce invalidate_inode_pages().
>>>
>>>> +void super_drop_pagecache(struct super_block *sb,
>>>> + int (*invalidator)(struct address_space *))
>>>
>>> void super_drop_pagecache(struct super_block *sb,
>>> int (*invalidate)(struct address_space *, pgoff_t, pgoff_t))
>>>
>>>> + invalidator(inode->i_mapping);
>>>
>>> invalidate(inode->i_mapping, 0, -1)
>>>
>>> ... then all the changes to mm/truncate.c and filemap.h go away.
>>
>> Yes, I tried as you suggested, but I found that they don't have same type of
>> return value.
>>
>> int invalidate_inode_pages2_range(struct address_space *mapping,
>> pgoff_t start, pgoff_t end);
>>
>> unsigned long invalidate_mapping_pages(struct address_space *mapping,
>> pgoff_t start, pgoff_t end);
>
> Oh, that's annoying. Particularly annoying is that the return value
> for invalidate_mapping_pages() is used by fs/inode.c to account for
> the number of pages invalidate, and the return value for
> invalidate_inode_pages2_range() is used by several filesystems
> to know whether an error occurred.
>
> Hm. Shouldn't you be checking for an error from
> invalidate_inode_pages2_range()? Seems like it can return -EBUSY for
> DAX entries.
>
> With that in mind, the wrapper you actually want to exist is
>
> static int invalidate_inode_pages_range(struct address_space *mapping,
> pgoff_t start, pgoff_t end)
> {
> invalidate_mapping_pages(mapping, start, end);
> return 0;
> }
>
> Right?
So, I should introduce this wrapper in fs/xfs/xfs_notify_failure.c
because it is the only one who calls this wrapper. Ok, got it!
--
Thanks,
Ruan.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v10 2/3] fs: introduce super_drop_pagecache()
2023-02-20 9:39 ` Shiyang Ruan
@ 2023-02-20 9:45 ` Shiyang Ruan
0 siblings, 0 replies; 16+ messages in thread
From: Shiyang Ruan @ 2023-02-20 9:45 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-xfs, nvdimm, linux-fsdevel, linux-mm, djwong, david,
dan.j.williams, hch, jane.chu, akpm
在 2023/2/20 17:39, Shiyang Ruan 写道:
>
>
> 在 2023/2/19 2:27, Matthew Wilcox 写道:
>> On Sat, Feb 18, 2023 at 09:16:43AM +0800, Shiyang Ruan wrote:
>>> 在 2023/2/18 0:14, Matthew Wilcox 写道:
>>>> On Fri, Feb 17, 2023 at 02:48:31PM +0000, Shiyang Ruan wrote:
>>>>> - invalidate_mapping_pages(inode->i_mapping, 0, -1);
>>>>> - iput(toput_inode);
>>>>> - toput_inode = inode;
>>>>> -
>>>>> - cond_resched();
>>>>> - spin_lock(&sb->s_inode_list_lock);
>>>>> - }
>>>>> - spin_unlock(&sb->s_inode_list_lock);
>>>>> - iput(toput_inode);
>>>>> + super_drop_pagecache(sb, invalidate_inode_pages);
>>>>
>>>> I thought I explained last time that you can do this with
>>>> invalidate_mapping_pages() / invalidate_inode_pages2_range() ?
>>>> Then you don't need to introduce invalidate_inode_pages().
>>>>
>>>>> +void super_drop_pagecache(struct super_block *sb,
>>>>> + int (*invalidator)(struct address_space *))
>>>>
>>>> void super_drop_pagecache(struct super_block *sb,
>>>> int (*invalidate)(struct address_space *, pgoff_t, pgoff_t))
>>>>
>>>>> + invalidator(inode->i_mapping);
>>>>
>>>> invalidate(inode->i_mapping, 0, -1)
>>>>
>>>> ... then all the changes to mm/truncate.c and filemap.h go away.
>>>
>>> Yes, I tried as you suggested, but I found that they don't have same
>>> type of
>>> return value.
>>>
>>> int invalidate_inode_pages2_range(struct address_space *mapping,
>>> pgoff_t start, pgoff_t end);
>>>
>>> unsigned long invalidate_mapping_pages(struct address_space *mapping,
>>> pgoff_t start, pgoff_t end);
>>
>> Oh, that's annoying. Particularly annoying is that the return value
>> for invalidate_mapping_pages() is used by fs/inode.c to account for
>> the number of pages invalidate, and the return value for
>> invalidate_inode_pages2_range() is used by several filesystems
>> to know whether an error occurred.
>>
>> Hm. Shouldn't you be checking for an error from
>> invalidate_inode_pages2_range()? Seems like it can return -EBUSY for
>> DAX entries.
>>
>> With that in mind, the wrapper you actually want to exist is
>>
>> static int invalidate_inode_pages_range(struct address_space *mapping,
>> pgoff_t start, pgoff_t end)
>> {
>> invalidate_mapping_pages(mapping, start, end);
>> return 0;
>> }
>>
>> Right?
>
> So, I should introduce this wrapper in fs/xfs/xfs_notify_failure.c
Ahh... It's not fs/xfs/xfs_notify_failure.c, should be fs/drop_caches.c.
> because it is the only one who calls this wrapper. Ok, got it!
>
>
> --
> Thanks,
> Ruan.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v10 2/3] fs: introduce super_drop_pagecache()
2023-02-17 14:48 ` [PATCH v10 2/3] fs: introduce super_drop_pagecache() Shiyang Ruan
2023-02-17 16:14 ` Matthew Wilcox
@ 2023-02-20 21:25 ` Dave Chinner
2023-02-21 1:57 ` Shiyang Ruan
1 sibling, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2023-02-20 21:25 UTC (permalink / raw)
To: Shiyang Ruan
Cc: linux-xfs, nvdimm, linux-fsdevel, linux-mm, djwong,
dan.j.williams, hch, jane.chu, akpm, willy
On Fri, Feb 17, 2023 at 02:48:31PM +0000, Shiyang Ruan wrote:
> xfs_notify_failure.c requires a method to invalidate all dax mappings.
> drop_pagecache_sb() can do this but it is a static function and only
> build with CONFIG_SYSCTL. Now, move its implementation into super.c and
> call it super_drop_pagecache(). Use its second argument as invalidator
> so that we can choose which invalidate method to use.
>
> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
I got no repsonse last time, so I'll just post a link to the
concerns I stated about this:
https://lore.kernel.org/linux-xfs/20230205215000.GT360264@dread.disaster.area/
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v10 2/3] fs: introduce super_drop_pagecache()
2023-02-20 21:25 ` Dave Chinner
@ 2023-02-21 1:57 ` Shiyang Ruan
2023-02-26 23:50 ` Dave Chinner
0 siblings, 1 reply; 16+ messages in thread
From: Shiyang Ruan @ 2023-02-21 1:57 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-xfs, nvdimm, linux-fsdevel, linux-mm, djwong,
dan.j.williams, hch, jane.chu, akpm, willy
在 2023/2/21 5:25, Dave Chinner 写道:
> On Fri, Feb 17, 2023 at 02:48:31PM +0000, Shiyang Ruan wrote:
>> xfs_notify_failure.c requires a method to invalidate all dax mappings.
>> drop_pagecache_sb() can do this but it is a static function and only
>> build with CONFIG_SYSCTL. Now, move its implementation into super.c and
>> call it super_drop_pagecache(). Use its second argument as invalidator
>> so that we can choose which invalidate method to use.
>>
>> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
>
> I got no repsonse last time, so I'll just post a link to the
> concerns I stated about this:
>
> https://lore.kernel.org/linux-xfs/20230205215000.GT360264@dread.disaster.area/
Please see patch 3. I changed the code: freeze the fs, then drop its
caches.
--
Thanks,
Ruan.
>
> -Dave.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v10 2/3] fs: introduce super_drop_pagecache()
2023-02-21 1:57 ` Shiyang Ruan
@ 2023-02-26 23:50 ` Dave Chinner
0 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2023-02-26 23:50 UTC (permalink / raw)
To: Shiyang Ruan
Cc: linux-xfs, nvdimm, linux-fsdevel, linux-mm, djwong,
dan.j.williams, hch, jane.chu, akpm, willy
On Tue, Feb 21, 2023 at 09:57:52AM +0800, Shiyang Ruan wrote:
>
>
> 在 2023/2/21 5:25, Dave Chinner 写道:
> > On Fri, Feb 17, 2023 at 02:48:31PM +0000, Shiyang Ruan wrote:
> > > xfs_notify_failure.c requires a method to invalidate all dax mappings.
> > > drop_pagecache_sb() can do this but it is a static function and only
> > > build with CONFIG_SYSCTL. Now, move its implementation into super.c and
> > > call it super_drop_pagecache(). Use its second argument as invalidator
> > > so that we can choose which invalidate method to use.
> > >
> > > Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> >
> > I got no repsonse last time, so I'll just post a link to the
> > concerns I stated about this:
> >
> > https://lore.kernel.org/linux-xfs/20230205215000.GT360264@dread.disaster.area/
>
> Please see patch 3. I changed the code: freeze the fs, then drop its
> caches.
Neither the patch nor the cover letter have a changelog in them
that mention you changed anything. Please, at least, keep a change
log in the cover letter so that people know what has changed from
version to version without having to look at every patch and line of
code again.
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v10 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
2023-02-17 14:48 [PATCH v10 0/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind Shiyang Ruan
2023-02-17 14:48 ` [PATCH v10 1/3] xfs: fix the calculation of length and end Shiyang Ruan
2023-02-17 14:48 ` [PATCH v10 2/3] fs: introduce super_drop_pagecache() Shiyang Ruan
@ 2023-02-17 14:48 ` Shiyang Ruan
2023-02-27 0:07 ` Dave Chinner
2 siblings, 1 reply; 16+ messages in thread
From: Shiyang Ruan @ 2023-02-17 14:48 UTC (permalink / raw)
To: linux-xfs, nvdimm, linux-fsdevel, linux-mm
Cc: djwong, david, dan.j.williams, hch, jane.chu, akpm, willy,
ruansy.fnst
This patch is inspired by Dan's "mm, dax, pmem: Introduce
dev_pagemap_failure()"[1]. With the help of dax_holder and
->notify_failure() mechanism, the pmem driver is able to ask filesystem
(or mapped device) on it to unmap all files in use and notify processes
who are using those files.
Call trace:
trigger unbind
-> unbind_store()
-> ... (skip)
-> devres_release_all() # was pmem driver ->remove() in v1
-> kill_dax()
-> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
-> xfs_dax_notify_failure()
Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
event. So do not shutdown filesystem directly if something not
supported, or if failure range includes metadata area. Make sure all
files and processes are handled correctly.
[1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
drivers/dax/super.c | 3 ++-
fs/xfs/xfs_notify_failure.c | 26 ++++++++++++++++++++++++++
include/linux/mm.h | 1 +
3 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index c4c4728a36e4..2e1a35e82fce 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
return;
if (dax_dev->holder_data != NULL)
- dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
+ dax_holder_notify_failure(dax_dev, 0, U64_MAX,
+ MF_MEM_PRE_REMOVE);
clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
synchronize_srcu(&dax_srcu);
diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
index 7d46a7e4980f..5f915cfc9632 100644
--- a/fs/xfs/xfs_notify_failure.c
+++ b/fs/xfs/xfs_notify_failure.c
@@ -22,6 +22,7 @@
#include <linux/mm.h>
#include <linux/dax.h>
+#include <linux/fs.h>
struct xfs_failure_info {
xfs_agblock_t startblock;
@@ -77,6 +78,9 @@ xfs_dax_failure_fn(
if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
(rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
+ /* The device is about to be removed. Not a really failure. */
+ if (notify->mf_flags & MF_MEM_PRE_REMOVE)
+ return 0;
notify->want_shutdown = true;
return 0;
}
@@ -168,7 +172,11 @@ xfs_dax_notify_ddev_failure(
xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK);
if (!error)
error = -EFSCORRUPTED;
+ } else if (mf_flags & MF_MEM_PRE_REMOVE) {
+ error = thaw_super(mp->m_super);
+ xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
}
+
return error;
}
@@ -182,6 +190,7 @@ xfs_dax_notify_failure(
struct xfs_mount *mp = dax_holder(dax_dev);
u64 ddev_start;
u64 ddev_end;
+ int error;
if (!(mp->m_super->s_flags & SB_BORN)) {
xfs_warn(mp, "filesystem is not ready for notify_failure()!");
@@ -196,6 +205,8 @@ xfs_dax_notify_failure(
if (mp->m_logdev_targp && mp->m_logdev_targp->bt_daxdev == dax_dev &&
mp->m_logdev_targp != mp->m_ddev_targp) {
+ if (mf_flags & MF_MEM_PRE_REMOVE)
+ return 0;
xfs_err(mp, "ondisk log corrupt, shutting down fs!");
xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK);
return -EFSCORRUPTED;
@@ -209,6 +220,12 @@ xfs_dax_notify_failure(
ddev_start = mp->m_ddev_targp->bt_dax_part_off;
ddev_end = ddev_start + bdev_nr_bytes(mp->m_ddev_targp->bt_bdev) - 1;
+ /* Notify failure on the whole device */
+ if (offset == 0 && len == U64_MAX) {
+ offset = ddev_start;
+ len = bdev_nr_bytes(mp->m_ddev_targp->bt_bdev);
+ }
+
/* Ignore the range out of filesystem area */
if (offset + len - 1 < ddev_start)
return -ENXIO;
@@ -225,6 +242,15 @@ xfs_dax_notify_failure(
if (offset + len - 1 > ddev_end)
len = ddev_end - offset + 1;
+ if (mf_flags & MF_MEM_PRE_REMOVE) {
+ xfs_info(mp, "device is about to be removed!");
+ error = freeze_super(mp->m_super);
+ if (error)
+ return error;
+ /* invalidate_inode_pages2() invalidates dax mapping */
+ super_drop_pagecache(mp->m_super, invalidate_inode_pages2);
+ }
+
return xfs_dax_notify_ddev_failure(mp, BTOBB(offset), BTOBB(len),
mf_flags);
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8f857163ac89..9711dbc9451f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3424,6 +3424,7 @@ enum mf_flags {
MF_UNPOISON = 1 << 4,
MF_SW_SIMULATED = 1 << 5,
MF_NO_RETRY = 1 << 6,
+ MF_MEM_PRE_REMOVE = 1 << 7,
};
int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
unsigned long count, int mf_flags);
--
2.39.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v10 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
2023-02-17 14:48 ` [PATCH v10 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind Shiyang Ruan
@ 2023-02-27 0:07 ` Dave Chinner
2023-02-27 10:06 ` Shiyang Ruan
0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2023-02-27 0:07 UTC (permalink / raw)
To: Shiyang Ruan
Cc: linux-xfs, nvdimm, linux-fsdevel, linux-mm, djwong,
dan.j.williams, hch, jane.chu, akpm, willy
On Fri, Feb 17, 2023 at 02:48:32PM +0000, Shiyang Ruan wrote:
> This patch is inspired by Dan's "mm, dax, pmem: Introduce
> dev_pagemap_failure()"[1]. With the help of dax_holder and
> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
> (or mapped device) on it to unmap all files in use and notify processes
> who are using those files.
>
> Call trace:
> trigger unbind
> -> unbind_store()
> -> ... (skip)
> -> devres_release_all() # was pmem driver ->remove() in v1
> -> kill_dax()
> -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
> -> xfs_dax_notify_failure()
>
> Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
> event. So do not shutdown filesystem directly if something not
> supported, or if failure range includes metadata area. Make sure all
> files and processes are handled correctly.
>
> [1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
>
> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
.....
> ---
> @@ -225,6 +242,15 @@ xfs_dax_notify_failure(
> if (offset + len - 1 > ddev_end)
> len = ddev_end - offset + 1;
>
> + if (mf_flags & MF_MEM_PRE_REMOVE) {
> + xfs_info(mp, "device is about to be removed!");
> + error = freeze_super(mp->m_super);
> + if (error)
> + return error;
> + /* invalidate_inode_pages2() invalidates dax mapping */
> + super_drop_pagecache(mp->m_super, invalidate_inode_pages2);
> + }
Why do you still need to drop the pagecache here? My suggestion was
to replace it with freezing the filesystem at this point is to stop
it being dirtied further before the device remove actually occurs.
The userspace processes will be killed, their DAX mappings reclaimed
and the filesystem shut down before device removal occurs, so
super_drop_pagecache() is largely superfluous as it doesn't actually
provide any protection against racing with new mappings or dirtying
of existing/newly created mappings.
Freezing doesn't stop the creation of new mappings, either, it just
cleans all the dirty mappings and halts anything that is trying to
dirty existing clean mappings. It's not until we kill the userspace
processes that new mappings will be stopped, and it's not until we
shut the filesystem down that the filesystem itself will stop
accessing the storage.
Hence I don't see why you retained super_drop_pagecache() here at
all. Can you explain why it is still needed?
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v10 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
2023-02-27 0:07 ` Dave Chinner
@ 2023-02-27 10:06 ` Shiyang Ruan
2023-03-21 10:59 ` Shiyang Ruan
0 siblings, 1 reply; 16+ messages in thread
From: Shiyang Ruan @ 2023-02-27 10:06 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-xfs, nvdimm, linux-fsdevel, linux-mm, djwong,
dan.j.williams, hch, jane.chu, akpm, willy
在 2023/2/27 8:07, Dave Chinner 写道:
> On Fri, Feb 17, 2023 at 02:48:32PM +0000, Shiyang Ruan wrote:
>> This patch is inspired by Dan's "mm, dax, pmem: Introduce
>> dev_pagemap_failure()"[1]. With the help of dax_holder and
>> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
>> (or mapped device) on it to unmap all files in use and notify processes
>> who are using those files.
>>
>> Call trace:
>> trigger unbind
>> -> unbind_store()
>> -> ... (skip)
>> -> devres_release_all() # was pmem driver ->remove() in v1
>> -> kill_dax()
>> -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
>> -> xfs_dax_notify_failure()
>>
>> Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
>> event. So do not shutdown filesystem directly if something not
>> supported, or if failure range includes metadata area. Make sure all
>> files and processes are handled correctly.
>>
>> [1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
>>
>> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
>
> .....
>
>> ---
>> @@ -225,6 +242,15 @@ xfs_dax_notify_failure(
>> if (offset + len - 1 > ddev_end)
>> len = ddev_end - offset + 1;
>>
>> + if (mf_flags & MF_MEM_PRE_REMOVE) {
>> + xfs_info(mp, "device is about to be removed!");
>> + error = freeze_super(mp->m_super);
>> + if (error)
>> + return error;
>> + /* invalidate_inode_pages2() invalidates dax mapping */
>> + super_drop_pagecache(mp->m_super, invalidate_inode_pages2);
>> + }
>
> Why do you still need to drop the pagecache here? My suggestion was
> to replace it with freezing the filesystem at this point is to stop
> it being dirtied further before the device remove actually occurs.
> The userspace processes will be killed, their DAX mappings reclaimed
> and the filesystem shut down before device removal occurs, so
> super_drop_pagecache() is largely superfluous as it doesn't actually
> provide any protection against racing with new mappings or dirtying
> of existing/newly created mappings.
>
> Freezing doesn't stop the creation of new mappings, either, it just
> cleans all the dirty mappings and halts anything that is trying to
This is the point I wasn't aware of.
> dirty existing clean mappings. It's not until we kill the userspace
> processes that new mappings will be stopped, and it's not until we
> shut the filesystem down that the filesystem itself will stop
> accessing the storage.
>
> Hence I don't see why you retained super_drop_pagecache() here at
> all. Can you explain why it is still needed?
So I was just afraid that it's not enough for rmap & processes killer to
invalidate the dax mappings. If something error happened during the
rmap walker, the fs will shutdown and there is no chance to invalidate
the rest mappings whose user didn't be killed yet.
Now that freezing the fs is enough, I will remove the drop cache code.
--
Thanks,
Ruan.
>
> -Dave.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v10 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
2023-02-27 10:06 ` Shiyang Ruan
@ 2023-03-21 10:59 ` Shiyang Ruan
2023-03-24 9:49 ` Shiyang Ruan
0 siblings, 1 reply; 16+ messages in thread
From: Shiyang Ruan @ 2023-03-21 10:59 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-xfs, nvdimm, linux-fsdevel, linux-mm, djwong,
dan.j.williams, hch, jane.chu, akpm, willy
在 2023/2/27 18:06, Shiyang Ruan 写道:
>
>
> 在 2023/2/27 8:07, Dave Chinner 写道:
>> On Fri, Feb 17, 2023 at 02:48:32PM +0000, Shiyang Ruan wrote:
>>> This patch is inspired by Dan's "mm, dax, pmem: Introduce
>>> dev_pagemap_failure()"[1]. With the help of dax_holder and
>>> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
>>> (or mapped device) on it to unmap all files in use and notify processes
>>> who are using those files.
>>>
>>> Call trace:
>>> trigger unbind
>>> -> unbind_store()
>>> -> ... (skip)
>>> -> devres_release_all() # was pmem driver ->remove() in v1
>>> -> kill_dax()
>>> -> dax_holder_notify_failure(dax_dev, 0, U64_MAX,
>>> MF_MEM_PRE_REMOVE)
>>> -> xfs_dax_notify_failure()
>>>
>>> Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
>>> event. So do not shutdown filesystem directly if something not
>>> supported, or if failure range includes metadata area. Make sure all
>>> files and processes are handled correctly.
>>>
>>> [1]:
>>> https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
>>>
>>> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
>>
>> .....
>>
>>> ---
>>> @@ -225,6 +242,15 @@ xfs_dax_notify_failure(
>>> if (offset + len - 1 > ddev_end)
>>> len = ddev_end - offset + 1;
>>> + if (mf_flags & MF_MEM_PRE_REMOVE) {
>>> + xfs_info(mp, "device is about to be removed!");
>>> + error = freeze_super(mp->m_super);
>>> + if (error)
>>> + return error;
>>> + /* invalidate_inode_pages2() invalidates dax mapping */
>>> + super_drop_pagecache(mp->m_super, invalidate_inode_pages2);
>>> + }
>>
>> Why do you still need to drop the pagecache here? My suggestion was
>> to replace it with freezing the filesystem at this point is to stop
>> it being dirtied further before the device remove actually occurs.
>> The userspace processes will be killed, their DAX mappings reclaimed
>> and the filesystem shut down before device removal occurs, so
>> super_drop_pagecache() is largely superfluous as it doesn't actually
>> provide any protection against racing with new mappings or dirtying
>> of existing/newly created mappings.
>>
>> Freezing doesn't stop the creation of new mappings, either, it just
>> cleans all the dirty mappings and halts anything that is trying to
>
> This is the point I wasn't aware of.
>
>> dirty existing clean mappings. It's not until we kill the userspace
>> processes that new mappings will be stopped, and it's not until we
>> shut the filesystem down that the filesystem itself will stop
>> accessing the storage.
>>
>> Hence I don't see why you retained super_drop_pagecache() here at
>> all. Can you explain why it is still needed?
>
>
> So I was just afraid that it's not enough for rmap & processes killer to
> invalidate the dax mappings. If something error happened during the
> rmap walker, the fs will shutdown and there is no chance to invalidate
> the rest mappings whose user didn't be killed yet.
>
> Now that freezing the fs is enough, I will remove the drop cache code.
I removed the drop cache code, then kernel always went into crash when
running the test[1]. After the investigation, I found that the crash is
cause by accessing (invalidate dax pages when umounting fs) the page of
a pmem while the pmem has been removed.
According to the design, the dax page should have been invalidated by
mf_dax_kill_procs() but it didn't. I found two reasons:
1. collect_procs_fsdax() only kills the current process
2. unmap_mapping_range() doesn't invalidate the dax pages
(disassociate dax entry in fs/dax.c), which causes the crash in my test
So, I think we should:
1. pass the mf_flag to collect_procs_fsdax() to let it collect all
processes associated with the file on the XFS.
2. drop cache is still needed, but just drop the associated files'
cache after mf_dax_kill_procs(), instead of dropping cache of the whole
filesystem.
Then the logic shuld be looked like this:
unbind
`-> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
`-> xfs_dax_notify_failure()
`-> freeze_super()
`-> do xfs rmap
`-> mf_dax_kill_procs()
`-> collect_procs_fsdax() // all associated
`-> unmap_and_kill()
`-> invalidate_inode_pages2() // drop file's cache
`-> thaw_super()
[1] The step of unbind test:
1. create fsdax namespace on a pmem
2. mkfs.xfs on it
3. run fsx test in background
4. wait 1s
5. echo "pfn0.1" > unbind
6. wait 1s
7. umount xfs --> crash happened
--
Thanks,
Ruan.
>
>
> --
> Thanks,
> Ruan.
>
>>
>> -Dave.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v10 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
2023-03-21 10:59 ` Shiyang Ruan
@ 2023-03-24 9:49 ` Shiyang Ruan
0 siblings, 0 replies; 16+ messages in thread
From: Shiyang Ruan @ 2023-03-24 9:49 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-xfs, nvdimm, linux-fsdevel, linux-mm, djwong,
dan.j.williams, hch, jane.chu, akpm, willy
在 2023/3/21 18:59, Shiyang Ruan 写道:
>
>
> 在 2023/2/27 18:06, Shiyang Ruan 写道:
>>
>>
>> 在 2023/2/27 8:07, Dave Chinner 写道:
>>> On Fri, Feb 17, 2023 at 02:48:32PM +0000, Shiyang Ruan wrote:
>>>> This patch is inspired by Dan's "mm, dax, pmem: Introduce
>>>> dev_pagemap_failure()"[1]. With the help of dax_holder and
>>>> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
>>>> (or mapped device) on it to unmap all files in use and notify processes
>>>> who are using those files.
>>>>
>>>> Call trace:
>>>> trigger unbind
>>>> -> unbind_store()
>>>> -> ... (skip)
>>>> -> devres_release_all() # was pmem driver ->remove() in v1
>>>> -> kill_dax()
>>>> -> dax_holder_notify_failure(dax_dev, 0, U64_MAX,
>>>> MF_MEM_PRE_REMOVE)
>>>> -> xfs_dax_notify_failure()
>>>>
>>>> Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
>>>> event. So do not shutdown filesystem directly if something not
>>>> supported, or if failure range includes metadata area. Make sure all
>>>> files and processes are handled correctly.
>>>>
>>>> [1]:
>>>> https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
>>>>
>>>> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
>>>
>>> .....
>>>
>>>> ---
>>>> @@ -225,6 +242,15 @@ xfs_dax_notify_failure(
>>>> if (offset + len - 1 > ddev_end)
>>>> len = ddev_end - offset + 1;
>>>> + if (mf_flags & MF_MEM_PRE_REMOVE) {
>>>> + xfs_info(mp, "device is about to be removed!");
>>>> + error = freeze_super(mp->m_super);
>>>> + if (error)
>>>> + return error;
>>>> + /* invalidate_inode_pages2() invalidates dax mapping */
>>>> + super_drop_pagecache(mp->m_super, invalidate_inode_pages2);
>>>> + }
>>>
>>> Why do you still need to drop the pagecache here? My suggestion was
>>> to replace it with freezing the filesystem at this point is to stop
>>> it being dirtied further before the device remove actually occurs.
>>> The userspace processes will be killed, their DAX mappings reclaimed
>>> and the filesystem shut down before device removal occurs, so
>>> super_drop_pagecache() is largely superfluous as it doesn't actually
>>> provide any protection against racing with new mappings or dirtying
>>> of existing/newly created mappings.
>>>
>>> Freezing doesn't stop the creation of new mappings, either, it just
>>> cleans all the dirty mappings and halts anything that is trying to
>>
>> This is the point I wasn't aware of.
>>
>>> dirty existing clean mappings. It's not until we kill the userspace
>>> processes that new mappings will be stopped, and it's not until we
>>> shut the filesystem down that the filesystem itself will stop
>>> accessing the storage.
>>>
>>> Hence I don't see why you retained super_drop_pagecache() here at
>>> all. Can you explain why it is still needed?
>>
>>
>> So I was just afraid that it's not enough for rmap & processes killer
>> to invalidate the dax mappings. If something error happened during
>> the rmap walker, the fs will shutdown and there is no chance to
>> invalidate the rest mappings whose user didn't be killed yet.
>>
>> Now that freezing the fs is enough, I will remove the drop cache code.
>
> I removed the drop cache code, then kernel always went into crash when
> running the test[1]. After the investigation, I found that the crash is
> cause by accessing (invalidate dax pages when umounting fs) the page of
> a pmem while the pmem has been removed.
>
> According to the design, the dax page should have been invalidated by
> mf_dax_kill_procs() but it didn't. I found two reasons:
> 1. collect_procs_fsdax() only kills the current process
> 2. unmap_mapping_range() doesn't invalidate the dax pages
> (disassociate dax entry in fs/dax.c), which causes the crash in my test
>
> So, I think we should:
> 1. pass the mf_flag to collect_procs_fsdax() to let it collect all
> processes associated with the file on the XFS.
> 2. drop cache is still needed, but just drop the associated files'
> cache after mf_dax_kill_procs(), instead of dropping cache of the whole
> filesystem.
>
> Then the logic shuld be looked like this:
> unbind
> `-> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
> `-> xfs_dax_notify_failure()
> `-> freeze_super()
> `-> do xfs rmap
> `-> mf_dax_kill_procs()
> `-> collect_procs_fsdax() // all associated
> `-> unmap_and_kill()
> `-> invalidate_inode_pages2() // drop file's cache
> `-> thaw_super()
>
>
> [1] The step of unbind test:
> 1. create fsdax namespace on a pmem
> 2. mkfs.xfs on it
> 3. run fsx test in background
> 4. wait 1s
> 5. echo "pfn0.1" > unbind
> 6. wait 1s
> 7. umount xfs --> crash happened
>
Hi,
Any comments?
>
> --
> Thanks,
> Ruan.
>
>>
>>
>> --
>> Thanks,
>> Ruan.
>>
>>>
>>> -Dave.
^ permalink raw reply [flat|nested] 16+ messages in thread