* [PATCH] mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED
From: Yafang Shao @ 2020-09-21 1:43 UTC
To: akpm, mgorman, hannes, mhocko; +Cc: linux-mm, Yafang Shao
Our users reported random latency spikes while their RT process is
running. We eventually found that the latency spikes are caused by
FADV_DONTNEED, which may call lru_add_drain_all() to drain the LRU cache
on remote CPUs and then wait for the per-cpu work to complete. The wait
time is not deterministic and can reach tens of milliseconds.
That behavior is unreasonable, because the process is bound to a
specific CPU and the file is accessed only by that process; IOW, there
should be no pagecache pages on a per-cpu pagevec of a remote CPU. The
unreasonable behavior is partially caused by an incorrect comparison
between the number of invalidated pages and the target count:
if (count < (end_index - start_index + 1))
count is how many pages were invalidated on the local CPU, while
(end_index - start_index + 1) is how many pages are expected to be
invalidated. Using (end_index - start_index + 1) as the target is
incorrect, because these are page indexes into the file, and not every
index in that range is necessarily backed by a page in the page cache.
We'd better use inode->i_data.nrpages as the target.
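For illustration, a minimal user-space sketch of the scenario (a
hypothetical reproducer, not the reporter's actual workload): a task
pinned to one CPU populates a file that only it touches, then times
FADV_DONTNEED.

/* Hypothetical reproducer: pin to CPU 0, write ~4MB of pagecache, then
 * measure how long dropping it with FADV_DONTNEED takes.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	struct timespec t0, t1;
	char buf[4096] = { 0 };
	cpu_set_t set;
	int i, fd;

	CPU_ZERO(&set);
	CPU_SET(0, &set);
	sched_setaffinity(0, sizeof(set), &set);	/* bind to CPU 0 */

	fd = open("private.dat", O_RDWR | O_CREAT | O_TRUNC, 0600);
	for (i = 0; i < 1024; i++)
		if (write(fd, buf, sizeof(buf)) != sizeof(buf))
			return 1;
	fsync(fd);	/* make the pages clean so they can be invalidated */

	clock_gettime(CLOCK_MONOTONIC, &t0);
	/* May call lru_add_drain_all() and wait for work on every CPU. */
	posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("FADV_DONTNEED took %ld us\n",
	       (t1.tv_sec - t0.tv_sec) * 1000000L +
	       (t1.tv_nsec - t0.tv_nsec) / 1000L);
	close(fd);
	return 0;
}

Even in this single-CPU-bound case the fadvise call can end up waiting
on per-cpu drain work scheduled on every other CPU, which is the latency
source described above.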
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
mm/fadvise.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 0e66f2aaeea3..ec25c91194a3 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -163,7 +163,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
* a per-cpu pagevec for a remote CPU. Drain all
* pagevecs and try again.
*/
- if (count < (end_index - start_index + 1)) {
+ if (count < inode->i_data.nrpages) {
lru_add_drain_all();
invalidate_mapping_pages(mapping, start_index,
end_index);
--
2.17.1
* Re: [PATCH] mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED
From: Mel Gorman @ 2020-09-21 22:34 UTC
To: Yafang Shao; +Cc: akpm, hannes, mhocko, linux-mm
On Mon, Sep 21, 2020 at 09:43:17AM +0800, Yafang Shao wrote:
> Our users reported random latency spikes while their RT process is
> running. We eventually found that the latency spikes are caused by
> FADV_DONTNEED, which may call lru_add_drain_all() to drain the LRU cache
> on remote CPUs and then wait for the per-cpu work to complete. The wait
> time is not deterministic and can reach tens of milliseconds.
> That behavior is unreasonable, because the process is bound to a
> specific CPU and the file is accessed only by that process; IOW, there
> should be no pagecache pages on a per-cpu pagevec of a remote CPU. The
> unreasonable behavior is partially caused by an incorrect comparison
> between the number of invalidated pages and the target count:
> if (count < (end_index - start_index + 1))
> count is how many pages were invalidated on the local CPU, while
> (end_index - start_index + 1) is how many pages are expected to be
> invalidated. Using (end_index - start_index + 1) as the target is
> incorrect, because these are page indexes into the file, and not every
> index in that range is necessarily backed by a page in the page cache.
> We'd better use inode->i_data.nrpages as the target.
>
How does that work if the invalidation is for a subset of the file?
--
Mel Gorman
SUSE Labs
* Re: [PATCH] mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED
From: Yafang Shao @ 2020-09-22 2:12 UTC
To: Mel Gorman; +Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Linux MM
On Tue, Sep 22, 2020 at 6:34 AM Mel Gorman <mgorman@suse.de> wrote:
>
> On Mon, Sep 21, 2020 at 09:43:17AM +0800, Yafang Shao wrote:
> > Our users reported random latency spikes while their RT process is
> > running. We eventually found that the latency spikes are caused by
> > FADV_DONTNEED, which may call lru_add_drain_all() to drain the LRU cache
> > on remote CPUs and then wait for the per-cpu work to complete. The wait
> > time is not deterministic and can reach tens of milliseconds.
> > That behavior is unreasonable, because the process is bound to a
> > specific CPU and the file is accessed only by that process; IOW, there
> > should be no pagecache pages on a per-cpu pagevec of a remote CPU. The
> > unreasonable behavior is partially caused by an incorrect comparison
> > between the number of invalidated pages and the target count:
> > if (count < (end_index - start_index + 1))
> > count is how many pages were invalidated on the local CPU, while
> > (end_index - start_index + 1) is how many pages are expected to be
> > invalidated. Using (end_index - start_index + 1) as the target is
> > incorrect, because these are page indexes into the file, and not every
> > index in that range is necessarily backed by a page in the page cache.
> > We'd better use inode->i_data.nrpages as the target.
> >
>
> How does that work if the invalidation is for a subset of the file?
>
I realized it as well. There are some solutions to improve it.
Option 1, take the min as the target.
- if (count < (end_index - start_index + 1)) {
+ target = min_t(unsigned long, inode->i_data.nrpages,
+ end_index - start_index + 1);
+ if (count < target) {
lru_add_drain_all();
Option 2, change the prototype of invalidate_mapping_pages and then
check how many pages were skipped.
+ struct invalidate_stat {
+ unsigned long skipped; // how many pages were skipped
+ unsigned long invalidated; // how many pages were invalidated
+};
- unsigned long invalidate_mapping_pages(struct address_space *mapping,
+unsigned long invalidate_mapping_pages(struct address_space *mapping,
struct invalidate_stat *stat,
I prefer option 2.
What do you think?
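To make option 2 a bit more concrete, the fadvise side might end up
looking roughly like this (purely illustrative; struct invalidate_stat
and the parameter order are only the proposal above, not an existing
API):

	struct invalidate_stat stat = { 0 };

	invalidate_mapping_pages(mapping, &stat, start_index, end_index);
	/* Drain remote pagevecs only when some pages could not be
	 * invalidated on the first pass.
	 */
	if (stat.skipped) {
		lru_add_drain_all();
		invalidate_mapping_pages(mapping, NULL, start_index,
					 end_index);
	}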
--
Thanks
Yafang
* Re: [PATCH] mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED
From: Mel Gorman @ 2020-09-22 7:23 UTC
To: Yafang Shao; +Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Linux MM
On Tue, Sep 22, 2020 at 10:12:31AM +0800, Yafang Shao wrote:
> On Tue, Sep 22, 2020 at 6:34 AM Mel Gorman <mgorman@suse.de> wrote:
> >
> > On Mon, Sep 21, 2020 at 09:43:17AM +0800, Yafang Shao wrote:
> > > Our users reported random latency spikes while their RT process is
> > > running. We eventually found that the latency spikes are caused by
> > > FADV_DONTNEED, which may call lru_add_drain_all() to drain the LRU cache
> > > on remote CPUs and then wait for the per-cpu work to complete. The wait
> > > time is not deterministic and can reach tens of milliseconds.
> > > That behavior is unreasonable, because the process is bound to a
> > > specific CPU and the file is accessed only by that process; IOW, there
> > > should be no pagecache pages on a per-cpu pagevec of a remote CPU. The
> > > unreasonable behavior is partially caused by an incorrect comparison
> > > between the number of invalidated pages and the target count:
> > > if (count < (end_index - start_index + 1))
> > > count is how many pages were invalidated on the local CPU, while
> > > (end_index - start_index + 1) is how many pages are expected to be
> > > invalidated. Using (end_index - start_index + 1) as the target is
> > > incorrect, because these are page indexes into the file, and not every
> > > index in that range is necessarily backed by a page in the page cache.
> > > We'd better use inode->i_data.nrpages as the target.
> > >
> >
> > How does that work if the invalidation is for a subset of the file?
> >
>
> I realized it as well. There are some solutions to improve it.
>
> Option 1, take the min as the target.
> - if (count < (end_index - start_index + 1)) {
> + target = min_t(unsigned long, inode->i_data.nrpages,
> + end_index - start_index + 1);
> + if (count < target) {
> lru_add_drain_all();
>
> Option 2, change the prototype of invalidate_mapping_pages and then
> check how many pages were skipped.
>
> + struct invalidate_stat {
> + unsigned long skipped; // how many pages were skipped
> + unsigned long invalidated; // how many pages were invalidated
> +};
>
> - unsigned long invalidate_mapping_pages(struct address_space *mapping,
> +unsigned long invalidate_mapping_pages(struct address_space *mapping,
> struct invalidate_stat *stat,
>
That would involve updating each caller, and the struct is unnecessarily
heavy. Create a common helper that returns the count via an nr_lruvec
out-parameter. Have invalidate_mapping_pages pass in NULL for nr_lruvec,
and add a new helper for fadvise that accepts nr_lruvec. In the common
helper, account for pages that are likely still sitting on a per-cpu LRU
pagevec and count them in nr_lruvec when it is not NULL. Then update
fadvise to drain only if pages were skipped because they were on a
pagevec. That should also deal with the case where holes have been
punched between start and end.
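Roughly, that could look something like the sketch below; the helper
names and exact signatures are illustrative, nothing here is settled:

/*
 * Common helper: invalidate [start, end] and, when nr_lruvec is not
 * NULL, count the pages that were skipped but are likely still sitting
 * on a per-cpu pagevec.
 */
static unsigned long __invalidate_mapping_pages(struct address_space *mapping,
		pgoff_t start, pgoff_t end, unsigned long *nr_lruvec);

/* Existing entry point keeps its prototype, so no caller needs updating. */
unsigned long invalidate_mapping_pages(struct address_space *mapping,
		pgoff_t start, pgoff_t end)
{
	return __invalidate_mapping_pages(mapping, start, end, NULL);
}

/* New helper for the fadvise path that reports the likely-on-pagevec count. */
void invalidate_mapping_pagevec(struct address_space *mapping,
		pgoff_t start, pgoff_t end, unsigned long *nr_lruvec)
{
	__invalidate_mapping_pages(mapping, start, end, nr_lruvec);
}

The FADV_DONTNEED branch would then drain only when it has a reason to:

	unsigned long nr_lruvec = 0;

	invalidate_mapping_pagevec(mapping, start_index, end_index,
				   &nr_lruvec);
	/* Pay for lru_add_drain_all() only if skipped pages were probably
	 * stuck on a remote per-cpu pagevec.
	 */
	if (nr_lruvec) {
		lru_add_drain_all();
		invalidate_mapping_pages(mapping, start_index, end_index);
	}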
--
Mel Gorman
SUSE Labs
* Re: [PATCH] mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED
From: Yafang Shao @ 2020-09-23 10:05 UTC
To: Mel Gorman; +Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Linux MM
On Tue, Sep 22, 2020 at 3:23 PM Mel Gorman <mgorman@suse.de> wrote:
>
> On Tue, Sep 22, 2020 at 10:12:31AM +0800, Yafang Shao wrote:
> > On Tue, Sep 22, 2020 at 6:34 AM Mel Gorman <mgorman@suse.de> wrote:
> > >
> > > On Mon, Sep 21, 2020 at 09:43:17AM +0800, Yafang Shao wrote:
> > > > Our users reported random latency spikes while their RT process is
> > > > running. We eventually found that the latency spikes are caused by
> > > > FADV_DONTNEED, which may call lru_add_drain_all() to drain the LRU cache
> > > > on remote CPUs and then wait for the per-cpu work to complete. The wait
> > > > time is not deterministic and can reach tens of milliseconds.
> > > > That behavior is unreasonable, because the process is bound to a
> > > > specific CPU and the file is accessed only by that process; IOW, there
> > > > should be no pagecache pages on a per-cpu pagevec of a remote CPU. The
> > > > unreasonable behavior is partially caused by an incorrect comparison
> > > > between the number of invalidated pages and the target count:
> > > > if (count < (end_index - start_index + 1))
> > > > count is how many pages were invalidated on the local CPU, while
> > > > (end_index - start_index + 1) is how many pages are expected to be
> > > > invalidated. Using (end_index - start_index + 1) as the target is
> > > > incorrect, because these are page indexes into the file, and not every
> > > > index in that range is necessarily backed by a page in the page cache.
> > > > We'd better use inode->i_data.nrpages as the target.
> > > >
> > >
> > > How does that work if the invalidation is for a subset of the file?
> > >
> >
> > I realized it as well. There are some solutions to improve it.
> >
> > Option 1, take the min as the target.
> > - if (count < (end_index - start_index + 1)) {
> > + target = min_t(unsigned long, inode->i_data.nrpages,
> > + end_index - start_index + 1);
> > + if (count < target) {
> > lru_add_drain_all();
> >
> > Option 2, change the prototype of invalidate_mapping_pages and then
> > check how many pages were skipped.
> >
> > + struct invalidate_stat {
> > + unsigned long skipped; // how many pages were skipped
> > + unsigned long invalidated; // how many pages were invalidated
> > +};
> >
> > - unsigned long invalidate_mapping_pages(struct address_space *mapping,
> > +unsigned long invalidate_mapping_pages(struct address_space *mapping,
> > struct invalidate_stat *stat,
> >
>
> That would involve updating each caller, and the struct is unnecessarily
> heavy. Create a common helper that returns the count via an nr_lruvec
> out-parameter. Have invalidate_mapping_pages pass in NULL for nr_lruvec,
> and add a new helper for fadvise that accepts nr_lruvec. In the common
> helper, account for pages that are likely still sitting on a per-cpu LRU
> pagevec and count them in nr_lruvec when it is not NULL. Then update
> fadvise to drain only if pages were skipped because they were on a
> pagevec. That should also deal with the case where holes have been
> punched between start and end.
>
Good suggestion, thanks Mel.
I will send v2.
--
Thanks
Yafang