linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: a few questions about pagevc_lookup_entries
       [not found] <CANQeFDCCGED3BR0oTpzQ75gtGpdGCw8FLf+kspBYytw3YteXAw@mail.gmail.com>
@ 2019-06-20  8:36 ` Jan Kara
  2019-06-20 22:03   ` Liu Bo
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kara @ 2019-06-20  8:36 UTC (permalink / raw)
  To: Liu Bo
  Cc: willy, dan.j.williams, Fengguang Wu, Jan Kara, linux-fsdevel,
	linux-nvdimm

[added some relevant lists to CC - this can safe some people debugging by
being able to google this discussion]

On Wed 19-06-19 15:57:38, Liu Bo wrote:
> I found a weird dead loop within invalidate_inode_pages2_range, the
> reason being that  pagevec_lookup_entries(index=1) returns an indices
> array which has only one entry storing value 0, and this has led
> invalidate_inode_pages2_range() to a dead loop, something like,
> 
> invalidate_inode_pages2_range()
>   -> while (pagevec_lookup_entries(index=1, indices))
>     ->  for (i = 0; i < pagevec_count(&pvec); i++) {
>       -> index = indices[0]; // index is set to 0
>       -> if (radix_tree_exceptional_entry(page)) {
>           -> if (!invalidate_exceptional_entry2()) //
>                   ->__dax_invalidate_mapping_entry // return 0
>                      -> // entry marked as PAGECACHE_TAG_DIRTY/TOWRITE
>                  ret = -EBUSY;
>           ->continue;
>           } // end of if (radix_tree_exceptional_entry(page))
>     -> index++; // index is set to 1
> 
> The following debug[1] proved the above analysis,  I was wondering if
> this was a corner case that  pagevec_lookup_entries() allows or a
> known bug that has been fixed upstream?
> 
> ps: the kernel in use is 4.19.30 (LTS).

Hum, the above trace suggests you are using DAX. Are you really? Because the
stacktrace below shows we are working on fuse inode so that shouldn't
really be DAX inode...

								Honza

> [1]:
> $git diff
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 71b65aab8077..82bfeeb53135 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -692,6 +692,7 @@ int invalidate_inode_pages2_range(struct
> address_space *mapping,
>                         struct page *page = pvec.pages[i];
> 
>                         /* We rely upon deletion not changing page->index */
> +                       WARN_ONCE(index > indices[i], "index = %d
> indices[%d]=%d\n", index, i, indices[i]);
>                         index = indices[i];
>                         if (index > end)
>                                 break;
> 
> [  129.095383] ------------[ cut here ]------------
> [  129.096164] index = 1 indices[0]=0
> [  129.096786] WARNING: CPU: 0 PID: 3022 at mm/truncate.c:695
> invalidate_inode_pages2_range+0x471/0x500
> [  129.098234] Modules linked in:
> [  129.098717] CPU: 0 PID: 3022 Comm: doio Not tainted 4.19.30+ #4
> ...
> [  129.101288] RIP: 0010:invalidate_inode_pages2_range+0x471/0x500
> ...
> [  129.114162] Call Trace:
> [  129.114623]  ? __schedule+0x2ad/0x860
> [  129.115214]  ? prepare_to_wait_event+0x80/0x140
> [  129.115903]  ? finish_wait+0x3f/0x80
> [  129.116452]  ? request_wait_answer+0x13d/0x210
> [  129.117128]  ? remove_wait_queue+0x60/0x60
> [  129.117757]  ? make_kgid+0x13/0x20
> [  129.118277]  ? fuse_change_attributes_common+0x7d/0x130
> [  129.119057]  ? fuse_change_attributes+0x8d/0x120
> [  129.119754]  fuse_dentry_revalidate+0x2c5/0x300
> [  129.120456]  lookup_fast+0x237/0x2b0
> [  129.121018]  path_openat+0x15f/0x1380
> [  129.121614]  ? generic_update_time+0x6b/0xd0
> [  129.122316]  do_filp_open+0x91/0x100
> [  129.122876]  do_sys_open+0x126/0x210
> [  129.123453]  do_syscall_64+0x55/0x180
> [  129.124036]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  129.124820] RIP: 0033:0x7fbe0cd75e80
> ...
> [  129.134574] ---[ end trace c0fc0bbc5aebf0dc ]---
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: a few questions about pagevc_lookup_entries
  2019-06-20  8:36 ` a few questions about pagevc_lookup_entries Jan Kara
@ 2019-06-20 22:03   ` Liu Bo
  2019-06-24  7:25     ` Miklos Szeredi
  0 siblings, 1 reply; 5+ messages in thread
From: Liu Bo @ 2019-06-20 22:03 UTC (permalink / raw)
  To: Jan Kara; +Cc: willy, dan.j.williams, Fengguang Wu, linux-fsdevel, linux-nvdimm

On Thu, Jun 20, 2019 at 1:36 AM Jan Kara <jack@suse.cz> wrote:
>
> [added some relevant lists to CC - this can safe some people debugging by
> being able to google this discussion]
>
> On Wed 19-06-19 15:57:38, Liu Bo wrote:
> > I found a weird dead loop within invalidate_inode_pages2_range, the
> > reason being that  pagevec_lookup_entries(index=1) returns an indices
> > array which has only one entry storing value 0, and this has led
> > invalidate_inode_pages2_range() to a dead loop, something like,
> >
> > invalidate_inode_pages2_range()
> >   -> while (pagevec_lookup_entries(index=1, indices))
> >     ->  for (i = 0; i < pagevec_count(&pvec); i++) {
> >       -> index = indices[0]; // index is set to 0
> >       -> if (radix_tree_exceptional_entry(page)) {
> >           -> if (!invalidate_exceptional_entry2()) //
> >                   ->__dax_invalidate_mapping_entry // return 0
> >                      -> // entry marked as PAGECACHE_TAG_DIRTY/TOWRITE
> >                  ret = -EBUSY;
> >           ->continue;
> >           } // end of if (radix_tree_exceptional_entry(page))
> >     -> index++; // index is set to 1
> >
> > The following debug[1] proved the above analysis,  I was wondering if
> > this was a corner case that  pagevec_lookup_entries() allows or a
> > known bug that has been fixed upstream?
> >
> > ps: the kernel in use is 4.19.30 (LTS).
>
> Hum, the above trace suggests you are using DAX. Are you really? Because the
> stacktrace below shows we are working on fuse inode so that shouldn't
> really be DAX inode...
>

So I was running tests against virtiofs[1] which adds dax support to
fuse, with dax, fuse provides posix stuff while dax provides data
channel.

[1]: https://virtio-fs.gitlab.io/
https://gitlab.com/virtio-fs/linux

thanks,
liubo

>                                                                 Honza
>
> > [1]:
> > $git diff
> > diff --git a/mm/truncate.c b/mm/truncate.c
> > index 71b65aab8077..82bfeeb53135 100644
> > --- a/mm/truncate.c
> > +++ b/mm/truncate.c
> > @@ -692,6 +692,7 @@ int invalidate_inode_pages2_range(struct
> > address_space *mapping,
> >                         struct page *page = pvec.pages[i];
> >
> >                         /* We rely upon deletion not changing page->index */
> > +                       WARN_ONCE(index > indices[i], "index = %d
> > indices[%d]=%d\n", index, i, indices[i]);
> >                         index = indices[i];
> >                         if (index > end)
> >                                 break;
> >
> > [  129.095383] ------------[ cut here ]------------
> > [  129.096164] index = 1 indices[0]=0
> > [  129.096786] WARNING: CPU: 0 PID: 3022 at mm/truncate.c:695
> > invalidate_inode_pages2_range+0x471/0x500
> > [  129.098234] Modules linked in:
> > [  129.098717] CPU: 0 PID: 3022 Comm: doio Not tainted 4.19.30+ #4
> > ...
> > [  129.101288] RIP: 0010:invalidate_inode_pages2_range+0x471/0x500
> > ...
> > [  129.114162] Call Trace:
> > [  129.114623]  ? __schedule+0x2ad/0x860
> > [  129.115214]  ? prepare_to_wait_event+0x80/0x140
> > [  129.115903]  ? finish_wait+0x3f/0x80
> > [  129.116452]  ? request_wait_answer+0x13d/0x210
> > [  129.117128]  ? remove_wait_queue+0x60/0x60
> > [  129.117757]  ? make_kgid+0x13/0x20
> > [  129.118277]  ? fuse_change_attributes_common+0x7d/0x130
> > [  129.119057]  ? fuse_change_attributes+0x8d/0x120
> > [  129.119754]  fuse_dentry_revalidate+0x2c5/0x300
> > [  129.120456]  lookup_fast+0x237/0x2b0
> > [  129.121018]  path_openat+0x15f/0x1380
> > [  129.121614]  ? generic_update_time+0x6b/0xd0
> > [  129.122316]  do_filp_open+0x91/0x100
> > [  129.122876]  do_sys_open+0x126/0x210
> > [  129.123453]  do_syscall_64+0x55/0x180
> > [  129.124036]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [  129.124820] RIP: 0033:0x7fbe0cd75e80
> > ...
> > [  129.134574] ---[ end trace c0fc0bbc5aebf0dc ]---
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: a few questions about pagevc_lookup_entries
  2019-06-20 22:03   ` Liu Bo
@ 2019-06-24  7:25     ` Miklos Szeredi
  2019-06-24 10:41       ` Jan Kara
  0 siblings, 1 reply; 5+ messages in thread
From: Miklos Szeredi @ 2019-06-24  7:25 UTC (permalink / raw)
  To: Liu Bo
  Cc: Jan Kara, Matthew Wilcox, dan.j.williams, Fengguang Wu,
	linux-fsdevel, linux-nvdimm, Vivek Goyal, Dr. David Alan Gilbert,
	Stefan Hajnoczi

[cc: vivek, stefan, dgilbert]

On Fri, Jun 21, 2019 at 12:04 AM Liu Bo <obuil.liubo@gmail.com> wrote:
>
> On Thu, Jun 20, 2019 at 1:36 AM Jan Kara <jack@suse.cz> wrote:
> >
> > [added some relevant lists to CC - this can safe some people debugging by
> > being able to google this discussion]
> >
> > On Wed 19-06-19 15:57:38, Liu Bo wrote:
> > > I found a weird dead loop within invalidate_inode_pages2_range, the
> > > reason being that  pagevec_lookup_entries(index=1) returns an indices
> > > array which has only one entry storing value 0, and this has led
> > > invalidate_inode_pages2_range() to a dead loop, something like,
> > >
> > > invalidate_inode_pages2_range()
> > >   -> while (pagevec_lookup_entries(index=1, indices))
> > >     ->  for (i = 0; i < pagevec_count(&pvec); i++) {
> > >       -> index = indices[0]; // index is set to 0
> > >       -> if (radix_tree_exceptional_entry(page)) {
> > >           -> if (!invalidate_exceptional_entry2()) //
> > >                   ->__dax_invalidate_mapping_entry // return 0
> > >                      -> // entry marked as PAGECACHE_TAG_DIRTY/TOWRITE
> > >                  ret = -EBUSY;
> > >           ->continue;
> > >           } // end of if (radix_tree_exceptional_entry(page))
> > >     -> index++; // index is set to 1
> > >
> > > The following debug[1] proved the above analysis,  I was wondering if
> > > this was a corner case that  pagevec_lookup_entries() allows or a
> > > known bug that has been fixed upstream?
> > >
> > > ps: the kernel in use is 4.19.30 (LTS).
> >
> > Hum, the above trace suggests you are using DAX. Are you really? Because the
> > stacktrace below shows we are working on fuse inode so that shouldn't
> > really be DAX inode...
> >
>
> So I was running tests against virtiofs[1] which adds dax support to
> fuse, with dax, fuse provides posix stuff while dax provides data
> channel.
>
> [1]: https://virtio-fs.gitlab.io/
> https://gitlab.com/virtio-fs/linux
>
> thanks,
> liubo
>
> >                                                                 Honza
> >
> > > [1]:
> > > $git diff
> > > diff --git a/mm/truncate.c b/mm/truncate.c
> > > index 71b65aab8077..82bfeeb53135 100644
> > > --- a/mm/truncate.c
> > > +++ b/mm/truncate.c
> > > @@ -692,6 +692,7 @@ int invalidate_inode_pages2_range(struct
> > > address_space *mapping,
> > >                         struct page *page = pvec.pages[i];
> > >
> > >                         /* We rely upon deletion not changing page->index */
> > > +                       WARN_ONCE(index > indices[i], "index = %d
> > > indices[%d]=%d\n", index, i, indices[i]);
> > >                         index = indices[i];
> > >                         if (index > end)
> > >                                 break;
> > >
> > > [  129.095383] ------------[ cut here ]------------
> > > [  129.096164] index = 1 indices[0]=0
> > > [  129.096786] WARNING: CPU: 0 PID: 3022 at mm/truncate.c:695
> > > invalidate_inode_pages2_range+0x471/0x500
> > > [  129.098234] Modules linked in:
> > > [  129.098717] CPU: 0 PID: 3022 Comm: doio Not tainted 4.19.30+ #4
> > > ...
> > > [  129.101288] RIP: 0010:invalidate_inode_pages2_range+0x471/0x500
> > > ...
> > > [  129.114162] Call Trace:
> > > [  129.114623]  ? __schedule+0x2ad/0x860
> > > [  129.115214]  ? prepare_to_wait_event+0x80/0x140
> > > [  129.115903]  ? finish_wait+0x3f/0x80
> > > [  129.116452]  ? request_wait_answer+0x13d/0x210
> > > [  129.117128]  ? remove_wait_queue+0x60/0x60
> > > [  129.117757]  ? make_kgid+0x13/0x20
> > > [  129.118277]  ? fuse_change_attributes_common+0x7d/0x130
> > > [  129.119057]  ? fuse_change_attributes+0x8d/0x120
> > > [  129.119754]  fuse_dentry_revalidate+0x2c5/0x300
> > > [  129.120456]  lookup_fast+0x237/0x2b0
> > > [  129.121018]  path_openat+0x15f/0x1380
> > > [  129.121614]  ? generic_update_time+0x6b/0xd0
> > > [  129.122316]  do_filp_open+0x91/0x100
> > > [  129.122876]  do_sys_open+0x126/0x210
> > > [  129.123453]  do_syscall_64+0x55/0x180
> > > [  129.124036]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > [  129.124820] RIP: 0033:0x7fbe0cd75e80
> > > ...
> > > [  129.134574] ---[ end trace c0fc0bbc5aebf0dc ]---
> > --
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: a few questions about pagevc_lookup_entries
  2019-06-24  7:25     ` Miklos Szeredi
@ 2019-06-24 10:41       ` Jan Kara
  2019-06-24 17:13         ` Liu Bo
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kara @ 2019-06-24 10:41 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Liu Bo, Jan Kara, Matthew Wilcox, dan.j.williams, Fengguang Wu,
	linux-fsdevel, linux-nvdimm, Vivek Goyal, Dr. David Alan Gilbert,
	Stefan Hajnoczi

On Mon 24-06-19 09:25:00, Miklos Szeredi wrote:
> [cc: vivek, stefan, dgilbert]
> 
> On Fri, Jun 21, 2019 at 12:04 AM Liu Bo <obuil.liubo@gmail.com> wrote:
> >
> > On Thu, Jun 20, 2019 at 1:36 AM Jan Kara <jack@suse.cz> wrote:
> > >
> > > [added some relevant lists to CC - this can safe some people debugging by
> > > being able to google this discussion]
> > >
> > > On Wed 19-06-19 15:57:38, Liu Bo wrote:
> > > > I found a weird dead loop within invalidate_inode_pages2_range, the
> > > > reason being that  pagevec_lookup_entries(index=1) returns an indices
> > > > array which has only one entry storing value 0, and this has led
> > > > invalidate_inode_pages2_range() to a dead loop, something like,
> > > >
> > > > invalidate_inode_pages2_range()
> > > >   -> while (pagevec_lookup_entries(index=1, indices))
> > > >     ->  for (i = 0; i < pagevec_count(&pvec); i++) {
> > > >       -> index = indices[0]; // index is set to 0
> > > >       -> if (radix_tree_exceptional_entry(page)) {
> > > >           -> if (!invalidate_exceptional_entry2()) //
> > > >                   ->__dax_invalidate_mapping_entry // return 0
> > > >                      -> // entry marked as PAGECACHE_TAG_DIRTY/TOWRITE
> > > >                  ret = -EBUSY;
> > > >           ->continue;
> > > >           } // end of if (radix_tree_exceptional_entry(page))
> > > >     -> index++; // index is set to 1
> > > >
> > > > The following debug[1] proved the above analysis,  I was wondering if
> > > > this was a corner case that  pagevec_lookup_entries() allows or a
> > > > known bug that has been fixed upstream?
> > > >
> > > > ps: the kernel in use is 4.19.30 (LTS).
> > >
> > > Hum, the above trace suggests you are using DAX. Are you really? Because the
> > > stacktrace below shows we are working on fuse inode so that shouldn't
> > > really be DAX inode...
> > >
> >
> > So I was running tests against virtiofs[1] which adds dax support to
> > fuse, with dax, fuse provides posix stuff while dax provides data
> > channel.
> >
> > [1]: https://virtio-fs.gitlab.io/
> > https://gitlab.com/virtio-fs/linux

OK, thanks for the explanation and the pointer. So if I should guess, I'd
say that there's some problem with multiorder entries (for PMD pages) in
the radix tree. In particular if you lookup index 1 and there's
multiorder entry for indices 0-511, radix_tree_next_chunk() is updating
iter->index like:

iter->index = (index &~ node_maxindex(node)) | (offset << node->shift);

and offset is computed by radix_tree_descend() as:

offset = (index >> parent->shift) & RADIX_TREE_MAP_MASK;

So this all results in iter->index being set to 0 and thus confusing the
iteration in invalidate_inode_pages2_range(). Current kernel has xarray
code from Matthew which maintains originally passed index in xas.xa_index
and thus the problem isn't there.

So to sum up: Seems like a DAX-specific bug with PMD entries in older
kernels fixed by xarray rewrite.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: a few questions about pagevc_lookup_entries
  2019-06-24 10:41       ` Jan Kara
@ 2019-06-24 17:13         ` Liu Bo
  0 siblings, 0 replies; 5+ messages in thread
From: Liu Bo @ 2019-06-24 17:13 UTC (permalink / raw)
  To: Jan Kara
  Cc: Miklos Szeredi, Matthew Wilcox, dan.j.williams, Fengguang Wu,
	linux-fsdevel, linux-nvdimm, Vivek Goyal, Dr. David Alan Gilbert,
	Stefan Hajnoczi

On Mon, Jun 24, 2019 at 3:41 AM Jan Kara <jack@suse.cz> wrote:
>
> On Mon 24-06-19 09:25:00, Miklos Szeredi wrote:
> > [cc: vivek, stefan, dgilbert]
> >
> > On Fri, Jun 21, 2019 at 12:04 AM Liu Bo <obuil.liubo@gmail.com> wrote:
> > >
> > > On Thu, Jun 20, 2019 at 1:36 AM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > [added some relevant lists to CC - this can safe some people debugging by
> > > > being able to google this discussion]
> > > >
> > > > On Wed 19-06-19 15:57:38, Liu Bo wrote:
> > > > > I found a weird dead loop within invalidate_inode_pages2_range, the
> > > > > reason being that  pagevec_lookup_entries(index=1) returns an indices
> > > > > array which has only one entry storing value 0, and this has led
> > > > > invalidate_inode_pages2_range() to a dead loop, something like,
> > > > >
> > > > > invalidate_inode_pages2_range()
> > > > >   -> while (pagevec_lookup_entries(index=1, indices))
> > > > >     ->  for (i = 0; i < pagevec_count(&pvec); i++) {
> > > > >       -> index = indices[0]; // index is set to 0
> > > > >       -> if (radix_tree_exceptional_entry(page)) {
> > > > >           -> if (!invalidate_exceptional_entry2()) //
> > > > >                   ->__dax_invalidate_mapping_entry // return 0
> > > > >                      -> // entry marked as PAGECACHE_TAG_DIRTY/TOWRITE
> > > > >                  ret = -EBUSY;
> > > > >           ->continue;
> > > > >           } // end of if (radix_tree_exceptional_entry(page))
> > > > >     -> index++; // index is set to 1
> > > > >
> > > > > The following debug[1] proved the above analysis,  I was wondering if
> > > > > this was a corner case that  pagevec_lookup_entries() allows or a
> > > > > known bug that has been fixed upstream?
> > > > >
> > > > > ps: the kernel in use is 4.19.30 (LTS).
> > > >
> > > > Hum, the above trace suggests you are using DAX. Are you really? Because the
> > > > stacktrace below shows we are working on fuse inode so that shouldn't
> > > > really be DAX inode...
> > > >
> > >
> > > So I was running tests against virtiofs[1] which adds dax support to
> > > fuse, with dax, fuse provides posix stuff while dax provides data
> > > channel.
> > >
> > > [1]: https://virtio-fs.gitlab.io/
> > > https://gitlab.com/virtio-fs/linux
>
> OK, thanks for the explanation and the pointer. So if I should guess, I'd
> say that there's some problem with multiorder entries (for PMD pages) in
> the radix tree. In particular if you lookup index 1 and there's
> multiorder entry for indices 0-511, radix_tree_next_chunk() is updating
> iter->index like:
>
> iter->index = (index &~ node_maxindex(node)) | (offset << node->shift);
>
> and offset is computed by radix_tree_descend() as:
>
> offset = (index >> parent->shift) & RADIX_TREE_MAP_MASK;
>
> So this all results in iter->index being set to 0 and thus confusing the
> iteration in invalidate_inode_pages2_range(). Current kernel has xarray
> code from Matthew which maintains originally passed index in xas.xa_index
> and thus the problem isn't there.
>
> So to sum up: Seems like a DAX-specific bug with PMD entries in older
> kernels fixed by xarray rewrite.
>

Thank you so much for the information, Jan.

I'll double check if that's the root cause and report back, if yes,
guess then we have to fix 4.19's radix tree in place to do the right
thing instead of porting back xarray rewrite..

thanks,
liubo

>                                                                 Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-06-24 17:14 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CANQeFDCCGED3BR0oTpzQ75gtGpdGCw8FLf+kspBYytw3YteXAw@mail.gmail.com>
2019-06-20  8:36 ` a few questions about pagevc_lookup_entries Jan Kara
2019-06-20 22:03   ` Liu Bo
2019-06-24  7:25     ` Miklos Szeredi
2019-06-24 10:41       ` Jan Kara
2019-06-24 17:13         ` Liu Bo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).