BUG in nfs_read_folio() when tracing is enabled

public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed

* BUG in nfs_read_folio() when tracing is enabled
@ 2024-06-19 20:45 Chuck Lever III
  2024-07-02 20:59 ` David Wysochanski
  0 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever III @ 2024-06-19 20:45 UTC (permalink / raw)
  To: Dave Wysochanski; +Cc: Linux NFS Mailing List

Hi Dave-

I'm testing pNFS SCSI layouts with tracing enabled on the
client, and I hit this BUG twice today:

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0  Oops: Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 52230 Comm: fio Not tainted 6.10.0-rc4-g9f1e0e495093 #5
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
FS:  00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
Call Trace:
 <TASK>
 ? __die_body.cold+0x19/0x27
 ? page_fault_oops+0xca/0x2a0
 ? search_module_extables+0x19/0x60
 ? search_bpf_extables+0x5f/0x80
 ? exc_page_fault+0x7e/0x180
 ? asm_exc_page_fault+0x26/0x30
 ? nfs_folio_length+0x29/0x170 [nfs]
 trace_event_raw_event_nfs_folio_event_done+0xdc/0x170 [nfs]
 nfs_read_folio+0x16d/0x260 [nfs]
 ? __pfx_nfs_read_folio+0x10/0x10 [nfs]
 filemap_read_folio+0x43/0xe0
 filemap_fault+0x70d/0xd00
 __do_fault+0x35/0x120
 do_fault+0xbb/0x470
 __handle_mm_fault+0x7e9/0x1060
 ? mt_find+0x21c/0x570
 handle_mm_fault+0xf0/0x300
 do_user_addr_fault+0x217/0x620
 exc_page_fault+0x7e/0x180
 asm_exc_page_fault+0x26/0x30
RIP: 0033:0x7f1d25ecedb7
Code: 7e 6f 44 16 e0 48 29 fe 48 83 e1 e0 48 01 ce 0f 1f 40 00 c5 fe 6f 4e 60 c5 fe 6f 56 40 c5 fe 6f 5e 20 c5 fe 6f 26 48 83 c6 80 <c5> fd 7f 49 60 c5 fd 7f 51 40 c5 fd 7f 59 20 c5 fd 7f 21 48 83 c1
RSP: 002b:00007ffd92df7f58 EFLAGS: 00010203
RAX: 00007f1d1d0e6000 RBX: 000056211867f6c0 RCX: 00007f1d1d0e6f60
RDX: 0000000000001000 RSI: 0000562118699ee0 RDI: 00007f1d1d0e6000
RBP: 00007ffd92df7f70 R08: 0000000000000000 R09: 0000000000000000
R10: 0000562118688f30 R11: 0000000000000001 R12: 00007f1d1d16aac0
R13: 00007ffd92df8030 R14: 0000000000000001 R15: 0000000000001000
 </TASK>
Modules linked in: blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_c>
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
FS:  00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0

cel@renoir:linux$ scripts/faddr2line fs/nfs/nfs.ko nfs_folio_length+0x29
nfs_folio_length+0x29/0x170:
nfs_folio_length at /home/cel/src/kdevops/cel/linux/fs/nfs/internal.h:824 (discriminator 1)

Which in my version of the source code is:

822 static inline size_t nfs_folio_length(struct folio *folio)
823 {
824         loff_t i_size = i_size_read(folio_file_mapping(folio)->host);

I'm pinging you first about this because you changed the
code around the trace_nfs_aop_read_done() call site in
commit 000dbe0bec05 ("NFS: Convert buffered read paths to
use netfs when fscache is enabled").


--
Chuck Lever



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG in nfs_read_folio() when tracing is enabled
  2024-06-19 20:45 BUG in nfs_read_folio() when tracing is enabled Chuck Lever III
@ 2024-07-02 20:59 ` David Wysochanski
  2024-07-02 21:14   ` David Wysochanski
  2024-07-02 21:17   ` Chuck Lever III
  0 siblings, 2 replies; 12+ messages in thread
From: David Wysochanski @ 2024-07-02 20:59 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Dave Wysochanski, Linux NFS Mailing List

Sorry for the late reply, more below.

On Wed, Jun 19, 2024 at 4:46 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
> Hi Dave-
>
> I'm testing pNFS SCSI layouts with tracing enabled on the
> client, and I hit this BUG twice today:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0  Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> CPU: 0 PID: 52230 Comm: fio Not tainted 6.10.0-rc4-g9f1e0e495093 #5
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
> FS:  00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
> Call Trace:
>  <TASK>
>  ? __die_body.cold+0x19/0x27
>  ? page_fault_oops+0xca/0x2a0
>  ? search_module_extables+0x19/0x60
>  ? search_bpf_extables+0x5f/0x80
>  ? exc_page_fault+0x7e/0x180
>  ? asm_exc_page_fault+0x26/0x30
>  ? nfs_folio_length+0x29/0x170 [nfs]
>  trace_event_raw_event_nfs_folio_event_done+0xdc/0x170 [nfs]
>  nfs_read_folio+0x16d/0x260 [nfs]
>  ? __pfx_nfs_read_folio+0x10/0x10 [nfs]
>  filemap_read_folio+0x43/0xe0
>  filemap_fault+0x70d/0xd00
>  __do_fault+0x35/0x120
>  do_fault+0xbb/0x470
>  __handle_mm_fault+0x7e9/0x1060
>  ? mt_find+0x21c/0x570
>  handle_mm_fault+0xf0/0x300
>  do_user_addr_fault+0x217/0x620
>  exc_page_fault+0x7e/0x180
>  asm_exc_page_fault+0x26/0x30
> RIP: 0033:0x7f1d25ecedb7
> Code: 7e 6f 44 16 e0 48 29 fe 48 83 e1 e0 48 01 ce 0f 1f 40 00 c5 fe 6f 4e 60 c5 fe 6f 56 40 c5 fe 6f 5e 20 c5 fe 6f 26 48 83 c6 80 <c5> fd 7f 49 60 c5 fd 7f 51 40 c5 fd 7f 59 20 c5 fd 7f 21 48 83 c1
> RSP: 002b:00007ffd92df7f58 EFLAGS: 00010203
> RAX: 00007f1d1d0e6000 RBX: 000056211867f6c0 RCX: 00007f1d1d0e6f60
> RDX: 0000000000001000 RSI: 0000562118699ee0 RDI: 00007f1d1d0e6000
> RBP: 00007ffd92df7f70 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000562118688f30 R11: 0000000000000001 R12: 00007f1d1d16aac0
> R13: 00007ffd92df8030 R14: 0000000000000001 R15: 0000000000001000
>  </TASK>
> Modules linked in: blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_c>
> CR2: 0000000000000000
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
> FS:  00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
>
> cel@renoir:linux$ scripts/faddr2line fs/nfs/nfs.ko nfs_folio_length+0x29
> nfs_folio_length+0x29/0x170:
> nfs_folio_length at /home/cel/src/kdevops/cel/linux/fs/nfs/internal.h:824 (discriminator 1)
>
> Which in my version of the source code is:
>
> 822 static inline size_t nfs_folio_length(struct folio *folio)
> 823 {
> 824         loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
>

Ok so looks like we need to go deeper to know what the NULL pointer
represented in the BUG statement.
Decoding the oops:
$ ~/git/kernel/scripts/decodecode < ./oops.txt
Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00
00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b
00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
All code
========
   0: 90                    nop
   1: 41 56                push   %r14
   3: 41 55                push   %r13
   5: 41 54                push   %r12
   7: 55                    push   %rbp
   8: 53                    push   %rbx
   9: 48 89 fb              mov    %rdi,%rbx
   c: 48 83 ec 08          sub    $0x8,%rsp
  10: 48 8b 07              mov    (%rdi),%rax
  13: a9 00 00 04 00        test   $0x40000,%eax
  18: 74 0c                je     0x26
  1a: 48 8b 07              mov    (%rdi),%rax
  1d: f6 c4 08              test   $0x8,%ah
  20: 0f 85 89 00 00 00    jne    0xaf
  26: 48 8b 43 18          mov    0x18(%rbx),%rax
  2a:* 48 8b 00              mov    (%rax),%rax <-- trapping instruction
  2d: 48 8b 68 50          mov    0x50(%rax),%rbp
  31: 48 85 ed              test   %rbp,%rbp
  34: 7e 66                jle    0x9c
  36: 48 8b 03              mov    (%rbx),%rax
  39: a8 40                test   $0x40,%al
  3b: 74 7c                je     0xb9
  3d: 4c                    rex.WR
  3e: 8b                    .byte 0x8b
  3f: 6b                    .byte 0x6b

Code starting with the faulting instruction
===========================================
   0: 48 8b 00              mov    (%rax),%rax
   3: 48 8b 68 50          mov    0x50(%rax),%rbp
   7: 48 85 ed              test   %rbp,%rbp
   a: 7e 66                jle    0x72
   c: 48 8b 03              mov    (%rbx),%rax
   f: a8 40                test   $0x40,%al
  11: 74 7c                je     0x8f
  13: 4c                    rex.WR
  14: 8b                    .byte 0x8b
  15: 6b                    .byte 0x6b


Snip out the essential instructions, which shows we just have read an
address at an offset of 0x18, then that must have been NULL and we
tried to dereference NULL which led to the BUG:
   9: 48 89 fb              mov    %rdi,%rbx
...
  26: 48 8b 43 18          mov    0x18(%rbx),%rax
  2a:* 48 8b 00              mov    (%rax),%rax <-- trapping instruction

So the question is what is at offset 0x18 in the above code in
nfs_folio_length()?
Now folio_file_mapping() returns an address_space pointer ,and
address_space->host is at offset 0.
So if 'mapping' is at offset 0x18 of folio, then mapping is NULL.  And
from the code, this seems to fit.

 310 struct folio {
 311         /* private: don't document the anon union */
 312         union {
 313                 struct {
 314         /* public: */
 315                         unsigned long flags;
 316                         union {
 317                                 struct list_head lru;
 318         /* private: avoid cluttering the output */
 319                                 struct {
 320                                         void *__filler;
 321         /* public: */
 322                                         unsigned int mlock_count;
 323         /* private: */
 324                                 };
 325         /* public: */
 326                         };
 327                         struct address_space *mapping;
 328                         pgoff_t index;


 445 static inline struct address_space *folio_file_mapping(struct folio *folio)
 446 {
 447         if (unlikely(folio_test_swapcache(folio)))
 448                 return swapcache_mapping(folio);
 449
 450         return folio->mapping;
 451 }

So it seems folio->mapping == NULL and that is why we BUG.


> I'm pinging you first about this because you changed the
> code around the trace_nfs_aop_read_done() call site in
> commit 000dbe0bec05 ("NFS: Convert buffered read paths to
> use netfs when fscache is enabled").
>

If I have the above analysis right, then maybe this patch would fix it?
Also I wonder if nfs_page_length() might have the same issue?

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 9f0f4534744b..832f07994b88 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -821,7 +821,12 @@ unsigned int nfs_page_length(struct page *page)
  */
 static inline size_t nfs_folio_length(struct folio *folio)
 {
-       loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
+       loff_t i_size;
+
+       if (!folio->mapping)
+               return 0;
+
+       i_size = i_size_read(folio_file_mapping(folio)->host);

        if (i_size > 0) {
                pgoff_t index = folio_index(folio) >> folio_order(folio);


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: BUG in nfs_read_folio() when tracing is enabled
  2024-07-02 20:59 ` David Wysochanski
@ 2024-07-02 21:14   ` David Wysochanski
  2024-07-02 21:17   ` Chuck Lever III
  1 sibling, 0 replies; 12+ messages in thread
From: David Wysochanski @ 2024-07-02 21:14 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Dave Wysochanski, Linux NFS Mailing List

On Tue, Jul 2, 2024 at 4:59 PM David Wysochanski <dwysocha@redhat.com> wrote:
>
> Sorry for the late reply, more below.
>
> On Wed, Jun 19, 2024 at 4:46 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >
> > Hi Dave-
> >
> > I'm testing pNFS SCSI layouts with tracing enabled on the
> > client, and I hit this BUG twice today:
> >
> > BUG: kernel NULL pointer dereference, address: 0000000000000000
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 0 P4D 0  Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> > CPU: 0 PID: 52230 Comm: fio Not tainted 6.10.0-rc4-g9f1e0e495093 #5
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
> > RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
> > Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> > RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
> > RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
> > RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
> > RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
> > R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
> > R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
> > FS:  00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
> > Call Trace:
> >  <TASK>
> >  ? __die_body.cold+0x19/0x27
> >  ? page_fault_oops+0xca/0x2a0
> >  ? search_module_extables+0x19/0x60
> >  ? search_bpf_extables+0x5f/0x80
> >  ? exc_page_fault+0x7e/0x180
> >  ? asm_exc_page_fault+0x26/0x30
> >  ? nfs_folio_length+0x29/0x170 [nfs]
> >  trace_event_raw_event_nfs_folio_event_done+0xdc/0x170 [nfs]
> >  nfs_read_folio+0x16d/0x260 [nfs]
> >  ? __pfx_nfs_read_folio+0x10/0x10 [nfs]
> >  filemap_read_folio+0x43/0xe0
> >  filemap_fault+0x70d/0xd00
> >  __do_fault+0x35/0x120
> >  do_fault+0xbb/0x470
> >  __handle_mm_fault+0x7e9/0x1060
> >  ? mt_find+0x21c/0x570
> >  handle_mm_fault+0xf0/0x300
> >  do_user_addr_fault+0x217/0x620
> >  exc_page_fault+0x7e/0x180
> >  asm_exc_page_fault+0x26/0x30
> > RIP: 0033:0x7f1d25ecedb7
> > Code: 7e 6f 44 16 e0 48 29 fe 48 83 e1 e0 48 01 ce 0f 1f 40 00 c5 fe 6f 4e 60 c5 fe 6f 56 40 c5 fe 6f 5e 20 c5 fe 6f 26 48 83 c6 80 <c5> fd 7f 49 60 c5 fd 7f 51 40 c5 fd 7f 59 20 c5 fd 7f 21 48 83 c1
> > RSP: 002b:00007ffd92df7f58 EFLAGS: 00010203
> > RAX: 00007f1d1d0e6000 RBX: 000056211867f6c0 RCX: 00007f1d1d0e6f60
> > RDX: 0000000000001000 RSI: 0000562118699ee0 RDI: 00007f1d1d0e6000
> > RBP: 00007ffd92df7f70 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000562118688f30 R11: 0000000000000001 R12: 00007f1d1d16aac0
> > R13: 00007ffd92df8030 R14: 0000000000000001 R15: 0000000000001000
> >  </TASK>
> > Modules linked in: blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_c>
> > CR2: 0000000000000000
> > ---[ end trace 0000000000000000 ]---
> > RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
> > Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> > RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
> > RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
> > RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
> > RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
> > R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
> > R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
> > FS:  00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
> >
> > cel@renoir:linux$ scripts/faddr2line fs/nfs/nfs.ko nfs_folio_length+0x29
> > nfs_folio_length+0x29/0x170:
> > nfs_folio_length at /home/cel/src/kdevops/cel/linux/fs/nfs/internal.h:824 (discriminator 1)
> >
> > Which in my version of the source code is:
> >
> > 822 static inline size_t nfs_folio_length(struct folio *folio)
> > 823 {
> > 824         loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
> >
>
> Ok so looks like we need to go deeper to know what the NULL pointer
> represented in the BUG statement.
> Decoding the oops:
> $ ~/git/kernel/scripts/decodecode < ./oops.txt
> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00
> 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b
> 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> All code
> ========
>    0: 90                    nop
>    1: 41 56                push   %r14
>    3: 41 55                push   %r13
>    5: 41 54                push   %r12
>    7: 55                    push   %rbp
>    8: 53                    push   %rbx
>    9: 48 89 fb              mov    %rdi,%rbx
>    c: 48 83 ec 08          sub    $0x8,%rsp
>   10: 48 8b 07              mov    (%rdi),%rax
>   13: a9 00 00 04 00        test   $0x40000,%eax
>   18: 74 0c                je     0x26
>   1a: 48 8b 07              mov    (%rdi),%rax
>   1d: f6 c4 08              test   $0x8,%ah
>   20: 0f 85 89 00 00 00    jne    0xaf
>   26: 48 8b 43 18          mov    0x18(%rbx),%rax
>   2a:* 48 8b 00              mov    (%rax),%rax <-- trapping instruction
>   2d: 48 8b 68 50          mov    0x50(%rax),%rbp
>   31: 48 85 ed              test   %rbp,%rbp
>   34: 7e 66                jle    0x9c
>   36: 48 8b 03              mov    (%rbx),%rax
>   39: a8 40                test   $0x40,%al
>   3b: 74 7c                je     0xb9
>   3d: 4c                    rex.WR
>   3e: 8b                    .byte 0x8b
>   3f: 6b                    .byte 0x6b
>
> Code starting with the faulting instruction
> ===========================================
>    0: 48 8b 00              mov    (%rax),%rax
>    3: 48 8b 68 50          mov    0x50(%rax),%rbp
>    7: 48 85 ed              test   %rbp,%rbp
>    a: 7e 66                jle    0x72
>    c: 48 8b 03              mov    (%rbx),%rax
>    f: a8 40                test   $0x40,%al
>   11: 74 7c                je     0x8f
>   13: 4c                    rex.WR
>   14: 8b                    .byte 0x8b
>   15: 6b                    .byte 0x6b
>
>
> Snip out the essential instructions, which shows we just have read an
> address at an offset of 0x18, then that must have been NULL and we
> tried to dereference NULL which led to the BUG:
>    9: 48 89 fb              mov    %rdi,%rbx
> ...
>   26: 48 8b 43 18          mov    0x18(%rbx),%rax
>   2a:* 48 8b 00              mov    (%rax),%rax <-- trapping instruction
>
> So the question is what is at offset 0x18 in the above code in
> nfs_folio_length()?
> Now folio_file_mapping() returns an address_space pointer ,and
> address_space->host is at offset 0.
> So if 'mapping' is at offset 0x18 of folio, then mapping is NULL.  And
> from the code, this seems to fit.
>
>  310 struct folio {
>  311         /* private: don't document the anon union */
>  312         union {
>  313                 struct {
>  314         /* public: */
>  315                         unsigned long flags;
>  316                         union {
>  317                                 struct list_head lru;
>  318         /* private: avoid cluttering the output */
>  319                                 struct {
>  320                                         void *__filler;
>  321         /* public: */
>  322                                         unsigned int mlock_count;
>  323         /* private: */
>  324                                 };
>  325         /* public: */
>  326                         };
>  327                         struct address_space *mapping;
>  328                         pgoff_t index;
>
>
>  445 static inline struct address_space *folio_file_mapping(struct folio *folio)
>  446 {
>  447         if (unlikely(folio_test_swapcache(folio)))
>  448                 return swapcache_mapping(folio);
>  449
>  450         return folio->mapping;
>  451 }
>
> So it seems folio->mapping == NULL and that is why we BUG.
>
>
> > I'm pinging you first about this because you changed the
> > code around the trace_nfs_aop_read_done() call site in
> > commit 000dbe0bec05 ("NFS: Convert buffered read paths to
> > use netfs when fscache is enabled").
> >
>
> If I have the above analysis right, then maybe this patch would fix it?
> Also I wonder if nfs_page_length() might have the same issue?
>
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 9f0f4534744b..832f07994b88 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -821,7 +821,12 @@ unsigned int nfs_page_length(struct page *page)
>   */
>  static inline size_t nfs_folio_length(struct folio *folio)
>  {
> -       loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
> +       loff_t i_size;
> +
> +       if (!folio->mapping)
> +               return 0;
> +
> +       i_size = i_size_read(folio_file_mapping(folio)->host);
>
>         if (i_size > 0) {
>                 pgoff_t index = folio_index(folio) >> folio_order(folio);

Correction - I should have used folio_file_mapping(folio) above
instead of folio->mapping
(compile only tested)

$ git diff
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 9f0f4534744b..433661d54f33 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -821,7 +821,12 @@ unsigned int nfs_page_length(struct page *page)
  */
 static inline size_t nfs_folio_length(struct folio *folio)
 {
-       loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
+       loff_t i_size;
+
+       if (!folio_file_mapping(folio))
+               return 0;
+
+       i_size = i_size_read(folio_file_mapping(folio)->host);

        if (i_size > 0) {
                pgoff_t index = folio_index(folio) >> folio_order(folio);


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: BUG in nfs_read_folio() when tracing is enabled
  2024-07-02 20:59 ` David Wysochanski
  2024-07-02 21:14   ` David Wysochanski
@ 2024-07-02 21:17   ` Chuck Lever III
  2024-07-02 21:33     ` David Wysochanski
  1 sibling, 1 reply; 12+ messages in thread
From: Chuck Lever III @ 2024-07-02 21:17 UTC (permalink / raw)
  To: David Wysochanski; +Cc: Dave Wysochanski, Linux NFS Mailing List



> On Jul 2, 2024, at 4:59 PM, David Wysochanski <dwysocha@redhat.com> wrote:
> 
> Sorry for the late reply, more below.
> 
> On Wed, Jun 19, 2024 at 4:46 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>> 
>> Hi Dave-
>> 
>> I'm testing pNFS SCSI layouts with tracing enabled on the
>> client, and I hit this BUG twice today:
>> 
>> BUG: kernel NULL pointer dereference, address: 0000000000000000
>> #PF: supervisor read access in kernel mode
>> #PF: error_code(0x0000) - not-present page
>> PGD 0 P4D 0  Oops: Oops: 0000 [#1] PREEMPT SMP PTI
>> CPU: 0 PID: 52230 Comm: fio Not tainted 6.10.0-rc4-g9f1e0e495093 #5
>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
>> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
>> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
>> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
>> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
>> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
>> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
>> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
>> FS:  00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
>> Call Trace:
>> <TASK>
>> ? __die_body.cold+0x19/0x27
>> ? page_fault_oops+0xca/0x2a0
>> ? search_module_extables+0x19/0x60
>> ? search_bpf_extables+0x5f/0x80
>> ? exc_page_fault+0x7e/0x180
>> ? asm_exc_page_fault+0x26/0x30
>> ? nfs_folio_length+0x29/0x170 [nfs]
>> trace_event_raw_event_nfs_folio_event_done+0xdc/0x170 [nfs]
>> nfs_read_folio+0x16d/0x260 [nfs]
>> ? __pfx_nfs_read_folio+0x10/0x10 [nfs]
>> filemap_read_folio+0x43/0xe0
>> filemap_fault+0x70d/0xd00
>> __do_fault+0x35/0x120
>> do_fault+0xbb/0x470
>> __handle_mm_fault+0x7e9/0x1060
>> ? mt_find+0x21c/0x570
>> handle_mm_fault+0xf0/0x300
>> do_user_addr_fault+0x217/0x620
>> exc_page_fault+0x7e/0x180
>> asm_exc_page_fault+0x26/0x30
>> RIP: 0033:0x7f1d25ecedb7
>> Code: 7e 6f 44 16 e0 48 29 fe 48 83 e1 e0 48 01 ce 0f 1f 40 00 c5 fe 6f 4e 60 c5 fe 6f 56 40 c5 fe 6f 5e 20 c5 fe 6f 26 48 83 c6 80 <c5> fd 7f 49 60 c5 fd 7f 51 40 c5 fd 7f 59 20 c5 fd 7f 21 48 83 c1
>> RSP: 002b:00007ffd92df7f58 EFLAGS: 00010203
>> RAX: 00007f1d1d0e6000 RBX: 000056211867f6c0 RCX: 00007f1d1d0e6f60
>> RDX: 0000000000001000 RSI: 0000562118699ee0 RDI: 00007f1d1d0e6000
>> RBP: 00007ffd92df7f70 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000562118688f30 R11: 0000000000000001 R12: 00007f1d1d16aac0
>> R13: 00007ffd92df8030 R14: 0000000000000001 R15: 0000000000001000
>> </TASK>
>> Modules linked in: blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_c>
>> CR2: 0000000000000000
>> ---[ end trace 0000000000000000 ]---
>> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
>> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
>> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
>> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
>> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
>> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
>> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
>> FS:  00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
>> 
>> cel@renoir:linux$ scripts/faddr2line fs/nfs/nfs.ko nfs_folio_length+0x29
>> nfs_folio_length+0x29/0x170:
>> nfs_folio_length at /home/cel/src/kdevops/cel/linux/fs/nfs/internal.h:824 (discriminator 1)
>> 
>> Which in my version of the source code is:
>> 
>> 822 static inline size_t nfs_folio_length(struct folio *folio)
>> 823 {
>> 824         loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
>> 
> 
> Ok so looks like we need to go deeper to know what the NULL pointer
> represented in the BUG statement.
> Decoding the oops:
> $ ~/git/kernel/scripts/decodecode < ./oops.txt
> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00
> 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b
> 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> All code
> ========
>   0: 90                    nop
>   1: 41 56                push   %r14
>   3: 41 55                push   %r13
>   5: 41 54                push   %r12
>   7: 55                    push   %rbp
>   8: 53                    push   %rbx
>   9: 48 89 fb              mov    %rdi,%rbx
>   c: 48 83 ec 08          sub    $0x8,%rsp
>  10: 48 8b 07              mov    (%rdi),%rax
>  13: a9 00 00 04 00        test   $0x40000,%eax
>  18: 74 0c                je     0x26
>  1a: 48 8b 07              mov    (%rdi),%rax
>  1d: f6 c4 08              test   $0x8,%ah
>  20: 0f 85 89 00 00 00    jne    0xaf
>  26: 48 8b 43 18          mov    0x18(%rbx),%rax
>  2a:* 48 8b 00              mov    (%rax),%rax <-- trapping instruction
>  2d: 48 8b 68 50          mov    0x50(%rax),%rbp
>  31: 48 85 ed              test   %rbp,%rbp
>  34: 7e 66                jle    0x9c
>  36: 48 8b 03              mov    (%rbx),%rax
>  39: a8 40                test   $0x40,%al
>  3b: 74 7c                je     0xb9
>  3d: 4c                    rex.WR
>  3e: 8b                    .byte 0x8b
>  3f: 6b                    .byte 0x6b
> 
> Code starting with the faulting instruction
> ===========================================
>   0: 48 8b 00              mov    (%rax),%rax
>   3: 48 8b 68 50          mov    0x50(%rax),%rbp
>   7: 48 85 ed              test   %rbp,%rbp
>   a: 7e 66                jle    0x72
>   c: 48 8b 03              mov    (%rbx),%rax
>   f: a8 40                test   $0x40,%al
>  11: 74 7c                je     0x8f
>  13: 4c                    rex.WR
>  14: 8b                    .byte 0x8b
>  15: 6b                    .byte 0x6b
> 
> 
> Snip out the essential instructions, which shows we just have read an
> address at an offset of 0x18, then that must have been NULL and we
> tried to dereference NULL which led to the BUG:
>   9: 48 89 fb              mov    %rdi,%rbx
> ...
>  26: 48 8b 43 18          mov    0x18(%rbx),%rax
>  2a:* 48 8b 00              mov    (%rax),%rax <-- trapping instruction
> 
> So the question is what is at offset 0x18 in the above code in
> nfs_folio_length()?
> Now folio_file_mapping() returns an address_space pointer ,and
> address_space->host is at offset 0.
> So if 'mapping' is at offset 0x18 of folio, then mapping is NULL.  And
> from the code, this seems to fit.
> 
> 310 struct folio {
> 311         /* private: don't document the anon union */
> 312         union {
> 313                 struct {
> 314         /* public: */
> 315                         unsigned long flags;
> 316                         union {
> 317                                 struct list_head lru;
> 318         /* private: avoid cluttering the output */
> 319                                 struct {
> 320                                         void *__filler;
> 321         /* public: */
> 322                                         unsigned int mlock_count;
> 323         /* private: */
> 324                                 };
> 325         /* public: */
> 326                         };
> 327                         struct address_space *mapping;
> 328                         pgoff_t index;
> 
> 
> 445 static inline struct address_space *folio_file_mapping(struct folio *folio)
> 446 {
> 447         if (unlikely(folio_test_swapcache(folio)))
> 448                 return swapcache_mapping(folio);
> 449
> 450         return folio->mapping;
> 451 }
> 
> So it seems folio->mapping == NULL and that is why we BUG.
> 
> 
>> I'm pinging you first about this because you changed the
>> code around the trace_nfs_aop_read_done() call site in
>> commit 000dbe0bec05 ("NFS: Convert buffered read paths to
>> use netfs when fscache is enabled").
>> 
> 
> If I have the above analysis right, then maybe this patch would fix it?
> Also I wonder if nfs_page_length() might have the same issue?
> 
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 9f0f4534744b..832f07994b88 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -821,7 +821,12 @@ unsigned int nfs_page_length(struct page *page)
>  */
> static inline size_t nfs_folio_length(struct folio *folio)
> {
> -       loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
> +       loff_t i_size;
> +
> +       if (!folio->mapping)
> +               return 0;
> +
> +       i_size = i_size_read(folio_file_mapping(folio)->host);
> 
>        if (i_size > 0) {
>                pgoff_t index = folio_index(folio) >> folio_order(folio);

Execution passed trace_nfs_aop_readpage() without incident,
so folio->mapping was a valid pointer as we entered
nfs_read_folio().

What I don't understand is whether you expect folio->mapping
to become NULL while nfs_read_folio() is running.


--
Chuck Lever



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG in nfs_read_folio() when tracing is enabled
  2024-07-02 21:17   ` Chuck Lever III
@ 2024-07-02 21:33     ` David Wysochanski
  2024-07-03  5:08       ` Christoph Hellwig
  2024-07-03 13:40       ` Chuck Lever III
  0 siblings, 2 replies; 12+ messages in thread
From: David Wysochanski @ 2024-07-02 21:33 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Dave Wysochanski, Linux NFS Mailing List

On Tue, Jul 2, 2024 at 5:17 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Jul 2, 2024, at 4:59 PM, David Wysochanski <dwysocha@redhat.com> wrote:
> >
> > Sorry for the late reply, more below.
> >
> > On Wed, Jun 19, 2024 at 4:46 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>
> >> Hi Dave-
> >>
> >> I'm testing pNFS SCSI layouts with tracing enabled on the
> >> client, and I hit this BUG twice today:
> >>
> >> BUG: kernel NULL pointer dereference, address: 0000000000000000
> >> #PF: supervisor read access in kernel mode
> >> #PF: error_code(0x0000) - not-present page
> >> PGD 0 P4D 0  Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> >> CPU: 0 PID: 52230 Comm: fio Not tainted 6.10.0-rc4-g9f1e0e495093 #5
> >> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
> >> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
> >> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> >> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
> >> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
> >> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
> >> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
> >> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
> >> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
> >> FS:  00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
> >> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
> >> Call Trace:
> >> <TASK>
> >> ? __die_body.cold+0x19/0x27
> >> ? page_fault_oops+0xca/0x2a0
> >> ? search_module_extables+0x19/0x60
> >> ? search_bpf_extables+0x5f/0x80
> >> ? exc_page_fault+0x7e/0x180
> >> ? asm_exc_page_fault+0x26/0x30
> >> ? nfs_folio_length+0x29/0x170 [nfs]
> >> trace_event_raw_event_nfs_folio_event_done+0xdc/0x170 [nfs]
> >> nfs_read_folio+0x16d/0x260 [nfs]
> >> ? __pfx_nfs_read_folio+0x10/0x10 [nfs]
> >> filemap_read_folio+0x43/0xe0
> >> filemap_fault+0x70d/0xd00
> >> __do_fault+0x35/0x120
> >> do_fault+0xbb/0x470
> >> __handle_mm_fault+0x7e9/0x1060
> >> ? mt_find+0x21c/0x570
> >> handle_mm_fault+0xf0/0x300
> >> do_user_addr_fault+0x217/0x620
> >> exc_page_fault+0x7e/0x180
> >> asm_exc_page_fault+0x26/0x30
> >> RIP: 0033:0x7f1d25ecedb7
> >> Code: 7e 6f 44 16 e0 48 29 fe 48 83 e1 e0 48 01 ce 0f 1f 40 00 c5 fe 6f 4e 60 c5 fe 6f 56 40 c5 fe 6f 5e 20 c5 fe 6f 26 48 83 c6 80 <c5> fd 7f 49 60 c5 fd 7f 51 40 c5 fd 7f 59 20 c5 fd 7f 21 48 83 c1
> >> RSP: 002b:00007ffd92df7f58 EFLAGS: 00010203
> >> RAX: 00007f1d1d0e6000 RBX: 000056211867f6c0 RCX: 00007f1d1d0e6f60
> >> RDX: 0000000000001000 RSI: 0000562118699ee0 RDI: 00007f1d1d0e6000
> >> RBP: 00007ffd92df7f70 R08: 0000000000000000 R09: 0000000000000000
> >> R10: 0000562118688f30 R11: 0000000000000001 R12: 00007f1d1d16aac0
> >> R13: 00007ffd92df8030 R14: 0000000000000001 R15: 0000000000001000
> >> </TASK>
> >> Modules linked in: blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_c>
> >> CR2: 0000000000000000
> >> ---[ end trace 0000000000000000 ]---
> >> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
> >> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> >> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
> >> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
> >> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
> >> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
> >> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
> >> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
> >> FS:  00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
> >> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
> >>
> >> cel@renoir:linux$ scripts/faddr2line fs/nfs/nfs.ko nfs_folio_length+0x29
> >> nfs_folio_length+0x29/0x170:
> >> nfs_folio_length at /home/cel/src/kdevops/cel/linux/fs/nfs/internal.h:824 (discriminator 1)
> >>
> >> Which in my version of the source code is:
> >>
> >> 822 static inline size_t nfs_folio_length(struct folio *folio)
> >> 823 {
> >> 824         loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
> >>
> >
> > Ok so looks like we need to go deeper to know what the NULL pointer
> > represented in the BUG statement.
> > Decoding the oops:
> > $ ~/git/kernel/scripts/decodecode < ./oops.txt
> > Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00
> > 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b
> > 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> > All code
> > ========
> >   0: 90                    nop
> >   1: 41 56                push   %r14
> >   3: 41 55                push   %r13
> >   5: 41 54                push   %r12
> >   7: 55                    push   %rbp
> >   8: 53                    push   %rbx
> >   9: 48 89 fb              mov    %rdi,%rbx
> >   c: 48 83 ec 08          sub    $0x8,%rsp
> >  10: 48 8b 07              mov    (%rdi),%rax
> >  13: a9 00 00 04 00        test   $0x40000,%eax
> >  18: 74 0c                je     0x26
> >  1a: 48 8b 07              mov    (%rdi),%rax
> >  1d: f6 c4 08              test   $0x8,%ah
> >  20: 0f 85 89 00 00 00    jne    0xaf
> >  26: 48 8b 43 18          mov    0x18(%rbx),%rax
> >  2a:* 48 8b 00              mov    (%rax),%rax <-- trapping instruction
> >  2d: 48 8b 68 50          mov    0x50(%rax),%rbp
> >  31: 48 85 ed              test   %rbp,%rbp
> >  34: 7e 66                jle    0x9c
> >  36: 48 8b 03              mov    (%rbx),%rax
> >  39: a8 40                test   $0x40,%al
> >  3b: 74 7c                je     0xb9
> >  3d: 4c                    rex.WR
> >  3e: 8b                    .byte 0x8b
> >  3f: 6b                    .byte 0x6b
> >
> > Code starting with the faulting instruction
> > ===========================================
> >   0: 48 8b 00              mov    (%rax),%rax
> >   3: 48 8b 68 50          mov    0x50(%rax),%rbp
> >   7: 48 85 ed              test   %rbp,%rbp
> >   a: 7e 66                jle    0x72
> >   c: 48 8b 03              mov    (%rbx),%rax
> >   f: a8 40                test   $0x40,%al
> >  11: 74 7c                je     0x8f
> >  13: 4c                    rex.WR
> >  14: 8b                    .byte 0x8b
> >  15: 6b                    .byte 0x6b
> >
> >
> > Snip out the essential instructions, which shows we just have read an
> > address at an offset of 0x18, then that must have been NULL and we
> > tried to dereference NULL which led to the BUG:
> >   9: 48 89 fb              mov    %rdi,%rbx
> > ...
> >  26: 48 8b 43 18          mov    0x18(%rbx),%rax
> >  2a:* 48 8b 00              mov    (%rax),%rax <-- trapping instruction
> >
> > So the question is what is at offset 0x18 in the above code in
> > nfs_folio_length()?
> > Now folio_file_mapping() returns an address_space pointer ,and
> > address_space->host is at offset 0.
> > So if 'mapping' is at offset 0x18 of folio, then mapping is NULL.  And
> > from the code, this seems to fit.
> >
> > 310 struct folio {
> > 311         /* private: don't document the anon union */
> > 312         union {
> > 313                 struct {
> > 314         /* public: */
> > 315                         unsigned long flags;
> > 316                         union {
> > 317                                 struct list_head lru;
> > 318         /* private: avoid cluttering the output */
> > 319                                 struct {
> > 320                                         void *__filler;
> > 321         /* public: */
> > 322                                         unsigned int mlock_count;
> > 323         /* private: */
> > 324                                 };
> > 325         /* public: */
> > 326                         };
> > 327                         struct address_space *mapping;
> > 328                         pgoff_t index;
> >
> >
> > 445 static inline struct address_space *folio_file_mapping(struct folio *folio)
> > 446 {
> > 447         if (unlikely(folio_test_swapcache(folio)))
> > 448                 return swapcache_mapping(folio);
> > 449
> > 450         return folio->mapping;
> > 451 }
> >
> > So it seems folio->mapping == NULL and that is why we BUG.
> >
> >
> >> I'm pinging you first about this because you changed the
> >> code around the trace_nfs_aop_read_done() call site in
> >> commit 000dbe0bec05 ("NFS: Convert buffered read paths to
> >> use netfs when fscache is enabled").
> >>
> >
> > If I have the above analysis right, then maybe this patch would fix it?
> > Also I wonder if nfs_page_length() might have the same issue?
> >
> > diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> > index 9f0f4534744b..832f07994b88 100644
> > --- a/fs/nfs/internal.h
> > +++ b/fs/nfs/internal.h
> > @@ -821,7 +821,12 @@ unsigned int nfs_page_length(struct page *page)
> >  */
> > static inline size_t nfs_folio_length(struct folio *folio)
> > {
> > -       loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
> > +       loff_t i_size;
> > +
> > +       if (!folio->mapping)
> > +               return 0;
> > +
> > +       i_size = i_size_read(folio_file_mapping(folio)->host);
> >
> >        if (i_size > 0) {
> >                pgoff_t index = folio_index(folio) >> folio_order(folio);
>
> Execution passed trace_nfs_aop_readpage() without incident,
> so folio->mapping was a valid pointer as we entered
> nfs_read_folio().
>
> What I don't understand is whether you expect folio->mapping
> to become NULL while nfs_read_folio() is running.
>

Was your test running both reads and truncates on the same file in parallel?

Once the read completes I believe the page is unlocked.  Then if
truncates are running in parallel, then I think this can happen
(folio->mapping = NULL as a result of truncate).

>
> --
> Chuck Lever
>
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG in nfs_read_folio() when tracing is enabled
  2024-07-02 21:33     ` David Wysochanski
@ 2024-07-03  5:08       ` Christoph Hellwig
  2024-07-05 13:45         ` Chuck Lever III
  2024-07-03 13:40       ` Chuck Lever III
  1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2024-07-03  5:08 UTC (permalink / raw)
  To: David Wysochanski
  Cc: Chuck Lever III, Dave Wysochanski, Linux NFS Mailing List

On Tue, Jul 02, 2024 at 05:33:58PM -0400, David Wysochanski wrote:

[annoying full quote snipped]

> Was your test running both reads and truncates on the same file in parallel?
> 
> Once the read completes I believe the page is unlocked.  Then if
> truncates are running in parallel, then I think this can happen
> (folio->mapping = NULL as a result of truncate).

Yes, folio->mapping is only stable when the folio is locked.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG in nfs_read_folio() when tracing is enabled
  2024-07-02 21:33     ` David Wysochanski
  2024-07-03  5:08       ` Christoph Hellwig
@ 2024-07-03 13:40       ` Chuck Lever III
  2024-07-03 14:18         ` Christoph Hellwig
  1 sibling, 1 reply; 12+ messages in thread
From: Chuck Lever III @ 2024-07-03 13:40 UTC (permalink / raw)
  To: David Wysochanski; +Cc: Dave Wysochanski, Linux NFS Mailing List



> On Jul 2, 2024, at 5:33 PM, David Wysochanski <dwysocha@redhat.com> wrote:
> 
> On Tue, Jul 2, 2024 at 5:17 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>> 
>> 
>> 
>>> On Jul 2, 2024, at 4:59 PM, David Wysochanski <dwysocha@redhat.com> wrote:
>>> 
>>> Sorry for the late reply, more below.
>>> 
>>> On Wed, Jun 19, 2024 at 4:46 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>> 
>>>> Hi Dave-
>>>> 
>>>> I'm testing pNFS SCSI layouts with tracing enabled on the
>>>> client, and I hit this BUG twice today:
>>>> 
>>>> BUG: kernel NULL pointer dereference, address: 0000000000000000
>>>> #PF: supervisor read access in kernel mode
>>>> #PF: error_code(0x0000) - not-present page
>>>> PGD 0 P4D 0  Oops: Oops: 0000 [#1] PREEMPT SMP PTI
>>>> CPU: 0 PID: 52230 Comm: fio Not tainted 6.10.0-rc4-g9f1e0e495093 #5
>>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
>>>> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
>>>> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
>>>> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
>>>> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
>>>> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
>>>> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
>>>> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
>>>> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
>>>> FS:  00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
>>>> Call Trace:
>>>> <TASK>
>>>> ? __die_body.cold+0x19/0x27
>>>> ? page_fault_oops+0xca/0x2a0
>>>> ? search_module_extables+0x19/0x60
>>>> ? search_bpf_extables+0x5f/0x80
>>>> ? exc_page_fault+0x7e/0x180
>>>> ? asm_exc_page_fault+0x26/0x30
>>>> ? nfs_folio_length+0x29/0x170 [nfs]
>>>> trace_event_raw_event_nfs_folio_event_done+0xdc/0x170 [nfs]
>>>> nfs_read_folio+0x16d/0x260 [nfs]
>>>> ? __pfx_nfs_read_folio+0x10/0x10 [nfs]
>>>> filemap_read_folio+0x43/0xe0
>>>> filemap_fault+0x70d/0xd00
>>>> __do_fault+0x35/0x120
>>>> do_fault+0xbb/0x470
>>>> __handle_mm_fault+0x7e9/0x1060
>>>> ? mt_find+0x21c/0x570
>>>> handle_mm_fault+0xf0/0x300
>>>> do_user_addr_fault+0x217/0x620
>>>> exc_page_fault+0x7e/0x180
>>>> asm_exc_page_fault+0x26/0x30
>>>> RIP: 0033:0x7f1d25ecedb7
>>>> Code: 7e 6f 44 16 e0 48 29 fe 48 83 e1 e0 48 01 ce 0f 1f 40 00 c5 fe 6f 4e 60 c5 fe 6f 56 40 c5 fe 6f 5e 20 c5 fe 6f 26 48 83 c6 80 <c5> fd 7f 49 60 c5 fd 7f 51 40 c5 fd 7f 59 20 c5 fd 7f 21 48 83 c1
>>>> RSP: 002b:00007ffd92df7f58 EFLAGS: 00010203
>>>> RAX: 00007f1d1d0e6000 RBX: 000056211867f6c0 RCX: 00007f1d1d0e6f60
>>>> RDX: 0000000000001000 RSI: 0000562118699ee0 RDI: 00007f1d1d0e6000
>>>> RBP: 00007ffd92df7f70 R08: 0000000000000000 R09: 0000000000000000
>>>> R10: 0000562118688f30 R11: 0000000000000001 R12: 00007f1d1d16aac0
>>>> R13: 00007ffd92df8030 R14: 0000000000000001 R15: 0000000000001000
>>>> </TASK>
>>>> Modules linked in: blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_c>
>>>> CR2: 0000000000000000
>>>> ---[ end trace 0000000000000000 ]---
>>>> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
>>>> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
>>>> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
>>>> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
>>>> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
>>>> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
>>>> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
>>>> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
>>>> FS:  00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
>>>> 
>>>> cel@renoir:linux$ scripts/faddr2line fs/nfs/nfs.ko nfs_folio_length+0x29
>>>> nfs_folio_length+0x29/0x170:
>>>> nfs_folio_length at /home/cel/src/kdevops/cel/linux/fs/nfs/internal.h:824 (discriminator 1)
>>>> 
>>>> Which in my version of the source code is:
>>>> 
>>>> 822 static inline size_t nfs_folio_length(struct folio *folio)
>>>> 823 {
>>>> 824         loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
>>>> 
>>> 
>>> Ok so looks like we need to go deeper to know what the NULL pointer
>>> represented in the BUG statement.
>>> Decoding the oops:
>>> $ ~/git/kernel/scripts/decodecode < ./oops.txt
>>> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00
>>> 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b
>>> 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
>>> All code
>>> ========
>>>  0: 90                    nop
>>>  1: 41 56                push   %r14
>>>  3: 41 55                push   %r13
>>>  5: 41 54                push   %r12
>>>  7: 55                    push   %rbp
>>>  8: 53                    push   %rbx
>>>  9: 48 89 fb              mov    %rdi,%rbx
>>>  c: 48 83 ec 08          sub    $0x8,%rsp
>>> 10: 48 8b 07              mov    (%rdi),%rax
>>> 13: a9 00 00 04 00        test   $0x40000,%eax
>>> 18: 74 0c                je     0x26
>>> 1a: 48 8b 07              mov    (%rdi),%rax
>>> 1d: f6 c4 08              test   $0x8,%ah
>>> 20: 0f 85 89 00 00 00    jne    0xaf
>>> 26: 48 8b 43 18          mov    0x18(%rbx),%rax
>>> 2a:* 48 8b 00              mov    (%rax),%rax <-- trapping instruction
>>> 2d: 48 8b 68 50          mov    0x50(%rax),%rbp
>>> 31: 48 85 ed              test   %rbp,%rbp
>>> 34: 7e 66                jle    0x9c
>>> 36: 48 8b 03              mov    (%rbx),%rax
>>> 39: a8 40                test   $0x40,%al
>>> 3b: 74 7c                je     0xb9
>>> 3d: 4c                    rex.WR
>>> 3e: 8b                    .byte 0x8b
>>> 3f: 6b                    .byte 0x6b
>>> 
>>> Code starting with the faulting instruction
>>> ===========================================
>>>  0: 48 8b 00              mov    (%rax),%rax
>>>  3: 48 8b 68 50          mov    0x50(%rax),%rbp
>>>  7: 48 85 ed              test   %rbp,%rbp
>>>  a: 7e 66                jle    0x72
>>>  c: 48 8b 03              mov    (%rbx),%rax
>>>  f: a8 40                test   $0x40,%al
>>> 11: 74 7c                je     0x8f
>>> 13: 4c                    rex.WR
>>> 14: 8b                    .byte 0x8b
>>> 15: 6b                    .byte 0x6b
>>> 
>>> 
>>> Snip out the essential instructions, which shows we just have read an
>>> address at an offset of 0x18, then that must have been NULL and we
>>> tried to dereference NULL which led to the BUG:
>>>  9: 48 89 fb              mov    %rdi,%rbx
>>> ...
>>> 26: 48 8b 43 18          mov    0x18(%rbx),%rax
>>> 2a:* 48 8b 00              mov    (%rax),%rax <-- trapping instruction
>>> 
>>> So the question is what is at offset 0x18 in the above code in
>>> nfs_folio_length()?
>>> Now folio_file_mapping() returns an address_space pointer ,and
>>> address_space->host is at offset 0.
>>> So if 'mapping' is at offset 0x18 of folio, then mapping is NULL.  And
>>> from the code, this seems to fit.
>>> 
>>> 310 struct folio {
>>> 311         /* private: don't document the anon union */
>>> 312         union {
>>> 313                 struct {
>>> 314         /* public: */
>>> 315                         unsigned long flags;
>>> 316                         union {
>>> 317                                 struct list_head lru;
>>> 318         /* private: avoid cluttering the output */
>>> 319                                 struct {
>>> 320                                         void *__filler;
>>> 321         /* public: */
>>> 322                                         unsigned int mlock_count;
>>> 323         /* private: */
>>> 324                                 };
>>> 325         /* public: */
>>> 326                         };
>>> 327                         struct address_space *mapping;
>>> 328                         pgoff_t index;
>>> 
>>> 
>>> 445 static inline struct address_space *folio_file_mapping(struct folio *folio)
>>> 446 {
>>> 447         if (unlikely(folio_test_swapcache(folio)))
>>> 448                 return swapcache_mapping(folio);
>>> 449
>>> 450         return folio->mapping;
>>> 451 }
>>> 
>>> So it seems folio->mapping == NULL and that is why we BUG.
>>> 
>>> 
>>>> I'm pinging you first about this because you changed the
>>>> code around the trace_nfs_aop_read_done() call site in
>>>> commit 000dbe0bec05 ("NFS: Convert buffered read paths to
>>>> use netfs when fscache is enabled").
>>>> 
>>> 
>>> If I have the above analysis right, then maybe this patch would fix it?
>>> Also I wonder if nfs_page_length() might have the same issue?
>>> 
>>> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
>>> index 9f0f4534744b..832f07994b88 100644
>>> --- a/fs/nfs/internal.h
>>> +++ b/fs/nfs/internal.h
>>> @@ -821,7 +821,12 @@ unsigned int nfs_page_length(struct page *page)
>>> */
>>> static inline size_t nfs_folio_length(struct folio *folio)
>>> {
>>> -       loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
>>> +       loff_t i_size;
>>> +
>>> +       if (!folio->mapping)
>>> +               return 0;
>>> +
>>> +       i_size = i_size_read(folio_file_mapping(folio)->host);
>>> 
>>>       if (i_size > 0) {
>>>               pgoff_t index = folio_index(folio) >> folio_order(folio);
>> 
>> Execution passed trace_nfs_aop_readpage() without incident,
>> so folio->mapping was a valid pointer as we entered
>> nfs_read_folio().
>> 
>> What I don't understand is whether you expect folio->mapping
>> to become NULL while nfs_read_folio() is running.
>> 
> 
> Was your test running both reads and truncates on the same file in parallel?

I didn't record it in my notes, but IIRC I was running fstests.
I'm pretty sure one or more of the fsx-based tests will do this.


--
Chuck Lever



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG in nfs_read_folio() when tracing is enabled
  2024-07-03 13:40       ` Chuck Lever III
@ 2024-07-03 14:18         ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2024-07-03 14:18 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: David Wysochanski, Dave Wysochanski, Linux NFS Mailing List

On Wed, Jul 03, 2024 at 01:40:32PM +0000, Chuck Lever III wrote:
> > Was your test running both reads and truncates on the same file in parallel?
> 
> I didn't record it in my notes, but IIRC I was running fstests.
> I'm pretty sure one or more of the fsx-based tests will do this.

Note that you don't have to truncate as in the truncate system call or
anything changing the file system.  Simply invalidating the page cache
can lead to that, and xfstests exercises that heavily in the tests that
mix buffered and direct I/O.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG in nfs_read_folio() when tracing is enabled
  2024-07-03  5:08       ` Christoph Hellwig
@ 2024-07-05 13:45         ` Chuck Lever III
  2024-07-05 13:47           ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever III @ 2024-07-05 13:45 UTC (permalink / raw)
  To: David Wysochanski, Dave Wysochanski
  Cc: Linux NFS Mailing List, Christoph Hellwig



> On Jul 3, 2024, at 1:08 AM, Christoph Hellwig <hch@infradead.org> wrote:
> 
> On Tue, Jul 02, 2024 at 05:33:58PM -0400, David Wysochanski wrote:
> 
> [annoying full quote snipped]
> 
>> Was your test running both reads and truncates on the same file in parallel?
>> 
>> Once the read completes I believe the page is unlocked.  Then if
>> truncates are running in parallel, then I think this can happen
>> (folio->mapping = NULL as a result of truncate).
> 
> Yes, folio->mapping is only stable when the folio is locked.

So Dave, I haven't tested the patch you posted a couple
days ago, because there hasn't been a clear answer about
whether nfs_read_folio() needs to protect itself against
the ->mapping changing, in which case, that's probably
a better fix.


--
Chuck Lever



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG in nfs_read_folio() when tracing is enabled
  2024-07-05 13:45         ` Chuck Lever III
@ 2024-07-05 13:47           ` Christoph Hellwig
  2024-07-09 15:45             ` David Wysochanski
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2024-07-05 13:47 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: David Wysochanski, Dave Wysochanski, Linux NFS Mailing List,
	Christoph Hellwig

On Fri, Jul 05, 2024 at 01:45:11PM +0000, Chuck Lever III wrote:
> So Dave, I haven't tested the patch you posted a couple
> days ago, because there hasn't been a clear answer about
> whether nfs_read_folio() needs to protect itself against
> the ->mapping changing, in which case, that's probably
> a better fix.

->read_folio is called with the folio locked and only unlocks it
on I/O completion, so it doesn't really need any protection.  So the
patch to simply move the trace point to before unlocking the folio
should fix the issue.

Alternatively we could just use the mapping from the inode variable
and pass it in.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG in nfs_read_folio() when tracing is enabled
  2024-07-05 13:47           ` Christoph Hellwig
@ 2024-07-09 15:45             ` David Wysochanski
  2024-07-10  6:33               ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: David Wysochanski @ 2024-07-09 15:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chuck Lever III, Dave Wysochanski, Linux NFS Mailing List

On Fri, Jul 5, 2024 at 9:48 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Fri, Jul 05, 2024 at 01:45:11PM +0000, Chuck Lever III wrote:
> > So Dave, I haven't tested the patch you posted a couple
> > days ago, because there hasn't been a clear answer about
> > whether nfs_read_folio() needs to protect itself against
> > the ->mapping changing, in which case, that's probably
> > a better fix.
>
> ->read_folio is called with the folio locked and only unlocks it
> on I/O completion, so it doesn't really need any protection.  So the
> patch to simply move the trace point to before unlocking the folio
> should fix the issue.
>
I didn't see this so maybe you sent it privately or I missed a message.

> Alternatively we could just use the mapping from the inode variable
> and pass it in.
>
I'm not sure I follow - are you suggesting fixing the tracepoint?

Regardless of possible tracepoint movement or other fixes,
nfs_folio_length() needs to be patched because it should
handle folio->mapping == NULL.  Ditto for nfs_page_length().


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG in nfs_read_folio() when tracing is enabled
  2024-07-09 15:45             ` David Wysochanski
@ 2024-07-10  6:33               ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2024-07-10  6:33 UTC (permalink / raw)
  To: David Wysochanski
  Cc: Christoph Hellwig, Chuck Lever III, Dave Wysochanski,
	Linux NFS Mailing List

On Tue, Jul 09, 2024 at 11:45:11AM -0400, David Wysochanski wrote:
> > > So Dave, I haven't tested the patch you posted a couple
> > > days ago, because there hasn't been a clear answer about
> > > whether nfs_read_folio() needs to protect itself against
> > > the ->mapping changing, in which case, that's probably
> > > a better fix.
> >
> > ->read_folio is called with the folio locked and only unlocks it
> > on I/O completion, so it doesn't really need any protection.  So the
> > patch to simply move the trace point to before unlocking the folio
> > should fix the issue.
> >
> I didn't see this so maybe you sent it privately or I missed a message.
> 
> > Alternatively we could just use the mapping from the inode variable
> > and pass it in.
> >
> I'm not sure I follow - are you suggesting fixing the tracepoint?

I'm suggesting to fix the tracepoint folio.  I though I had something
out, but I guess it never made it.

> Regardless of possible tracepoint movement or other fixes,
> nfs_folio_length() needs to be patched because it should
> handle folio->mapping == NULL.  Ditto for nfs_page_length().

nfs_page_length is completely unused and I've already sent a patch
to remove it.  nfs_folio_length is only sanely usable with the
folio either locked or under writeback.  So adding this random
check there is probably not a good idea, but an audit of the callers
would be very useful.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-07-10  6:33 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-19 20:45 BUG in nfs_read_folio() when tracing is enabled Chuck Lever III
2024-07-02 20:59 ` David Wysochanski
2024-07-02 21:14   ` David Wysochanski
2024-07-02 21:17   ` Chuck Lever III
2024-07-02 21:33     ` David Wysochanski
2024-07-03  5:08       ` Christoph Hellwig
2024-07-05 13:45         ` Chuck Lever III
2024-07-05 13:47           ` Christoph Hellwig
2024-07-09 15:45             ` David Wysochanski
2024-07-10  6:33               ` Christoph Hellwig
2024-07-03 13:40       ` Chuck Lever III
2024-07-03 14:18         ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox