* BUG in nfs_read_folio() when tracing is enabled
@ 2024-06-19 20:45 Chuck Lever III
2024-07-02 20:59 ` David Wysochanski
0 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever III @ 2024-06-19 20:45 UTC (permalink / raw)
To: Dave Wysochanski; +Cc: Linux NFS Mailing List
Hi Dave-
I'm testing pNFS SCSI layouts with tracing enabled on the
client, and I hit this BUG twice today:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 52230 Comm: fio Not tainted 6.10.0-rc4-g9f1e0e495093 #5
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
FS: 00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
Call Trace:
<TASK>
? __die_body.cold+0x19/0x27
? page_fault_oops+0xca/0x2a0
? search_module_extables+0x19/0x60
? search_bpf_extables+0x5f/0x80
? exc_page_fault+0x7e/0x180
? asm_exc_page_fault+0x26/0x30
? nfs_folio_length+0x29/0x170 [nfs]
trace_event_raw_event_nfs_folio_event_done+0xdc/0x170 [nfs]
nfs_read_folio+0x16d/0x260 [nfs]
? __pfx_nfs_read_folio+0x10/0x10 [nfs]
filemap_read_folio+0x43/0xe0
filemap_fault+0x70d/0xd00
__do_fault+0x35/0x120
do_fault+0xbb/0x470
__handle_mm_fault+0x7e9/0x1060
? mt_find+0x21c/0x570
handle_mm_fault+0xf0/0x300
do_user_addr_fault+0x217/0x620
exc_page_fault+0x7e/0x180
asm_exc_page_fault+0x26/0x30
RIP: 0033:0x7f1d25ecedb7
Code: 7e 6f 44 16 e0 48 29 fe 48 83 e1 e0 48 01 ce 0f 1f 40 00 c5 fe 6f 4e 60 c5 fe 6f 56 40 c5 fe 6f 5e 20 c5 fe 6f 26 48 83 c6 80 <c5> fd 7f 49 60 c5 fd 7f 51 40 c5 fd 7f 59 20 c5 fd 7f 21 48 83 c1
RSP: 002b:00007ffd92df7f58 EFLAGS: 00010203
RAX: 00007f1d1d0e6000 RBX: 000056211867f6c0 RCX: 00007f1d1d0e6f60
RDX: 0000000000001000 RSI: 0000562118699ee0 RDI: 00007f1d1d0e6000
RBP: 00007ffd92df7f70 R08: 0000000000000000 R09: 0000000000000000
R10: 0000562118688f30 R11: 0000000000000001 R12: 00007f1d1d16aac0
R13: 00007ffd92df8030 R14: 0000000000000001 R15: 0000000000001000
</TASK>
Modules linked in: blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_c>
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
FS: 00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
cel@renoir:linux$ scripts/faddr2line fs/nfs/nfs.ko nfs_folio_length+0x29
nfs_folio_length+0x29/0x170:
nfs_folio_length at /home/cel/src/kdevops/cel/linux/fs/nfs/internal.h:824 (discriminator 1)
Which in my version of the source code is:
822 static inline size_t nfs_folio_length(struct folio *folio)
823 {
824 loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
I'm pinging you first about this because you changed the
code around the trace_nfs_aop_read_done() call site in
commit 000dbe0bec05 ("NFS: Convert buffered read paths to
use netfs when fscache is enabled").
--
Chuck Lever
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG in nfs_read_folio() when tracing is enabled
2024-06-19 20:45 BUG in nfs_read_folio() when tracing is enabled Chuck Lever III
@ 2024-07-02 20:59 ` David Wysochanski
2024-07-02 21:14 ` David Wysochanski
2024-07-02 21:17 ` Chuck Lever III
0 siblings, 2 replies; 12+ messages in thread
From: David Wysochanski @ 2024-07-02 20:59 UTC (permalink / raw)
To: Chuck Lever III; +Cc: Dave Wysochanski, Linux NFS Mailing List
Sorry for the late reply, more below.
On Wed, Jun 19, 2024 at 4:46 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
> Hi Dave-
>
> I'm testing pNFS SCSI layouts with tracing enabled on the
> client, and I hit this BUG twice today:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> CPU: 0 PID: 52230 Comm: fio Not tainted 6.10.0-rc4-g9f1e0e495093 #5
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
> FS: 00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
> Call Trace:
> <TASK>
> ? __die_body.cold+0x19/0x27
> ? page_fault_oops+0xca/0x2a0
> ? search_module_extables+0x19/0x60
> ? search_bpf_extables+0x5f/0x80
> ? exc_page_fault+0x7e/0x180
> ? asm_exc_page_fault+0x26/0x30
> ? nfs_folio_length+0x29/0x170 [nfs]
> trace_event_raw_event_nfs_folio_event_done+0xdc/0x170 [nfs]
> nfs_read_folio+0x16d/0x260 [nfs]
> ? __pfx_nfs_read_folio+0x10/0x10 [nfs]
> filemap_read_folio+0x43/0xe0
> filemap_fault+0x70d/0xd00
> __do_fault+0x35/0x120
> do_fault+0xbb/0x470
> __handle_mm_fault+0x7e9/0x1060
> ? mt_find+0x21c/0x570
> handle_mm_fault+0xf0/0x300
> do_user_addr_fault+0x217/0x620
> exc_page_fault+0x7e/0x180
> asm_exc_page_fault+0x26/0x30
> RIP: 0033:0x7f1d25ecedb7
> Code: 7e 6f 44 16 e0 48 29 fe 48 83 e1 e0 48 01 ce 0f 1f 40 00 c5 fe 6f 4e 60 c5 fe 6f 56 40 c5 fe 6f 5e 20 c5 fe 6f 26 48 83 c6 80 <c5> fd 7f 49 60 c5 fd 7f 51 40 c5 fd 7f 59 20 c5 fd 7f 21 48 83 c1
> RSP: 002b:00007ffd92df7f58 EFLAGS: 00010203
> RAX: 00007f1d1d0e6000 RBX: 000056211867f6c0 RCX: 00007f1d1d0e6f60
> RDX: 0000000000001000 RSI: 0000562118699ee0 RDI: 00007f1d1d0e6000
> RBP: 00007ffd92df7f70 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000562118688f30 R11: 0000000000000001 R12: 00007f1d1d16aac0
> R13: 00007ffd92df8030 R14: 0000000000000001 R15: 0000000000001000
> </TASK>
> Modules linked in: blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_c>
> CR2: 0000000000000000
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
> FS: 00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
>
> cel@renoir:linux$ scripts/faddr2line fs/nfs/nfs.ko nfs_folio_length+0x29
> nfs_folio_length+0x29/0x170:
> nfs_folio_length at /home/cel/src/kdevops/cel/linux/fs/nfs/internal.h:824 (discriminator 1)
>
> Which in my version of the source code is:
>
> 822 static inline size_t nfs_folio_length(struct folio *folio)
> 823 {
> 824 loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
>
Ok so looks like we need to go deeper to know what the NULL pointer
represented in the BUG statement.
Decoding the oops:
$ ~/git/kernel/scripts/decodecode < ./oops.txt
Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00
00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b
00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
All code
========
0: 90 nop
1: 41 56 push %r14
3: 41 55 push %r13
5: 41 54 push %r12
7: 55 push %rbp
8: 53 push %rbx
9: 48 89 fb mov %rdi,%rbx
c: 48 83 ec 08 sub $0x8,%rsp
10: 48 8b 07 mov (%rdi),%rax
13: a9 00 00 04 00 test $0x40000,%eax
18: 74 0c je 0x26
1a: 48 8b 07 mov (%rdi),%rax
1d: f6 c4 08 test $0x8,%ah
20: 0f 85 89 00 00 00 jne 0xaf
26: 48 8b 43 18 mov 0x18(%rbx),%rax
2a:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
2d: 48 8b 68 50 mov 0x50(%rax),%rbp
31: 48 85 ed test %rbp,%rbp
34: 7e 66 jle 0x9c
36: 48 8b 03 mov (%rbx),%rax
39: a8 40 test $0x40,%al
3b: 74 7c je 0xb9
3d: 4c rex.WR
3e: 8b .byte 0x8b
3f: 6b .byte 0x6b
Code starting with the faulting instruction
===========================================
0: 48 8b 00 mov (%rax),%rax
3: 48 8b 68 50 mov 0x50(%rax),%rbp
7: 48 85 ed test %rbp,%rbp
a: 7e 66 jle 0x72
c: 48 8b 03 mov (%rbx),%rax
f: a8 40 test $0x40,%al
11: 74 7c je 0x8f
13: 4c rex.WR
14: 8b .byte 0x8b
15: 6b .byte 0x6b
Snip out the essential instructions, which shows we just have read an
address at an offset of 0x18, then that must have been NULL and we
tried to dereference NULL which led to the BUG:
9: 48 89 fb mov %rdi,%rbx
...
26: 48 8b 43 18 mov 0x18(%rbx),%rax
2a:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
So the question is what is at offset 0x18 in the above code in
nfs_folio_length()?
Now folio_file_mapping() returns an address_space pointer ,and
address_space->host is at offset 0.
So if 'mapping' is at offset 0x18 of folio, then mapping is NULL. And
from the code, this seems to fit.
310 struct folio {
311 /* private: don't document the anon union */
312 union {
313 struct {
314 /* public: */
315 unsigned long flags;
316 union {
317 struct list_head lru;
318 /* private: avoid cluttering the output */
319 struct {
320 void *__filler;
321 /* public: */
322 unsigned int mlock_count;
323 /* private: */
324 };
325 /* public: */
326 };
327 struct address_space *mapping;
328 pgoff_t index;
445 static inline struct address_space *folio_file_mapping(struct folio *folio)
446 {
447 if (unlikely(folio_test_swapcache(folio)))
448 return swapcache_mapping(folio);
449
450 return folio->mapping;
451 }
So it seems folio->mapping == NULL and that is why we BUG.
> I'm pinging you first about this because you changed the
> code around the trace_nfs_aop_read_done() call site in
> commit 000dbe0bec05 ("NFS: Convert buffered read paths to
> use netfs when fscache is enabled").
>
If I have the above analysis right, then maybe this patch would fix it?
Also I wonder if nfs_page_length() might have the same issue?
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 9f0f4534744b..832f07994b88 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -821,7 +821,12 @@ unsigned int nfs_page_length(struct page *page)
*/
static inline size_t nfs_folio_length(struct folio *folio)
{
- loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
+ loff_t i_size;
+
+ if (!folio->mapping)
+ return 0;
+
+ i_size = i_size_read(folio_file_mapping(folio)->host);
if (i_size > 0) {
pgoff_t index = folio_index(folio) >> folio_order(folio);
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: BUG in nfs_read_folio() when tracing is enabled
2024-07-02 20:59 ` David Wysochanski
@ 2024-07-02 21:14 ` David Wysochanski
2024-07-02 21:17 ` Chuck Lever III
1 sibling, 0 replies; 12+ messages in thread
From: David Wysochanski @ 2024-07-02 21:14 UTC (permalink / raw)
To: Chuck Lever III; +Cc: Dave Wysochanski, Linux NFS Mailing List
On Tue, Jul 2, 2024 at 4:59 PM David Wysochanski <dwysocha@redhat.com> wrote:
>
> Sorry for the late reply, more below.
>
> On Wed, Jun 19, 2024 at 4:46 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >
> > Hi Dave-
> >
> > I'm testing pNFS SCSI layouts with tracing enabled on the
> > client, and I hit this BUG twice today:
> >
> > BUG: kernel NULL pointer dereference, address: 0000000000000000
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> > CPU: 0 PID: 52230 Comm: fio Not tainted 6.10.0-rc4-g9f1e0e495093 #5
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
> > RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
> > Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> > RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
> > RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
> > RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
> > RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
> > R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
> > R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
> > FS: 00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
> > Call Trace:
> > <TASK>
> > ? __die_body.cold+0x19/0x27
> > ? page_fault_oops+0xca/0x2a0
> > ? search_module_extables+0x19/0x60
> > ? search_bpf_extables+0x5f/0x80
> > ? exc_page_fault+0x7e/0x180
> > ? asm_exc_page_fault+0x26/0x30
> > ? nfs_folio_length+0x29/0x170 [nfs]
> > trace_event_raw_event_nfs_folio_event_done+0xdc/0x170 [nfs]
> > nfs_read_folio+0x16d/0x260 [nfs]
> > ? __pfx_nfs_read_folio+0x10/0x10 [nfs]
> > filemap_read_folio+0x43/0xe0
> > filemap_fault+0x70d/0xd00
> > __do_fault+0x35/0x120
> > do_fault+0xbb/0x470
> > __handle_mm_fault+0x7e9/0x1060
> > ? mt_find+0x21c/0x570
> > handle_mm_fault+0xf0/0x300
> > do_user_addr_fault+0x217/0x620
> > exc_page_fault+0x7e/0x180
> > asm_exc_page_fault+0x26/0x30
> > RIP: 0033:0x7f1d25ecedb7
> > Code: 7e 6f 44 16 e0 48 29 fe 48 83 e1 e0 48 01 ce 0f 1f 40 00 c5 fe 6f 4e 60 c5 fe 6f 56 40 c5 fe 6f 5e 20 c5 fe 6f 26 48 83 c6 80 <c5> fd 7f 49 60 c5 fd 7f 51 40 c5 fd 7f 59 20 c5 fd 7f 21 48 83 c1
> > RSP: 002b:00007ffd92df7f58 EFLAGS: 00010203
> > RAX: 00007f1d1d0e6000 RBX: 000056211867f6c0 RCX: 00007f1d1d0e6f60
> > RDX: 0000000000001000 RSI: 0000562118699ee0 RDI: 00007f1d1d0e6000
> > RBP: 00007ffd92df7f70 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000562118688f30 R11: 0000000000000001 R12: 00007f1d1d16aac0
> > R13: 00007ffd92df8030 R14: 0000000000000001 R15: 0000000000001000
> > </TASK>
> > Modules linked in: blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_c>
> > CR2: 0000000000000000
> > ---[ end trace 0000000000000000 ]---
> > RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
> > Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> > RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
> > RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
> > RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
> > RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
> > R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
> > R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
> > FS: 00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
> >
> > cel@renoir:linux$ scripts/faddr2line fs/nfs/nfs.ko nfs_folio_length+0x29
> > nfs_folio_length+0x29/0x170:
> > nfs_folio_length at /home/cel/src/kdevops/cel/linux/fs/nfs/internal.h:824 (discriminator 1)
> >
> > Which in my version of the source code is:
> >
> > 822 static inline size_t nfs_folio_length(struct folio *folio)
> > 823 {
> > 824 loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
> >
>
> Ok so looks like we need to go deeper to know what the NULL pointer
> represented in the BUG statement.
> Decoding the oops:
> $ ~/git/kernel/scripts/decodecode < ./oops.txt
> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00
> 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b
> 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> All code
> ========
> 0: 90 nop
> 1: 41 56 push %r14
> 3: 41 55 push %r13
> 5: 41 54 push %r12
> 7: 55 push %rbp
> 8: 53 push %rbx
> 9: 48 89 fb mov %rdi,%rbx
> c: 48 83 ec 08 sub $0x8,%rsp
> 10: 48 8b 07 mov (%rdi),%rax
> 13: a9 00 00 04 00 test $0x40000,%eax
> 18: 74 0c je 0x26
> 1a: 48 8b 07 mov (%rdi),%rax
> 1d: f6 c4 08 test $0x8,%ah
> 20: 0f 85 89 00 00 00 jne 0xaf
> 26: 48 8b 43 18 mov 0x18(%rbx),%rax
> 2a:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
> 2d: 48 8b 68 50 mov 0x50(%rax),%rbp
> 31: 48 85 ed test %rbp,%rbp
> 34: 7e 66 jle 0x9c
> 36: 48 8b 03 mov (%rbx),%rax
> 39: a8 40 test $0x40,%al
> 3b: 74 7c je 0xb9
> 3d: 4c rex.WR
> 3e: 8b .byte 0x8b
> 3f: 6b .byte 0x6b
>
> Code starting with the faulting instruction
> ===========================================
> 0: 48 8b 00 mov (%rax),%rax
> 3: 48 8b 68 50 mov 0x50(%rax),%rbp
> 7: 48 85 ed test %rbp,%rbp
> a: 7e 66 jle 0x72
> c: 48 8b 03 mov (%rbx),%rax
> f: a8 40 test $0x40,%al
> 11: 74 7c je 0x8f
> 13: 4c rex.WR
> 14: 8b .byte 0x8b
> 15: 6b .byte 0x6b
>
>
> Snip out the essential instructions, which shows we just have read an
> address at an offset of 0x18, then that must have been NULL and we
> tried to dereference NULL which led to the BUG:
> 9: 48 89 fb mov %rdi,%rbx
> ...
> 26: 48 8b 43 18 mov 0x18(%rbx),%rax
> 2a:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
>
> So the question is what is at offset 0x18 in the above code in
> nfs_folio_length()?
> Now folio_file_mapping() returns an address_space pointer ,and
> address_space->host is at offset 0.
> So if 'mapping' is at offset 0x18 of folio, then mapping is NULL. And
> from the code, this seems to fit.
>
> 310 struct folio {
> 311 /* private: don't document the anon union */
> 312 union {
> 313 struct {
> 314 /* public: */
> 315 unsigned long flags;
> 316 union {
> 317 struct list_head lru;
> 318 /* private: avoid cluttering the output */
> 319 struct {
> 320 void *__filler;
> 321 /* public: */
> 322 unsigned int mlock_count;
> 323 /* private: */
> 324 };
> 325 /* public: */
> 326 };
> 327 struct address_space *mapping;
> 328 pgoff_t index;
>
>
> 445 static inline struct address_space *folio_file_mapping(struct folio *folio)
> 446 {
> 447 if (unlikely(folio_test_swapcache(folio)))
> 448 return swapcache_mapping(folio);
> 449
> 450 return folio->mapping;
> 451 }
>
> So it seems folio->mapping == NULL and that is why we BUG.
>
>
> > I'm pinging you first about this because you changed the
> > code around the trace_nfs_aop_read_done() call site in
> > commit 000dbe0bec05 ("NFS: Convert buffered read paths to
> > use netfs when fscache is enabled").
> >
>
> If I have the above analysis right, then maybe this patch would fix it?
> Also I wonder if nfs_page_length() might have the same issue?
>
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 9f0f4534744b..832f07994b88 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -821,7 +821,12 @@ unsigned int nfs_page_length(struct page *page)
> */
> static inline size_t nfs_folio_length(struct folio *folio)
> {
> - loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
> + loff_t i_size;
> +
> + if (!folio->mapping)
> + return 0;
> +
> + i_size = i_size_read(folio_file_mapping(folio)->host);
>
> if (i_size > 0) {
> pgoff_t index = folio_index(folio) >> folio_order(folio);
Correction - I should have used folio_file_mapping(folio) above
instead of folio->mapping
(compile only tested)
$ git diff
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 9f0f4534744b..433661d54f33 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -821,7 +821,12 @@ unsigned int nfs_page_length(struct page *page)
*/
static inline size_t nfs_folio_length(struct folio *folio)
{
- loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
+ loff_t i_size;
+
+ if (!folio_file_mapping(folio))
+ return 0;
+
+ i_size = i_size_read(folio_file_mapping(folio)->host);
if (i_size > 0) {
pgoff_t index = folio_index(folio) >> folio_order(folio);
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: BUG in nfs_read_folio() when tracing is enabled
2024-07-02 20:59 ` David Wysochanski
2024-07-02 21:14 ` David Wysochanski
@ 2024-07-02 21:17 ` Chuck Lever III
2024-07-02 21:33 ` David Wysochanski
1 sibling, 1 reply; 12+ messages in thread
From: Chuck Lever III @ 2024-07-02 21:17 UTC (permalink / raw)
To: David Wysochanski; +Cc: Dave Wysochanski, Linux NFS Mailing List
> On Jul 2, 2024, at 4:59 PM, David Wysochanski <dwysocha@redhat.com> wrote:
>
> Sorry for the late reply, more below.
>
> On Wed, Jun 19, 2024 at 4:46 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>
>> Hi Dave-
>>
>> I'm testing pNFS SCSI layouts with tracing enabled on the
>> client, and I hit this BUG twice today:
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000000
>> #PF: supervisor read access in kernel mode
>> #PF: error_code(0x0000) - not-present page
>> PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI
>> CPU: 0 PID: 52230 Comm: fio Not tainted 6.10.0-rc4-g9f1e0e495093 #5
>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
>> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
>> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
>> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
>> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
>> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
>> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
>> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
>> FS: 00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
>> Call Trace:
>> <TASK>
>> ? __die_body.cold+0x19/0x27
>> ? page_fault_oops+0xca/0x2a0
>> ? search_module_extables+0x19/0x60
>> ? search_bpf_extables+0x5f/0x80
>> ? exc_page_fault+0x7e/0x180
>> ? asm_exc_page_fault+0x26/0x30
>> ? nfs_folio_length+0x29/0x170 [nfs]
>> trace_event_raw_event_nfs_folio_event_done+0xdc/0x170 [nfs]
>> nfs_read_folio+0x16d/0x260 [nfs]
>> ? __pfx_nfs_read_folio+0x10/0x10 [nfs]
>> filemap_read_folio+0x43/0xe0
>> filemap_fault+0x70d/0xd00
>> __do_fault+0x35/0x120
>> do_fault+0xbb/0x470
>> __handle_mm_fault+0x7e9/0x1060
>> ? mt_find+0x21c/0x570
>> handle_mm_fault+0xf0/0x300
>> do_user_addr_fault+0x217/0x620
>> exc_page_fault+0x7e/0x180
>> asm_exc_page_fault+0x26/0x30
>> RIP: 0033:0x7f1d25ecedb7
>> Code: 7e 6f 44 16 e0 48 29 fe 48 83 e1 e0 48 01 ce 0f 1f 40 00 c5 fe 6f 4e 60 c5 fe 6f 56 40 c5 fe 6f 5e 20 c5 fe 6f 26 48 83 c6 80 <c5> fd 7f 49 60 c5 fd 7f 51 40 c5 fd 7f 59 20 c5 fd 7f 21 48 83 c1
>> RSP: 002b:00007ffd92df7f58 EFLAGS: 00010203
>> RAX: 00007f1d1d0e6000 RBX: 000056211867f6c0 RCX: 00007f1d1d0e6f60
>> RDX: 0000000000001000 RSI: 0000562118699ee0 RDI: 00007f1d1d0e6000
>> RBP: 00007ffd92df7f70 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000562118688f30 R11: 0000000000000001 R12: 00007f1d1d16aac0
>> R13: 00007ffd92df8030 R14: 0000000000000001 R15: 0000000000001000
>> </TASK>
>> Modules linked in: blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_c>
>> CR2: 0000000000000000
>> ---[ end trace 0000000000000000 ]---
>> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
>> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
>> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
>> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
>> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
>> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
>> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
>> FS: 00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
>>
>> cel@renoir:linux$ scripts/faddr2line fs/nfs/nfs.ko nfs_folio_length+0x29
>> nfs_folio_length+0x29/0x170:
>> nfs_folio_length at /home/cel/src/kdevops/cel/linux/fs/nfs/internal.h:824 (discriminator 1)
>>
>> Which in my version of the source code is:
>>
>> 822 static inline size_t nfs_folio_length(struct folio *folio)
>> 823 {
>> 824 loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
>>
>
> Ok so looks like we need to go deeper to know what the NULL pointer
> represented in the BUG statement.
> Decoding the oops:
> $ ~/git/kernel/scripts/decodecode < ./oops.txt
> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00
> 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b
> 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> All code
> ========
> 0: 90 nop
> 1: 41 56 push %r14
> 3: 41 55 push %r13
> 5: 41 54 push %r12
> 7: 55 push %rbp
> 8: 53 push %rbx
> 9: 48 89 fb mov %rdi,%rbx
> c: 48 83 ec 08 sub $0x8,%rsp
> 10: 48 8b 07 mov (%rdi),%rax
> 13: a9 00 00 04 00 test $0x40000,%eax
> 18: 74 0c je 0x26
> 1a: 48 8b 07 mov (%rdi),%rax
> 1d: f6 c4 08 test $0x8,%ah
> 20: 0f 85 89 00 00 00 jne 0xaf
> 26: 48 8b 43 18 mov 0x18(%rbx),%rax
> 2a:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
> 2d: 48 8b 68 50 mov 0x50(%rax),%rbp
> 31: 48 85 ed test %rbp,%rbp
> 34: 7e 66 jle 0x9c
> 36: 48 8b 03 mov (%rbx),%rax
> 39: a8 40 test $0x40,%al
> 3b: 74 7c je 0xb9
> 3d: 4c rex.WR
> 3e: 8b .byte 0x8b
> 3f: 6b .byte 0x6b
>
> Code starting with the faulting instruction
> ===========================================
> 0: 48 8b 00 mov (%rax),%rax
> 3: 48 8b 68 50 mov 0x50(%rax),%rbp
> 7: 48 85 ed test %rbp,%rbp
> a: 7e 66 jle 0x72
> c: 48 8b 03 mov (%rbx),%rax
> f: a8 40 test $0x40,%al
> 11: 74 7c je 0x8f
> 13: 4c rex.WR
> 14: 8b .byte 0x8b
> 15: 6b .byte 0x6b
>
>
> Snip out the essential instructions, which shows we just have read an
> address at an offset of 0x18, then that must have been NULL and we
> tried to dereference NULL which led to the BUG:
> 9: 48 89 fb mov %rdi,%rbx
> ...
> 26: 48 8b 43 18 mov 0x18(%rbx),%rax
> 2a:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
>
> So the question is what is at offset 0x18 in the above code in
> nfs_folio_length()?
> Now folio_file_mapping() returns an address_space pointer ,and
> address_space->host is at offset 0.
> So if 'mapping' is at offset 0x18 of folio, then mapping is NULL. And
> from the code, this seems to fit.
>
> 310 struct folio {
> 311 /* private: don't document the anon union */
> 312 union {
> 313 struct {
> 314 /* public: */
> 315 unsigned long flags;
> 316 union {
> 317 struct list_head lru;
> 318 /* private: avoid cluttering the output */
> 319 struct {
> 320 void *__filler;
> 321 /* public: */
> 322 unsigned int mlock_count;
> 323 /* private: */
> 324 };
> 325 /* public: */
> 326 };
> 327 struct address_space *mapping;
> 328 pgoff_t index;
>
>
> 445 static inline struct address_space *folio_file_mapping(struct folio *folio)
> 446 {
> 447 if (unlikely(folio_test_swapcache(folio)))
> 448 return swapcache_mapping(folio);
> 449
> 450 return folio->mapping;
> 451 }
>
> So it seems folio->mapping == NULL and that is why we BUG.
>
>
>> I'm pinging you first about this because you changed the
>> code around the trace_nfs_aop_read_done() call site in
>> commit 000dbe0bec05 ("NFS: Convert buffered read paths to
>> use netfs when fscache is enabled").
>>
>
> If I have the above analysis right, then maybe this patch would fix it?
> Also I wonder if nfs_page_length() might have the same issue?
>
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 9f0f4534744b..832f07994b88 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -821,7 +821,12 @@ unsigned int nfs_page_length(struct page *page)
> */
> static inline size_t nfs_folio_length(struct folio *folio)
> {
> - loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
> + loff_t i_size;
> +
> + if (!folio->mapping)
> + return 0;
> +
> + i_size = i_size_read(folio_file_mapping(folio)->host);
>
> if (i_size > 0) {
> pgoff_t index = folio_index(folio) >> folio_order(folio);
Execution passed trace_nfs_aop_readpage() without incident,
so folio->mapping was a valid pointer as we entered
nfs_read_folio().
What I don't understand is whether you expect folio->mapping
to become NULL while nfs_read_folio() is running.
--
Chuck Lever
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG in nfs_read_folio() when tracing is enabled
2024-07-02 21:17 ` Chuck Lever III
@ 2024-07-02 21:33 ` David Wysochanski
2024-07-03 5:08 ` Christoph Hellwig
2024-07-03 13:40 ` Chuck Lever III
0 siblings, 2 replies; 12+ messages in thread
From: David Wysochanski @ 2024-07-02 21:33 UTC (permalink / raw)
To: Chuck Lever III; +Cc: Dave Wysochanski, Linux NFS Mailing List
On Tue, Jul 2, 2024 at 5:17 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Jul 2, 2024, at 4:59 PM, David Wysochanski <dwysocha@redhat.com> wrote:
> >
> > Sorry for the late reply, more below.
> >
> > On Wed, Jun 19, 2024 at 4:46 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>
> >> Hi Dave-
> >>
> >> I'm testing pNFS SCSI layouts with tracing enabled on the
> >> client, and I hit this BUG twice today:
> >>
> >> BUG: kernel NULL pointer dereference, address: 0000000000000000
> >> #PF: supervisor read access in kernel mode
> >> #PF: error_code(0x0000) - not-present page
> >> PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> >> CPU: 0 PID: 52230 Comm: fio Not tainted 6.10.0-rc4-g9f1e0e495093 #5
> >> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
> >> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
> >> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> >> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
> >> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
> >> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
> >> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
> >> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
> >> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
> >> FS: 00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
> >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
> >> Call Trace:
> >> <TASK>
> >> ? __die_body.cold+0x19/0x27
> >> ? page_fault_oops+0xca/0x2a0
> >> ? search_module_extables+0x19/0x60
> >> ? search_bpf_extables+0x5f/0x80
> >> ? exc_page_fault+0x7e/0x180
> >> ? asm_exc_page_fault+0x26/0x30
> >> ? nfs_folio_length+0x29/0x170 [nfs]
> >> trace_event_raw_event_nfs_folio_event_done+0xdc/0x170 [nfs]
> >> nfs_read_folio+0x16d/0x260 [nfs]
> >> ? __pfx_nfs_read_folio+0x10/0x10 [nfs]
> >> filemap_read_folio+0x43/0xe0
> >> filemap_fault+0x70d/0xd00
> >> __do_fault+0x35/0x120
> >> do_fault+0xbb/0x470
> >> __handle_mm_fault+0x7e9/0x1060
> >> ? mt_find+0x21c/0x570
> >> handle_mm_fault+0xf0/0x300
> >> do_user_addr_fault+0x217/0x620
> >> exc_page_fault+0x7e/0x180
> >> asm_exc_page_fault+0x26/0x30
> >> RIP: 0033:0x7f1d25ecedb7
> >> Code: 7e 6f 44 16 e0 48 29 fe 48 83 e1 e0 48 01 ce 0f 1f 40 00 c5 fe 6f 4e 60 c5 fe 6f 56 40 c5 fe 6f 5e 20 c5 fe 6f 26 48 83 c6 80 <c5> fd 7f 49 60 c5 fd 7f 51 40 c5 fd 7f 59 20 c5 fd 7f 21 48 83 c1
> >> RSP: 002b:00007ffd92df7f58 EFLAGS: 00010203
> >> RAX: 00007f1d1d0e6000 RBX: 000056211867f6c0 RCX: 00007f1d1d0e6f60
> >> RDX: 0000000000001000 RSI: 0000562118699ee0 RDI: 00007f1d1d0e6000
> >> RBP: 00007ffd92df7f70 R08: 0000000000000000 R09: 0000000000000000
> >> R10: 0000562118688f30 R11: 0000000000000001 R12: 00007f1d1d16aac0
> >> R13: 00007ffd92df8030 R14: 0000000000000001 R15: 0000000000001000
> >> </TASK>
> >> Modules linked in: blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_c>
> >> CR2: 0000000000000000
> >> ---[ end trace 0000000000000000 ]---
> >> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
> >> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> >> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
> >> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
> >> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
> >> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
> >> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
> >> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
> >> FS: 00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
> >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
> >>
> >> cel@renoir:linux$ scripts/faddr2line fs/nfs/nfs.ko nfs_folio_length+0x29
> >> nfs_folio_length+0x29/0x170:
> >> nfs_folio_length at /home/cel/src/kdevops/cel/linux/fs/nfs/internal.h:824 (discriminator 1)
> >>
> >> Which in my version of the source code is:
> >>
> >> 822 static inline size_t nfs_folio_length(struct folio *folio)
> >> 823 {
> >> 824 loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
> >>
> >
> > Ok so looks like we need to go deeper to know what the NULL pointer
> > represented in the BUG statement.
> > Decoding the oops:
> > $ ~/git/kernel/scripts/decodecode < ./oops.txt
> > Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00
> > 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b
> > 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> > All code
> > ========
> > 0: 90 nop
> > 1: 41 56 push %r14
> > 3: 41 55 push %r13
> > 5: 41 54 push %r12
> > 7: 55 push %rbp
> > 8: 53 push %rbx
> > 9: 48 89 fb mov %rdi,%rbx
> > c: 48 83 ec 08 sub $0x8,%rsp
> > 10: 48 8b 07 mov (%rdi),%rax
> > 13: a9 00 00 04 00 test $0x40000,%eax
> > 18: 74 0c je 0x26
> > 1a: 48 8b 07 mov (%rdi),%rax
> > 1d: f6 c4 08 test $0x8,%ah
> > 20: 0f 85 89 00 00 00 jne 0xaf
> > 26: 48 8b 43 18 mov 0x18(%rbx),%rax
> > 2a:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
> > 2d: 48 8b 68 50 mov 0x50(%rax),%rbp
> > 31: 48 85 ed test %rbp,%rbp
> > 34: 7e 66 jle 0x9c
> > 36: 48 8b 03 mov (%rbx),%rax
> > 39: a8 40 test $0x40,%al
> > 3b: 74 7c je 0xb9
> > 3d: 4c rex.WR
> > 3e: 8b .byte 0x8b
> > 3f: 6b .byte 0x6b
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 48 8b 00 mov (%rax),%rax
> > 3: 48 8b 68 50 mov 0x50(%rax),%rbp
> > 7: 48 85 ed test %rbp,%rbp
> > a: 7e 66 jle 0x72
> > c: 48 8b 03 mov (%rbx),%rax
> > f: a8 40 test $0x40,%al
> > 11: 74 7c je 0x8f
> > 13: 4c rex.WR
> > 14: 8b .byte 0x8b
> > 15: 6b .byte 0x6b
> >
> >
> > Snip out the essential instructions, which shows we just have read an
> > address at an offset of 0x18, then that must have been NULL and we
> > tried to dereference NULL which led to the BUG:
> > 9: 48 89 fb mov %rdi,%rbx
> > ...
> > 26: 48 8b 43 18 mov 0x18(%rbx),%rax
> > 2a:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
> >
> > So the question is what is at offset 0x18 in the above code in
> > nfs_folio_length()?
> > Now folio_file_mapping() returns an address_space pointer ,and
> > address_space->host is at offset 0.
> > So if 'mapping' is at offset 0x18 of folio, then mapping is NULL. And
> > from the code, this seems to fit.
> >
> > 310 struct folio {
> > 311 /* private: don't document the anon union */
> > 312 union {
> > 313 struct {
> > 314 /* public: */
> > 315 unsigned long flags;
> > 316 union {
> > 317 struct list_head lru;
> > 318 /* private: avoid cluttering the output */
> > 319 struct {
> > 320 void *__filler;
> > 321 /* public: */
> > 322 unsigned int mlock_count;
> > 323 /* private: */
> > 324 };
> > 325 /* public: */
> > 326 };
> > 327 struct address_space *mapping;
> > 328 pgoff_t index;
> >
> >
> > 445 static inline struct address_space *folio_file_mapping(struct folio *folio)
> > 446 {
> > 447 if (unlikely(folio_test_swapcache(folio)))
> > 448 return swapcache_mapping(folio);
> > 449
> > 450 return folio->mapping;
> > 451 }
> >
> > So it seems folio->mapping == NULL and that is why we BUG.
> >
> >
> >> I'm pinging you first about this because you changed the
> >> code around the trace_nfs_aop_read_done() call site in
> >> commit 000dbe0bec05 ("NFS: Convert buffered read paths to
> >> use netfs when fscache is enabled").
> >>
> >
> > If I have the above analysis right, then maybe this patch would fix it?
> > Also I wonder if nfs_page_length() might have the same issue?
> >
> > diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> > index 9f0f4534744b..832f07994b88 100644
> > --- a/fs/nfs/internal.h
> > +++ b/fs/nfs/internal.h
> > @@ -821,7 +821,12 @@ unsigned int nfs_page_length(struct page *page)
> > */
> > static inline size_t nfs_folio_length(struct folio *folio)
> > {
> > - loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
> > + loff_t i_size;
> > +
> > + if (!folio->mapping)
> > + return 0;
> > +
> > + i_size = i_size_read(folio_file_mapping(folio)->host);
> >
> > if (i_size > 0) {
> > pgoff_t index = folio_index(folio) >> folio_order(folio);
>
> Execution passed trace_nfs_aop_readpage() without incident,
> so folio->mapping was a valid pointer as we entered
> nfs_read_folio().
>
> What I don't understand is whether you expect folio->mapping
> to become NULL while nfs_read_folio() is running.
>
Was your test running both reads and truncates on the same file in parallel?
Once the read completes I believe the page is unlocked. Then if
truncates are running in parallel, then I think this can happen
(folio->mapping = NULL as a result of truncate).
>
> --
> Chuck Lever
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG in nfs_read_folio() when tracing is enabled
2024-07-02 21:33 ` David Wysochanski
@ 2024-07-03 5:08 ` Christoph Hellwig
2024-07-05 13:45 ` Chuck Lever III
2024-07-03 13:40 ` Chuck Lever III
1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2024-07-03 5:08 UTC (permalink / raw)
To: David Wysochanski
Cc: Chuck Lever III, Dave Wysochanski, Linux NFS Mailing List
On Tue, Jul 02, 2024 at 05:33:58PM -0400, David Wysochanski wrote:
[annoying full quote snipped]
> Was your test running both reads and truncates on the same file in parallel?
>
> Once the read completes I believe the page is unlocked. Then if
> truncates are running in parallel, then I think this can happen
> (folio->mapping = NULL as a result of truncate).
Yes, folio->mapping is only stable when the folio is locked.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG in nfs_read_folio() when tracing is enabled
2024-07-02 21:33 ` David Wysochanski
2024-07-03 5:08 ` Christoph Hellwig
@ 2024-07-03 13:40 ` Chuck Lever III
2024-07-03 14:18 ` Christoph Hellwig
1 sibling, 1 reply; 12+ messages in thread
From: Chuck Lever III @ 2024-07-03 13:40 UTC (permalink / raw)
To: David Wysochanski; +Cc: Dave Wysochanski, Linux NFS Mailing List
> On Jul 2, 2024, at 5:33 PM, David Wysochanski <dwysocha@redhat.com> wrote:
>
> On Tue, Jul 2, 2024 at 5:17 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>
>>
>>
>>> On Jul 2, 2024, at 4:59 PM, David Wysochanski <dwysocha@redhat.com> wrote:
>>>
>>> Sorry for the late reply, more below.
>>>
>>> On Wed, Jun 19, 2024 at 4:46 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>>
>>>> Hi Dave-
>>>>
>>>> I'm testing pNFS SCSI layouts with tracing enabled on the
>>>> client, and I hit this BUG twice today:
>>>>
>>>> BUG: kernel NULL pointer dereference, address: 0000000000000000
>>>> #PF: supervisor read access in kernel mode
>>>> #PF: error_code(0x0000) - not-present page
>>>> PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI
>>>> CPU: 0 PID: 52230 Comm: fio Not tainted 6.10.0-rc4-g9f1e0e495093 #5
>>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
>>>> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
>>>> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
>>>> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
>>>> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
>>>> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
>>>> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
>>>> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
>>>> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
>>>> FS: 00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
>>>> Call Trace:
>>>> <TASK>
>>>> ? __die_body.cold+0x19/0x27
>>>> ? page_fault_oops+0xca/0x2a0
>>>> ? search_module_extables+0x19/0x60
>>>> ? search_bpf_extables+0x5f/0x80
>>>> ? exc_page_fault+0x7e/0x180
>>>> ? asm_exc_page_fault+0x26/0x30
>>>> ? nfs_folio_length+0x29/0x170 [nfs]
>>>> trace_event_raw_event_nfs_folio_event_done+0xdc/0x170 [nfs]
>>>> nfs_read_folio+0x16d/0x260 [nfs]
>>>> ? __pfx_nfs_read_folio+0x10/0x10 [nfs]
>>>> filemap_read_folio+0x43/0xe0
>>>> filemap_fault+0x70d/0xd00
>>>> __do_fault+0x35/0x120
>>>> do_fault+0xbb/0x470
>>>> __handle_mm_fault+0x7e9/0x1060
>>>> ? mt_find+0x21c/0x570
>>>> handle_mm_fault+0xf0/0x300
>>>> do_user_addr_fault+0x217/0x620
>>>> exc_page_fault+0x7e/0x180
>>>> asm_exc_page_fault+0x26/0x30
>>>> RIP: 0033:0x7f1d25ecedb7
>>>> Code: 7e 6f 44 16 e0 48 29 fe 48 83 e1 e0 48 01 ce 0f 1f 40 00 c5 fe 6f 4e 60 c5 fe 6f 56 40 c5 fe 6f 5e 20 c5 fe 6f 26 48 83 c6 80 <c5> fd 7f 49 60 c5 fd 7f 51 40 c5 fd 7f 59 20 c5 fd 7f 21 48 83 c1
>>>> RSP: 002b:00007ffd92df7f58 EFLAGS: 00010203
>>>> RAX: 00007f1d1d0e6000 RBX: 000056211867f6c0 RCX: 00007f1d1d0e6f60
>>>> RDX: 0000000000001000 RSI: 0000562118699ee0 RDI: 00007f1d1d0e6000
>>>> RBP: 00007ffd92df7f70 R08: 0000000000000000 R09: 0000000000000000
>>>> R10: 0000562118688f30 R11: 0000000000000001 R12: 00007f1d1d16aac0
>>>> R13: 00007ffd92df8030 R14: 0000000000000001 R15: 0000000000001000
>>>> </TASK>
>>>> Modules linked in: blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_c>
>>>> CR2: 0000000000000000
>>>> ---[ end trace 0000000000000000 ]---
>>>> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
>>>> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
>>>> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
>>>> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
>>>> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
>>>> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
>>>> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
>>>> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
>>>> FS: 00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
>>>>
>>>> cel@renoir:linux$ scripts/faddr2line fs/nfs/nfs.ko nfs_folio_length+0x29
>>>> nfs_folio_length+0x29/0x170:
>>>> nfs_folio_length at /home/cel/src/kdevops/cel/linux/fs/nfs/internal.h:824 (discriminator 1)
>>>>
>>>> Which in my version of the source code is:
>>>>
>>>> 822 static inline size_t nfs_folio_length(struct folio *folio)
>>>> 823 {
>>>> 824 loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
>>>>
>>>
>>> Ok so looks like we need to go deeper to know what the NULL pointer
>>> represented in the BUG statement.
>>> Decoding the oops:
>>> $ ~/git/kernel/scripts/decodecode < ./oops.txt
>>> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00
>>> 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b
>>> 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
>>> All code
>>> ========
>>> 0: 90 nop
>>> 1: 41 56 push %r14
>>> 3: 41 55 push %r13
>>> 5: 41 54 push %r12
>>> 7: 55 push %rbp
>>> 8: 53 push %rbx
>>> 9: 48 89 fb mov %rdi,%rbx
>>> c: 48 83 ec 08 sub $0x8,%rsp
>>> 10: 48 8b 07 mov (%rdi),%rax
>>> 13: a9 00 00 04 00 test $0x40000,%eax
>>> 18: 74 0c je 0x26
>>> 1a: 48 8b 07 mov (%rdi),%rax
>>> 1d: f6 c4 08 test $0x8,%ah
>>> 20: 0f 85 89 00 00 00 jne 0xaf
>>> 26: 48 8b 43 18 mov 0x18(%rbx),%rax
>>> 2a:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
>>> 2d: 48 8b 68 50 mov 0x50(%rax),%rbp
>>> 31: 48 85 ed test %rbp,%rbp
>>> 34: 7e 66 jle 0x9c
>>> 36: 48 8b 03 mov (%rbx),%rax
>>> 39: a8 40 test $0x40,%al
>>> 3b: 74 7c je 0xb9
>>> 3d: 4c rex.WR
>>> 3e: 8b .byte 0x8b
>>> 3f: 6b .byte 0x6b
>>>
>>> Code starting with the faulting instruction
>>> ===========================================
>>> 0: 48 8b 00 mov (%rax),%rax
>>> 3: 48 8b 68 50 mov 0x50(%rax),%rbp
>>> 7: 48 85 ed test %rbp,%rbp
>>> a: 7e 66 jle 0x72
>>> c: 48 8b 03 mov (%rbx),%rax
>>> f: a8 40 test $0x40,%al
>>> 11: 74 7c je 0x8f
>>> 13: 4c rex.WR
>>> 14: 8b .byte 0x8b
>>> 15: 6b .byte 0x6b
>>>
>>>
>>> Snip out the essential instructions, which shows we just have read an
>>> address at an offset of 0x18, then that must have been NULL and we
>>> tried to dereference NULL which led to the BUG:
>>> 9: 48 89 fb mov %rdi,%rbx
>>> ...
>>> 26: 48 8b 43 18 mov 0x18(%rbx),%rax
>>> 2a:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
>>>
>>> So the question is what is at offset 0x18 in the above code in
>>> nfs_folio_length()?
>>> Now folio_file_mapping() returns an address_space pointer ,and
>>> address_space->host is at offset 0.
>>> So if 'mapping' is at offset 0x18 of folio, then mapping is NULL. And
>>> from the code, this seems to fit.
>>>
>>> 310 struct folio {
>>> 311 /* private: don't document the anon union */
>>> 312 union {
>>> 313 struct {
>>> 314 /* public: */
>>> 315 unsigned long flags;
>>> 316 union {
>>> 317 struct list_head lru;
>>> 318 /* private: avoid cluttering the output */
>>> 319 struct {
>>> 320 void *__filler;
>>> 321 /* public: */
>>> 322 unsigned int mlock_count;
>>> 323 /* private: */
>>> 324 };
>>> 325 /* public: */
>>> 326 };
>>> 327 struct address_space *mapping;
>>> 328 pgoff_t index;
>>>
>>>
>>> 445 static inline struct address_space *folio_file_mapping(struct folio *folio)
>>> 446 {
>>> 447 if (unlikely(folio_test_swapcache(folio)))
>>> 448 return swapcache_mapping(folio);
>>> 449
>>> 450 return folio->mapping;
>>> 451 }
>>>
>>> So it seems folio->mapping == NULL and that is why we BUG.
>>>
>>>
>>>> I'm pinging you first about this because you changed the
>>>> code around the trace_nfs_aop_read_done() call site in
>>>> commit 000dbe0bec05 ("NFS: Convert buffered read paths to
>>>> use netfs when fscache is enabled").
>>>>
>>>
>>> If I have the above analysis right, then maybe this patch would fix it?
>>> Also I wonder if nfs_page_length() might have the same issue?
>>>
>>> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
>>> index 9f0f4534744b..832f07994b88 100644
>>> --- a/fs/nfs/internal.h
>>> +++ b/fs/nfs/internal.h
>>> @@ -821,7 +821,12 @@ unsigned int nfs_page_length(struct page *page)
>>> */
>>> static inline size_t nfs_folio_length(struct folio *folio)
>>> {
>>> - loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
>>> + loff_t i_size;
>>> +
>>> + if (!folio->mapping)
>>> + return 0;
>>> +
>>> + i_size = i_size_read(folio_file_mapping(folio)->host);
>>>
>>> if (i_size > 0) {
>>> pgoff_t index = folio_index(folio) >> folio_order(folio);
>>
>> Execution passed trace_nfs_aop_readpage() without incident,
>> so folio->mapping was a valid pointer as we entered
>> nfs_read_folio().
>>
>> What I don't understand is whether you expect folio->mapping
>> to become NULL while nfs_read_folio() is running.
>>
>
> Was your test running both reads and truncates on the same file in parallel?
I didn't record it in my notes, but IIRC I was running fstests.
I'm pretty sure one or more of the fsx-based tests will do this.
--
Chuck Lever
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG in nfs_read_folio() when tracing is enabled
2024-07-03 13:40 ` Chuck Lever III
@ 2024-07-03 14:18 ` Christoph Hellwig
0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2024-07-03 14:18 UTC (permalink / raw)
To: Chuck Lever III
Cc: David Wysochanski, Dave Wysochanski, Linux NFS Mailing List
On Wed, Jul 03, 2024 at 01:40:32PM +0000, Chuck Lever III wrote:
> > Was your test running both reads and truncates on the same file in parallel?
>
> I didn't record it in my notes, but IIRC I was running fstests.
> I'm pretty sure one or more of the fsx-based tests will do this.
Note that you don't have to truncate as in the truncate system call or
anything changing the file system. Simply invalidating the page cache
can lead to that, and xfstests exercises that heavily in the tests that
mix buffered and direct I/O.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG in nfs_read_folio() when tracing is enabled
2024-07-03 5:08 ` Christoph Hellwig
@ 2024-07-05 13:45 ` Chuck Lever III
2024-07-05 13:47 ` Christoph Hellwig
0 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever III @ 2024-07-05 13:45 UTC (permalink / raw)
To: David Wysochanski, Dave Wysochanski
Cc: Linux NFS Mailing List, Christoph Hellwig
> On Jul 3, 2024, at 1:08 AM, Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, Jul 02, 2024 at 05:33:58PM -0400, David Wysochanski wrote:
>
> [annoying full quote snipped]
>
>> Was your test running both reads and truncates on the same file in parallel?
>>
>> Once the read completes I believe the page is unlocked. Then if
>> truncates are running in parallel, then I think this can happen
>> (folio->mapping = NULL as a result of truncate).
>
> Yes, folio->mapping is only stable when the folio is locked.
So Dave, I haven't tested the patch you posted a couple
days ago, because there hasn't been a clear answer about
whether nfs_read_folio() needs to protect itself against
the ->mapping changing, in which case, that's probably
a better fix.
--
Chuck Lever
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG in nfs_read_folio() when tracing is enabled
2024-07-05 13:45 ` Chuck Lever III
@ 2024-07-05 13:47 ` Christoph Hellwig
2024-07-09 15:45 ` David Wysochanski
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2024-07-05 13:47 UTC (permalink / raw)
To: Chuck Lever III
Cc: David Wysochanski, Dave Wysochanski, Linux NFS Mailing List,
Christoph Hellwig
On Fri, Jul 05, 2024 at 01:45:11PM +0000, Chuck Lever III wrote:
> So Dave, I haven't tested the patch you posted a couple
> days ago, because there hasn't been a clear answer about
> whether nfs_read_folio() needs to protect itself against
> the ->mapping changing, in which case, that's probably
> a better fix.
->read_folio is called with the folio locked and only unlocks it
on I/O completion, so it doesn't really need any protection. So the
patch to simply move the trace point to before unlocking the folio
should fix the issue.
Alternatively we could just use the mapping from the inode variable
and pass it in.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG in nfs_read_folio() when tracing is enabled
2024-07-05 13:47 ` Christoph Hellwig
@ 2024-07-09 15:45 ` David Wysochanski
2024-07-10 6:33 ` Christoph Hellwig
0 siblings, 1 reply; 12+ messages in thread
From: David Wysochanski @ 2024-07-09 15:45 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Chuck Lever III, Dave Wysochanski, Linux NFS Mailing List
On Fri, Jul 5, 2024 at 9:48 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Fri, Jul 05, 2024 at 01:45:11PM +0000, Chuck Lever III wrote:
> > So Dave, I haven't tested the patch you posted a couple
> > days ago, because there hasn't been a clear answer about
> > whether nfs_read_folio() needs to protect itself against
> > the ->mapping changing, in which case, that's probably
> > a better fix.
>
> ->read_folio is called with the folio locked and only unlocks it
> on I/O completion, so it doesn't really need any protection. So the
> patch to simply move the trace point to before unlocking the folio
> should fix the issue.
>
I didn't see this so maybe you sent it privately or I missed a message.
> Alternatively we could just use the mapping from the inode variable
> and pass it in.
>
I'm not sure I follow - are you suggesting fixing the tracepoint?
Regardless of possible tracepoint movement or other fixes,
nfs_folio_length() needs to be patched because it should
handle folio->mapping == NULL. Ditto for nfs_page_length().
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG in nfs_read_folio() when tracing is enabled
2024-07-09 15:45 ` David Wysochanski
@ 2024-07-10 6:33 ` Christoph Hellwig
0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2024-07-10 6:33 UTC (permalink / raw)
To: David Wysochanski
Cc: Christoph Hellwig, Chuck Lever III, Dave Wysochanski,
Linux NFS Mailing List
On Tue, Jul 09, 2024 at 11:45:11AM -0400, David Wysochanski wrote:
> > > So Dave, I haven't tested the patch you posted a couple
> > > days ago, because there hasn't been a clear answer about
> > > whether nfs_read_folio() needs to protect itself against
> > > the ->mapping changing, in which case, that's probably
> > > a better fix.
> >
> > ->read_folio is called with the folio locked and only unlocks it
> > on I/O completion, so it doesn't really need any protection. So the
> > patch to simply move the trace point to before unlocking the folio
> > should fix the issue.
> >
> I didn't see this so maybe you sent it privately or I missed a message.
>
> > Alternatively we could just use the mapping from the inode variable
> > and pass it in.
> >
> I'm not sure I follow - are you suggesting fixing the tracepoint?
I'm suggesting to fix the tracepoint folio. I though I had something
out, but I guess it never made it.
> Regardless of possible tracepoint movement or other fixes,
> nfs_folio_length() needs to be patched because it should
> handle folio->mapping == NULL. Ditto for nfs_page_length().
nfs_page_length is completely unused and I've already sent a patch
to remove it. nfs_folio_length is only sanely usable with the
folio either locked or under writeback. So adding this random
check there is probably not a good idea, but an audit of the callers
would be very useful.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2024-07-10 6:33 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-19 20:45 BUG in nfs_read_folio() when tracing is enabled Chuck Lever III
2024-07-02 20:59 ` David Wysochanski
2024-07-02 21:14 ` David Wysochanski
2024-07-02 21:17 ` Chuck Lever III
2024-07-02 21:33 ` David Wysochanski
2024-07-03 5:08 ` Christoph Hellwig
2024-07-05 13:45 ` Chuck Lever III
2024-07-05 13:47 ` Christoph Hellwig
2024-07-09 15:45 ` David Wysochanski
2024-07-10 6:33 ` Christoph Hellwig
2024-07-03 13:40 ` Chuck Lever III
2024-07-03 14:18 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox