From: Mike Snitzer <snitzer@kernel.org>
To: Trond Myklebust <trondmy@kernel.org>
Cc: Yongcheng Yang <yoyang@redhat.com>, linux-nfs@vger.kernel.org
Subject: Re: [Bug report] xfstests generic/323 over NFS hit BUG: KASAN: slab-use-after-free in nfs_local_call_read on 6.18.0-rc1
Date: Sun, 19 Oct 2025 12:26:25 -0400
Message-ID: <aPURMSaH1rXQJkdj@kernel.org>
In-Reply-To: <ed50690c04699c820873642ea38a01eec53d21af.camel@kernel.org>
On Sun, Oct 19, 2025 at 11:18:57AM -0400, Trond Myklebust wrote:
> On Sun, 2025-10-19 at 17:29 +0800, Yongcheng Yang wrote:
> > Hi All,
> >
> > There is a new nfs slab-use-after-free issue since 6.18.0-rc1.
> > It appears to be reliably reproducible on my side when running
> > xfstests generic/323 over NFSv4.2 in *debug* kernel mode:
>
> Thanks for the report! I think I see the problem.
>
> Mike,
>
> When you iterate over the iocb in nfs_local_call_read(), you're calling
> nfs_local_pgio_done(), nfs_local_read_done() and
> nfs_local_pgio_release() multiple times.
I purposely made nfs_local_pgio_done() safe to call multiple times.
And nfs_local_{read,write}_done() and nfs_local_pgio_release()
_should_ only be called once.
> * You're calling nfs_local_read_aio_complete() and
> nfs_local_read_aio_complete_work() once for each and every
> asynchronous call.
There is only ever the possibility of a single async call: the one for
the single aligned DIO.
Any given pgio entering LOCALIO may be split into 3 pieces: the
misaligned head and tail are handled synchronously first, and only then
the aligned middle asynchronously (or possibly synchronously if the
underlying device imposes sync, e.g. a ramdisk).
> * You're calling nfs_local_pgio_done() for each synchronous call.
Yes, which is safe. It just updates status and deals with partial
completion.
> * In addition, if there is a synchronous call at the very end of the
> iteration, so that status != -EIOCBQUEUED, then you're also calling
> nfs_local_read_done() one extra time, and then calling
> nfs_local_pgio_release().
It isn't in addition; it's only for the last piece of IO (be it sync
or async).
> The same thing appears to be happening in nfs_local_call_write().
I fully acknowledge this isn't an easy audit, and there could be
something wrong, but I'm not seeing it. Obviously this BUG report
puts the onus on me to figure it out...
BUT, I have used this code extensively on non-debug kernels and had no
issues. Is it at all possible KASAN is triggering a false positive!?
Mike
> >
> > [18265.311177]
> > ==================================================================
> > [18265.315831] BUG: KASAN: slab-use-after-free in nfs_local_call_read+0x590/0x7f0 [nfs]
> > [18265.320135] Read of size 2 at addr ffff8881090556a2 by task kworker/u9:0/667366
> >
> > [18265.325454] CPU: 0 UID: 0 PID: 667366 Comm: kworker/u9:0 Not tainted 6.18.0-rc1 #1 PREEMPT(full)
> > [18265.325461] Hardware name: Red Hat KVM/RHEL, BIOS edk2-20241117-2.el9 11/17/2024
> > [18265.325465] Workqueue: nfslocaliod nfs_local_call_read [nfs]
> > [18265.325611] Call Trace:
> > [18265.325615] <TASK>
> > [18265.325619] dump_stack_lvl+0x77/0xa0
> > [18265.325629] print_report+0x171/0x820
> > [18265.325637] ? __virt_addr_valid+0x151/0x3a0
> > [18265.325644] ? __virt_addr_valid+0x300/0x3a0
> > [18265.325650] ? nfs_local_call_read+0x590/0x7f0 [nfs]
> > [18265.325770] kasan_report+0x167/0x1a0
> > [18265.325777] ? nfs_local_call_read+0x590/0x7f0 [nfs]
> > [18265.325900] nfs_local_call_read+0x590/0x7f0 [nfs]
> > [18265.326027] ? process_scheduled_works+0x7d3/0x11d0
> > [18265.326034] process_scheduled_works+0x857/0x11d0
> > [18265.326050] worker_thread+0x897/0xd00
> > [18265.326065] kthread+0x51b/0x650
> > [18265.326071] ? __pfx_worker_thread+0x10/0x10
> > [18265.326076] ? __pfx_kthread+0x10/0x10
> > [18265.326082] ret_from_fork+0x249/0x480
> > [18265.326087] ? __pfx_kthread+0x10/0x10
> > [18265.326092] ret_from_fork_asm+0x1a/0x30
> > [18265.326104] </TASK>
> >
> > [18265.378345] Allocated by task 681242:
> > [18265.380068] kasan_save_track+0x3e/0x80
> > [18265.381838] __kasan_kmalloc+0x93/0xb0
> > [18265.383587] __kmalloc_cache_noprof+0x3eb/0x6e0
> > [18265.385532] nfs_local_doio+0x1cb/0xeb0 [nfs]
> > [18265.387630] nfs_initiate_pgio+0x284/0x400 [nfs]
> > [18265.389815] nfs_generic_pg_pgios+0x6e2/0x810 [nfs]
> > [18265.391998] nfs_pageio_complete+0x278/0x750 [nfs]
> > [18265.394146] nfs_file_direct_read+0x78c/0x9e0 [nfs]
> > [18265.396386] vfs_read+0x5d0/0x770
> > [18265.398043] __x64_sys_pread64+0xed/0x160
> > [18265.399837] do_syscall_64+0xad/0x7d0
> > [18265.401561] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > [18265.404687] Freed by task 596986:
> > [18265.406245] kasan_save_track+0x3e/0x80
> > [18265.408013] __kasan_save_free_info+0x46/0x50
> > [18265.409884] __kasan_slab_free+0x58/0x80
> > [18265.411613] kfree+0x1c1/0x620
> > [18265.413075] nfs_local_read_aio_complete_work+0x86/0x100 [nfs]
> > [18265.415486] process_scheduled_works+0x857/0x11d0
> > [18265.417437] worker_thread+0x897/0xd00
> > [18265.419177] kthread+0x51b/0x650
> > [18265.420689] ret_from_fork+0x249/0x480
> > [18265.422331] ret_from_fork_asm+0x1a/0x30
> >
> > [18265.424989] Last potentially related work creation:
> > [18265.426949] kasan_save_stack+0x3e/0x60
> > [18265.428639] kasan_record_aux_stack+0xbd/0xd0
> > [18265.430423] insert_work+0x2d/0x230
> > [18265.431968] __queue_work+0x8ec/0xb50
> > [18265.433555] queue_work_on+0xaf/0xe0
> > [18265.435126] iomap_dio_bio_end_io+0xb5/0x160
> > [18265.436902] blk_update_request+0x3d1/0x1000
> > [18265.438699] blk_mq_end_request+0x3c/0x70
> > [18265.440379] virtblk_done+0x148/0x250
> > [18265.441973] vring_interrupt+0x159/0x300
> > [18265.443642] __handle_irq_event_percpu+0x1c3/0x700
> > [18265.445556] handle_irq_event+0x8b/0x1c0
> > [18265.447219] handle_edge_irq+0x1b5/0x760
> > [18265.448881] __common_interrupt+0xba/0x140
> > [18265.450588] common_interrupt+0x45/0xa0
> > [18265.452258] asm_common_interrupt+0x26/0x40
> >
> > [18265.454941] Second to last potentially related work creation:
> > [18265.457141] kasan_save_stack+0x3e/0x60
> > [18265.458790] kasan_record_aux_stack+0xbd/0xd0
> > [18265.460597] insert_work+0x2d/0x230
> > [18265.462129] __queue_work+0x8ec/0xb50
> > [18265.463725] queue_work_on+0xaf/0xe0
> > [18265.465289] nfs_local_doio+0xa75/0xeb0 [nfs]
> > [18265.467220] nfs_initiate_pgio+0x284/0x400 [nfs]
> > [18265.469226] nfs_generic_pg_pgios+0x6e2/0x810 [nfs]
> > [18265.471310] nfs_pageio_complete+0x278/0x750 [nfs]
> > [18265.473363] nfs_file_direct_read+0x78c/0x9e0 [nfs]
> > [18265.475432] vfs_read+0x5d0/0x770
> > [18265.476941] __x64_sys_pread64+0xed/0x160
> > [18265.478648] do_syscall_64+0xad/0x7d0
> > [18265.480240] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > [18265.483211] The buggy address belongs to the object at ffff888109055600
> >                 which belongs to the cache kmalloc-rnd-14-512 of size 512
> > [18265.488048] The buggy address is located 162 bytes inside of
> >                 freed 512-byte region [ffff888109055600, ffff888109055800)
> >
> > [18265.493827] The buggy address belongs to the physical page:
> > [18265.496033] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff888109050e00 pfn:0x109050
> > [18265.499353] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> > [18265.502198] flags: 0x17ffffc0000240(workingset|head|node=0|zone=2|lastcpupid=0x1fffff)
> > [18265.505105] page_type: f5(slab)
> > [18265.506675] raw: 0017ffffc0000240 ffff88810006d540 ffffea0004151c10 ffff88810006e088
> > [18265.509537] raw: ffff888109050e00 000000000015000f 00000000f5000000 0000000000000000
> > [18265.512418] head: 0017ffffc0000240 ffff88810006d540 ffffea0004151c10 ffff88810006e088
> > [18265.515326] head: ffff888109050e00 000000000015000f 00000000f5000000 0000000000000000
> > [18265.518244] head: 0017ffffc0000003 ffffea0004241401 00000000ffffffff 00000000ffffffff
> > [18265.521168] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
> > [18265.524113] page dumped because: kasan: bad access detected
> >
> > [18265.527455] Memory state around the buggy address:
> > [18265.529505] ffff888109055580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > [18265.532292] ffff888109055600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [18265.535149] >ffff888109055680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [18265.537930]                                ^
> > [18265.539899] ffff888109055700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [18265.542713] ffff888109055780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [18265.545507]
> > ==================================================================
> > [18265.554665] Disabling lock debugging due to kernel taint
> >
>
> --
> Trond Myklebust Linux NFS client maintainer, Hammerspace
> trondmy@kernel.org, trond.myklebust@hammerspace.com
Thread overview: 18+ messages
2025-10-19 9:29 [Bug report] xfstests generic/323 over NFS hit BUG: KASAN: slab-use-after-free in nfs_local_call_read on 6.18.0-rc1 Yongcheng Yang
2025-10-19 15:18 ` Trond Myklebust
2025-10-19 16:26 ` Mike Snitzer [this message]
2025-10-20 18:24 ` Mike Snitzer
2025-10-27 13:08 ` [v6.18-rcX PATCH 0/3] nfs/localio: fixes for recent misaligned DIO changes Mike Snitzer
2025-10-27 13:08 ` [v6.18-rcX PATCH 1/3] nfs/localio: remove unecessary ENOTBLK handling in DIO WRITE support Mike Snitzer
2025-10-27 13:08 ` [v6.18-rcX PATCH 2/3] nfs/localio: add refcounting for each iocb IO associated with NFS pgio header Mike Snitzer
2025-10-27 13:19 ` Christoph Hellwig
2025-10-27 13:55 ` Mike Snitzer
2025-10-27 14:45 ` Christoph Hellwig
2025-10-27 13:08 ` [v6.18-rcX PATCH 3/3] nfs/localio: backfill missing partial read support for misaligned DIO Mike Snitzer
2025-10-27 17:52 ` [v6.18-rcX PATCH 4/3] nfs/localio: Ensure DIO WRITE's IO on stable storage upon completion Mike Snitzer
2025-10-29 23:19 ` [v6.18-rcX PATCH 5/3] nfs/localio: do not issue misaligned DIO out-of-order Mike Snitzer
2025-10-31 1:50 ` Mike Snitzer
2025-10-31 13:33 ` Anna Schumaker
2025-11-04 18:02 ` [v6.18-rcX PATCH v2] " Mike Snitzer
2025-11-06 2:50 ` Mike Snitzer
2025-11-06 3:03 ` [v6.18-rcX PATCH v3 5/3] " Mike Snitzer