* NFS dio aio bug
@ 2011-01-14  1:29 Nick Piggin
From: Nick Piggin @ 2011-01-14  1:29 UTC (permalink / raw)
To: Trond Myklebust, linux-fsdevel

Hi Trond,

I'm getting use after frees in aio code in NFS

[ 2703.396766] Call Trace:
[ 2703.396858]  [<ffffffff8100b057>] ? native_sched_clock+0x27/0x80
[ 2703.396959]  [<ffffffff8108509e>] ? put_lock_stats+0xe/0x40
[ 2703.397058]  [<ffffffff81088348>] ? lock_release_holdtime+0xa8/0x140
[ 2703.397159]  [<ffffffff8108a2a5>] lock_acquire+0x95/0x1b0
[ 2703.397260]  [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
[ 2703.397361]  [<ffffffff81039701>] ? get_parent_ip+0x11/0x50
[ 2703.397464]  [<ffffffff81612a31>] _raw_spin_lock_irq+0x41/0x80
[ 2703.397564]  [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
[ 2703.397662]  [<ffffffff811627db>] aio_put_req+0x2b/0x60
[ 2703.397761]  [<ffffffff811647fe>] do_io_submit+0x2be/0x7c0
[ 2703.397895]  [<ffffffff81164d0b>] sys_io_submit+0xb/0x10
[ 2703.397995]  [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b

Adding some tracing, it is due to nfs completing the request then
returning something other than -EIOCBQUEUED, so aio.c
also completes the request.

Thanks,
Nick
* Re: NFS dio aio bug
From: Chuck Lever @ 2011-01-14 16:08 UTC (permalink / raw)
To: Nick Piggin; +Cc: Trond Myklebust, linux-fsdevel, Linux NFS Mailing List

Hi Nick-

On Jan 13, 2011, at 8:29 PM, Nick Piggin wrote:

> Hi Trond,
>
> I'm getting use after frees in aio code in NFS

Can you describe how to reproduce this?

> [ 2703.396766] Call Trace:
> [ 2703.396858]  [<ffffffff8100b057>] ? native_sched_clock+0x27/0x80
> [ 2703.396959]  [<ffffffff8108509e>] ? put_lock_stats+0xe/0x40
> [ 2703.397058]  [<ffffffff81088348>] ? lock_release_holdtime+0xa8/0x140
> [ 2703.397159]  [<ffffffff8108a2a5>] lock_acquire+0x95/0x1b0
> [ 2703.397260]  [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
> [ 2703.397361]  [<ffffffff81039701>] ? get_parent_ip+0x11/0x50
> [ 2703.397464]  [<ffffffff81612a31>] _raw_spin_lock_irq+0x41/0x80
> [ 2703.397564]  [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
> [ 2703.397662]  [<ffffffff811627db>] aio_put_req+0x2b/0x60
> [ 2703.397761]  [<ffffffff811647fe>] do_io_submit+0x2be/0x7c0
> [ 2703.397895]  [<ffffffff81164d0b>] sys_io_submit+0xb/0x10
> [ 2703.397995]  [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b
>
> Adding some tracing, it is due to nfs completing the request then
> returning something other than -EIOCBQUEUED, so aio.c
> also completes the request.

Is this with reads, writes, or both?  Are the I/O requests smaller than,
equal to, or larger than rsize or wsize?

We have a related bug report: hitting the BUG at fs/aio.c:552 (OEL5) and
similar for more recent kernels.  Looks like dreq refcounting is faulty
somehow.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
* Re: NFS dio aio bug
From: Nick Piggin @ 2011-01-14 21:48 UTC (permalink / raw)
To: Chuck Lever; +Cc: Trond Myklebust, linux-fsdevel, Linux NFS Mailing List

On Sat, Jan 15, 2011 at 3:08 AM, Chuck Lever
<chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> Hi Nick-
>
> On Jan 13, 2011, at 8:29 PM, Nick Piggin wrote:
>
>> Hi Trond,
>>
>> I'm getting use after frees in aio code in NFS
>
> Can you describe how to reproduce this?

It was with the aio-dio stress code from xfstests, #207 I think
or 208.

Running it for a short time and then ^C ing it would tend to
trigger it.

I'll have to get you more details after I come back from
travelling in a week's time.

>> [ 2703.396766] Call Trace:
>> [...]
>> [ 2703.397995]  [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b
>>
>> Adding some tracing, it is due to nfs completing the request then
>> returning something other than -EIOCBQUEUED, so aio.c
>> also completes the request.
>
> Is this with reads, writes, or both?  Are the I/O requests smaller than,
> equal to, or larger than rsize or wsize?
>
> We have a related bug report: hitting the BUG at fs/aio.c:552 (OEL5) and
> similar for more recent kernels.  Looks like dreq refcounting is faulty
> somehow.

I only saw it with writes. The request was being completed
in nfs direct write path when I added some tracing. It was
very easy to reproduce, I just didn't have time to bisect it,
but I can do that when I get back if you don't have it solved
by then.
* Re: NFS dio aio bug
From: Chuck Lever @ 2011-01-18 22:01 UTC (permalink / raw)
To: Nick Piggin; +Cc: Trond Myklebust, linux-fsdevel, Linux NFS Mailing List

On Jan 14, 2011, at 4:48 PM, Nick Piggin wrote:

> On Sat, Jan 15, 2011 at 3:08 AM, Chuck Lever
> <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>> Hi Nick-
>>
>> On Jan 13, 2011, at 8:29 PM, Nick Piggin wrote:
>>
>>> Hi Trond,
>>>
>>> I'm getting use after frees in aio code in NFS
>>
>> Can you describe how to reproduce this?
>
> It was with the aio-dio stress code from xfstests, #207 I think
> or 208.
>
> Running it for a short time and then ^C ing it would tend to
> trigger it.
>
> I'll have to get you more details after I come back from
> travelling in a week's time.

I can reproduce this fairly reliably here with test 208.  ^C makes it
happen almost immediately, but it can also trigger sometimes just by
running the test.

>>> [ 2703.396766] Call Trace:
>>> [...]
>>> [ 2703.397995]  [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b
>>>
>>> Adding some tracing, it is due to nfs completing the request then
>>> returning something other than -EIOCBQUEUED, so aio.c
>>> also completes the request.
>>
>> Is this with reads, writes, or both?  Are the I/O requests smaller than,
>> equal to, or larger than rsize or wsize?
>>
>> We have a related bug report: hitting the BUG at fs/aio.c:552 (OEL5) and
>> similar for more recent kernels.  Looks like dreq refcounting is faulty
>> somehow.
>
> I only saw it with writes. The request was being completed
> in nfs direct write path when I added some tracing. It was
> very easy to reproduce, I just didn't have time to bisect it,
> but I can do that when I get back if you don't have it solved
> by then.

I'm looking at it now.  This looks exactly like the symptoms of our bug.
Thanks for the report and the reproducer.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
* Re: NFS dio aio bug
From: Wengang Wang @ 2011-01-19  2:56 UTC (permalink / raw)
To: Chuck Lever
Cc: Nick Piggin, Trond Myklebust, linux-fsdevel, Linux NFS Mailing List

On 11-01-18 17:01, Chuck Lever wrote:
> On Jan 14, 2011, at 4:48 PM, Nick Piggin wrote:
> > [...]
> >> Is this with reads, writes, or both?  Are the I/O requests smaller than,
> >> equal to, or larger than rsize or wsize?
> >>
> >> We have a related bug report: hitting the BUG at fs/aio.c:552 (OEL5) and
> >> similar for more recent kernels.  Looks like dreq refcounting is faulty
> >> somehow.
> >
> > I only saw it with writes. The request was being completed
> > in nfs direct write path when I added some tracing. It was
> > very easy to reproduce, I just didn't have time to bisect it,
> > but I can do that when I get back if you don't have it solved
> > by then.
>
> I'm looking at it now.  This looks exactly like the symptoms of our bug.
> Thanks for the report and the reproducer.

Yes, there is a fs/aio.c:552 bug filed.  I can reproduce it as well,
and it happens with both reads and writes.

My current analysis (correct me if I am wrong): on the sys_io_submit
path, the VFS does not expect the filesystem underneath to drop
ki_users in the "error" case -- in my test the error was actually
-ERESTARTSYS, not -EIOCBQUEUED.  But NFS seems to drop ki_users in
the error case as well as in the successful one.

I now have a v1 patch to "fix" it.  I will post it after thinking
about it some more.

thanks,
wengang.