* hanging aio process
@ 2014-05-19 17:38 Sebastian Ott
2014-05-19 18:08 ` Benjamin LaHaise
2014-05-19 18:31 ` hanging aio process Anatol Pomozov
0 siblings, 2 replies; 23+ messages in thread
From: Sebastian Ott @ 2014-05-19 17:38 UTC (permalink / raw)
To: Anatol Pomozov, Benjamin LaHaise; +Cc: linux-aio, linux-kernel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 395 bytes --]
Hello,
On the latest kernel, a fio job with 4 workers using libaio hangs.
git bisect points to:
commit e02ba72aabfade4c9cd6e3263e9b57bf890ad25c
Author: Anatol Pomozov <anatol.pomozov@gmail.com>
Date: Tue Apr 15 11:31:33 2014 -0700
aio: block io_destroy() until all context requests are completed
The fio workers are on the wait_for_completion in sys_io_destroy.
Regards,
Sebastian
[-- Attachment #2: Type: TEXT/PLAIN, Size: 235 bytes --]
[global]
blocksize=4K
size=256M
rw=randrw
verify=md5
iodepth=32
ioengine=libaio
direct=1
end_fsync=1
[file1]
filename=/dev/scma
[file2]
filename=/dev/scmbw
[file3]
filename=/dev/scmc
[file4]
filename=/dev/scmx
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: hanging aio process
2014-05-19 17:38 hanging aio process Sebastian Ott
@ 2014-05-19 18:08 ` Benjamin LaHaise
2014-05-20 8:08 ` Sebastian Ott
2014-05-19 18:31 ` hanging aio process Anatol Pomozov
1 sibling, 1 reply; 23+ messages in thread
From: Benjamin LaHaise @ 2014-05-19 18:08 UTC (permalink / raw)
To: Sebastian Ott; +Cc: Anatol Pomozov, linux-aio, linux-kernel
On Mon, May 19, 2014 at 07:38:51PM +0200, Sebastian Ott wrote:
> Hello,
>
> on the latest kernel a fio job with 4 workers using libaio hangs.
Is more than one process stuck in state D when the hang occurs? If so,
what does a backtrace show for the stuck processes (or are there any
hung process warnings issued)? It is entirely possible the bug isn't
caused by the referenced commit, as the commit you're pointing to merely
makes the io_destroy() syscall wait for all outstanding aio requests to
complete before returning.
-ben
> git bisect points to:
> commit e02ba72aabfade4c9cd6e3263e9b57bf890ad25c
> Author: Anatol Pomozov <anatol.pomozov@gmail.com>
> Date: Tue Apr 15 11:31:33 2014 -0700
>
> aio: block io_destroy() until all context requests are completed
>
>
> The fio workers are on the wait_for_completion in sys_io_destroy.
>
> Regards,
> Sebastian
> [global]
> blocksize=4K
> size=256M
> rw=randrw
> verify=md5
> iodepth=32
> ioengine=libaio
> direct=1
> end_fsync=1
>
> [file1]
> filename=/dev/scma
>
> [file2]
> filename=/dev/scmbw
>
> [file3]
> filename=/dev/scmc
>
> [file4]
> filename=/dev/scmx
--
"Thought is the essence of where you are now."
* Re: hanging aio process
2014-05-19 17:38 hanging aio process Sebastian Ott
2014-05-19 18:08 ` Benjamin LaHaise
@ 2014-05-19 18:31 ` Anatol Pomozov
1 sibling, 0 replies; 23+ messages in thread
From: Anatol Pomozov @ 2014-05-19 18:31 UTC (permalink / raw)
To: Sebastian Ott; +Cc: Benjamin LaHaise, linux-aio, LKML
Hi
On Mon, May 19, 2014 at 10:38 AM, Sebastian Ott
<sebott@linux.vnet.ibm.com> wrote:
> Hello,
>
> on the latest kernel a fio job with 4 workers using libaio hangs.
>
> git bisect points to:
> commit e02ba72aabfade4c9cd6e3263e9b57bf890ad25c
> Author: Anatol Pomozov <anatol.pomozov@gmail.com>
> Date: Tue Apr 15 11:31:33 2014 -0700
>
> aio: block io_destroy() until all context requests are completed
>
>
> The fio workers are on the wait_for_completion in sys_io_destroy.
I ran your job file on my dev machine (kernel built from current HEAD
v3.15-rc5-77-g14186fe) with fio versions 1.35 and 2.1.9 - it works
fine. I used only 1 partition, as that is what I have on my dev machine.
dev:~# uname -a
Linux lpa12 3.15.0-smp-DEV #1 SMP Mon May 19 10:43:11 PDT 2014 x86_64 GNU/Linux
dev:~# ./fio fio.config
file1: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.1.9
Starting 1 process
Jobs: 1 (f=1): [V] [97.7% done] [23148KB/0KB/0KB /s] [5787/0/0 iops]
[eta 00m:06s]
file1: (groupid=0, jobs=1): err= 0: pid=8886: Mon May 19 11:28:01 2014
read : io=262144KB, bw=1030.4KB/s, iops=257, runt=254422msec
slat (usec): min=2, max=6661.1K, avg=1683383.24, stdev=2043757.55
clat (usec): min=502, max=1790.7K, avg=101036.69, stdev=130084.58
lat (usec): min=516, max=6670.5K, avg=1784420.38, stdev=1967789.22
clat percentiles (msec):
| 1.00th=[ 5], 5.00th=[ 6], 10.00th=[ 6], 20.00th=[ 6],
| 30.00th=[ 6], 40.00th=[ 6], 50.00th=[ 6], 60.00th=[ 96],
| 70.00th=[ 151], 80.00th=[ 206], 90.00th=[ 285], 95.00th=[ 359],
| 99.00th=[ 523], 99.50th=[ 586], 99.90th=[ 758], 99.95th=[ 840],
| 99.99th=[ 1090]
bw (KB /s): min= 354, max= 738, per=51.26%, avg=527.94, stdev=56.33
write: io=130924KB, bw=538931B/s, iops=131, runt=248763msec
slat (usec): min=3, max=211966, avg=111.03, stdev=2891.12
clat (usec): min=24, max=278755, avg=46127.52, stdev=40482.33
lat (usec): min=432, max=278772, avg=46238.99, stdev=40477.33
clat percentiles (usec):
| 1.00th=[ 636], 5.00th=[ 1032], 10.00th=[ 3184], 20.00th=[ 9920],
| 30.00th=[18560], 40.00th=[28544], 50.00th=[38656], 60.00th=[48896],
| 70.00th=[59648], 80.00th=[73216], 90.00th=[96768], 95.00th=[128512],
| 99.00th=[183296], 99.50th=[201728], 99.90th=[236544], 99.95th=[250880],
| 99.99th=[276480]
bw (KB /s): min= 262, max= 788, per=100.00%, avg=527.29, stdev=86.42
lat (usec) : 50=0.01%, 100=0.01%, 250=0.04%, 500=0.02%, 750=0.82%
lat (usec) : 1000=0.83%
lat (msec) : 2=0.97%, 4=1.72%, 10=36.38%, 20=4.35%, 50=11.45%
lat (msec) : 100=14.21%, 250=20.03%, 500=8.34%, 750=0.76%, 1000=0.07%
lat (msec) : 2000=0.01%
cpu : usr=0.71%, sys=0.65%, ctx=96595, majf=0, minf=775
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued : total=r=65536/w=32731/d=0, short=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
READ: io=262144KB, aggrb=1030KB/s, minb=1030KB/s, maxb=1030KB/s,
mint=254422msec, maxt=254422msec
WRITE: io=130924KB, aggrb=526KB/s, minb=526KB/s, maxb=526KB/s,
mint=248763msec, maxt=248763msec
Disk stats (read/write):
sdb: ios=65536/32734, merge=0/0, ticks=6599230/1502876,
in_queue=8102038, util=100.00%
+1 to what Benjamin said: provide kernel stack traces for the stuck
processes (use sysrq for it), and add any other relevant information on
how to reproduce the issue.
* Re: hanging aio process
2014-05-19 18:08 ` Benjamin LaHaise
@ 2014-05-20 8:08 ` Sebastian Ott
2014-05-20 13:16 ` Sebastian Ott
0 siblings, 1 reply; 23+ messages in thread
From: Sebastian Ott @ 2014-05-20 8:08 UTC (permalink / raw)
To: Benjamin LaHaise; +Cc: Anatol Pomozov, linux-aio, linux-kernel
On Mon, 19 May 2014, Benjamin LaHaise wrote:
> On Mon, May 19, 2014 at 07:38:51PM +0200, Sebastian Ott wrote:
> > Hello,
> >
> > on the latest kernel a fio job with 4 workers using libaio hangs.
>
> Is more than one process stuck in state D when the hang occurs? If so,
> what does a backtrace show for the stuck processes (or are there any
> hung process warnings issued)?
I've seen both - just one, or multiple processes in D state. Here there are 2:
./fio ../../test.job
file1: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
file2: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
file3: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
file4: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.1.8
Starting 4 processes
Jobs: 2 (f=0): [m__m] [100.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]
[ 58.125227] fio D 00000000006a2dfe 0 804 800 0x00000001
[ 58.125229] 000000007b52fb88 0000000000ab3800 0000000000ab3800 00000000009e60b0
[ 58.125229] 0000000000ab3800 00000000789284d8 0000000002e0f800 0000000078928000
[ 58.125229] 00000000797aec00 0000000000ab3800 0000000000ab3800 00000000009e60b0
[ 58.125229] 0000000000ab3800 0000000000000000 0000000002e0f800 0000000078928000
[ 58.125229] 00000000006b39b0 00000000006a377a 000000007b52fbc8 000000007b52fd20
[ 58.125238] Call Trace:
[ 58.125240] ([<00000000006a377a>] __schedule+0x562/0xcc8)
[ 58.125241] [<00000000006a2dfe>] schedule_timeout+0x1ee/0x270
[ 58.125243] [<00000000006a4acc>] wait_for_common+0x100/0x1d0
[ 58.125246] [<00000000002ca3fe>] SyS_io_destroy+0x9a/0xdc
[ 58.125247] [<00000000006a80f8>] sysc_nr_ok+0x22/0x28
[ 58.125248] [<000003fffd21c6d2>] 0x3fffd21c6d2
[ 58.125250] fio D 00000000006a2dfe 0 811 800 0x00000001
[ 58.125252] 000000007d82bb88 0000000000ab3800 0000000000ab3800 00000000009e60b0
[ 58.125252] 0000000000ab3800 000000007b0869f8 0000000002e0f800 000000007b086520
[ 58.125252] 000000007ea0f600 0000000000ab3800 0000000000ab3800 00000000009e60b0
[ 58.125252] 0000000000ab3800 0000000000000000 0000000002e0f800 000000007b086520
[ 58.125252] 00000000006b39b0 00000000006a377a 000000007d82bbc8 000000007d82bd20
[ 58.125261] Call Trace:
[ 58.125263] ([<00000000006a377a>] __schedule+0x562/0xcc8)
[ 58.125264] [<00000000006a2dfe>] schedule_timeout+0x1ee/0x270
[ 58.125266] [<00000000006a4acc>] wait_for_common+0x100/0x1d0
[ 58.125267] [<00000000002ca3fe>] SyS_io_destroy+0x9a/0xdc
[ 58.125269] [<00000000006a80f8>] sysc_nr_ok+0x22/0x28
[ 58.125270] [<000003fffd21c6d2>] 0x3fffd21c6d2
> It is entirely possible the bug isn't
> caused by the referenced commit, as the commit you're pointing to merely
> makes io_destroy() syscall wait for all aio outstanding to complete
> before returning.
I cannot reproduce this when I revert said commit (on top of 14186fe). If
that matters - the arch is s390.
Regards,
Sebastian
> [...]
* Re: hanging aio process
2014-05-20 8:08 ` Sebastian Ott
@ 2014-05-20 13:16 ` Sebastian Ott
2014-05-20 14:26 ` Benjamin LaHaise
2014-05-20 16:12 ` Anatol Pomozov
0 siblings, 2 replies; 23+ messages in thread
From: Sebastian Ott @ 2014-05-20 13:16 UTC (permalink / raw)
To: Benjamin LaHaise; +Cc: Anatol Pomozov, linux-aio, linux-kernel
On Tue, 20 May 2014, Sebastian Ott wrote:
> On Mon, 19 May 2014, Benjamin LaHaise wrote:
> > It is entirely possible the bug isn't
> > caused by the referenced commit, as the commit you're pointing to merely
> > makes io_destroy() syscall wait for all aio outstanding to complete
> > before returning.
>
> I cannot reproduce this when I revert said commit (on top of 14186fe). If
> that matters - the arch is s390.
Hm, ok - maybe that commit is really just highlighting a refcounting bug.
I just compared traces for a good and a few bad cases. The good case:
# tracer: function
#
# entries-in-buffer/entries-written: 16/16 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
fio-732 [003] .... 17.989315: kill_ioctx <-SyS_io_destroy
fio-739 [003] .... 18.000563: kill_ioctx <-SyS_io_destroy
ksoftirqd/3-19 [003] ..s. 18.031673: free_ioctx_users <-percpu_ref_kill_rcu
ksoftirqd/3-19 [003] ..s. 18.031679: free_ioctx_users <-percpu_ref_kill_rcu
fio-737 [003] .... 18.038765: kill_ioctx <-SyS_io_destroy
ksoftirqd/3-19 [003] ..s. 18.062488: free_ioctx_reqs <-percpu_ref_kill_rcu
ksoftirqd/3-19 [003] ..s. 18.062494: free_ioctx_reqs <-percpu_ref_kill_rcu
kworker/3:1-57 [003] .... 18.062499: free_ioctx <-process_one_work
kworker/3:1-57 [003] .... 18.062506: free_ioctx <-process_one_work
ksoftirqd/3-19 [003] ..s. 18.072275: free_ioctx_users <-percpu_ref_kill_rcu
fio-738 [003] .... 18.102419: kill_ioctx <-SyS_io_destroy
<idle>-0 [003] .ns. 18.111668: free_ioctx_reqs <-percpu_ref_kill_rcu
kworker/3:1-57 [003] .... 18.111675: free_ioctx <-process_one_work
ksoftirqd/3-19 [003] ..s. 18.138035: free_ioctx_users <-percpu_ref_kill_rcu
<idle>-0 [003] .ns. 18.191665: free_ioctx_reqs <-percpu_ref_kill_rcu
kworker/3:1-57 [003] .... 18.191671: free_ioctx <-process_one_work
(4 fio workers, free_ioctx_reqs is called 4 times)
One of the bad cases:
# tracer: function
#
# entries-in-buffer/entries-written: 14/14 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
fio-834 [000] .... 51.127359: kill_ioctx <-SyS_io_destroy
<idle>-0 [000] ..s. 51.170237: free_ioctx_users <-percpu_ref_kill_rcu
fio-828 [001] .... 51.189717: kill_ioctx <-SyS_io_destroy
fio-833 [001] ..s. 51.220178: free_ioctx_users <-percpu_ref_kill_rcu
<idle>-0 [000] .ns. 51.220230: free_ioctx_reqs <-percpu_ref_kill_rcu
kworker/0:3-661 [000] .... 51.220238: free_ioctx <-process_one_work
<idle>-0 [001] .ns. 51.260188: free_ioctx_reqs <-percpu_ref_kill_rcu
kworker/1:2-103 [001] .... 51.260198: free_ioctx <-process_one_work
fio-833 [002] .... 51.287602: kill_ioctx <-SyS_io_destroy
udevd-868 [002] ..s1 51.332519: free_ioctx_users <-percpu_ref_kill_rcu
<idle>-0 [002] .ns. 51.450180: free_ioctx_reqs <-percpu_ref_kill_rcu
kworker/2:2-191 [002] .... 51.450191: free_ioctx <-process_one_work
fio-835 [003] .... 51.907530: kill_ioctx <-SyS_io_destroy
ksoftirqd/3-19 [003] ..s. 52.000232: free_ioctx_users <-percpu_ref_kill_rcu
(1 fio worker in D state, free_ioctx_reqs is called 3 times)
Regards,
Sebastian
> [...]
* Re: hanging aio process
2014-05-20 13:16 ` Sebastian Ott
@ 2014-05-20 14:26 ` Benjamin LaHaise
2014-05-20 16:12 ` Anatol Pomozov
1 sibling, 0 replies; 23+ messages in thread
From: Benjamin LaHaise @ 2014-05-20 14:26 UTC (permalink / raw)
To: Sebastian Ott; +Cc: Anatol Pomozov, linux-aio, linux-kernel
On Tue, May 20, 2014 at 03:16:47PM +0200, Sebastian Ott wrote:
> On Tue, 20 May 2014, Sebastian Ott wrote:
> > On Mon, 19 May 2014, Benjamin LaHaise wrote:
> > > It is entirely possible the bug isn't
> > > caused by the referenced commit, as the commit you're pointing to merely
> > > makes io_destroy() syscall wait for all aio outstanding to complete
> > > before returning.
> >
> > I cannot reproduce this when I revert said commit (on top of 14186fe). If
> > that matters - the arch is s390.
> Hm, ok - maybe that commit is really just highlighting a refcounting bug.
> I just compared traces for a good and a few bad cases. The good case:
...
> (4 fio workers, free_ioctx_reqs is called 4 times)
> One of the bad cases:
....
> (1 fio worker in D state, free_ioctx_reqs is called 3 times)
This would seem to indicate that the problem is not with Anatol's change,
and the hang is a consequence of the AIO not completing. Can you trace
calls to aio_complete() in addition to free_ioctx_reqs() to see if a
completion is happening in the failed case? If aio_complete() is only
getting called 3 times, the problem is not in the aio layer.
-ben
> [...]
--
"Thought is the essence of where you are now."
* Re: hanging aio process
2014-05-20 13:16 ` Sebastian Ott
2014-05-20 14:26 ` Benjamin LaHaise
@ 2014-05-20 16:12 ` Anatol Pomozov
2014-05-20 18:09 ` Sebastian Ott
1 sibling, 1 reply; 23+ messages in thread
From: Anatol Pomozov @ 2014-05-20 16:12 UTC (permalink / raw)
To: Sebastian Ott; +Cc: Benjamin LaHaise, linux-aio, LKML
Hi
On Tue, May 20, 2014 at 6:16 AM, Sebastian Ott
<sebott@linux.vnet.ibm.com> wrote:
> On Tue, 20 May 2014, Sebastian Ott wrote:
>> On Mon, 19 May 2014, Benjamin LaHaise wrote:
>> > It is entirely possible the bug isn't
>> > caused by the referenced commit, as the commit you're pointing to merely
>> > makes io_destroy() syscall wait for all aio outstanding to complete
>> > before returning.
>>
>> I cannot reproduce this when I revert said commit (on top of 14186fe). If
>> that matters - the arch is s390.
>
> Hm, ok - maybe that commit is really just highlighting a refcounting bug.
> I just compared traces for a good and a few bad cases. The good case:
> [...]
>
> (4 fio workers, free_ioctx_reqs is called 4 times)
>
> One of the bad cases:
> # tracer: function
> #
> # entries-in-buffer/entries-written: 14/14 #P:4
> #
> # _-----=> irqs-off
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| / delay
> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> # | | | |||| | |
> fio-834 [000] .... 51.127359: kill_ioctx <-SyS_io_destroy
> <idle>-0 [000] ..s. 51.170237: free_ioctx_users <-percpu_ref_kill_rcu
> fio-828 [001] .... 51.189717: kill_ioctx <-SyS_io_destroy
> fio-833 [001] ..s. 51.220178: free_ioctx_users <-percpu_ref_kill_rcu
> <idle>-0 [000] .ns. 51.220230: free_ioctx_reqs <-percpu_ref_kill_rcu
> kworker/0:3-661 [000] .... 51.220238: free_ioctx <-process_one_work
> <idle>-0 [001] .ns. 51.260188: free_ioctx_reqs <-percpu_ref_kill_rcu
> kworker/1:2-103 [001] .... 51.260198: free_ioctx <-process_one_work
> fio-833 [002] .... 51.287602: kill_ioctx <-SyS_io_destroy
> udevd-868 [002] ..s1 51.332519: free_ioctx_users <-percpu_ref_kill_rcu
> <idle>-0 [002] .ns. 51.450180: free_ioctx_reqs <-percpu_ref_kill_rcu
> kworker/2:2-191 [002] .... 51.450191: free_ioctx <-process_one_work
> fio-835 [003] .... 51.907530: kill_ioctx <-SyS_io_destroy
> ksoftirqd/3-19 [003] ..s. 52.000232: free_ioctx_users <-percpu_ref_kill_rcu
>
> (1 fio worker in D state, free_ioctx_reqs is called 3 times)
Looking at the second trace: the first 3 io_destroy() calls lead to
free_ioctx_reqs(), but the last one does not. Do you have more logs
after the last line? If there is no further free_ioctx_reqs(), then
something is still holding the ctx->reqs refcounter. I suggest adding
some logging to the kernel to figure out what the refcount value is at
this moment.
But *if* there was one more free_ioctx_reqs() that did not
complete(ctx->requests_done), it would mean the thread does not see the
value of the requests_done field. That would make me believe there is a
problem with memory reordering. I do not know whether s390 reorders
stores, but could you please check this patch anyway?
anatol:linux $ git diff
diff --git a/fs/aio.c b/fs/aio.c
index a0ed6c7..6e3c0bb 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -514,6 +514,7 @@ static void free_ioctx_reqs(struct percpu_ref *ref)
struct kioctx *ctx = container_of(ref, struct kioctx, reqs);
/* At this point we know that there are no any in-flight requests */
+ rmb();
if (ctx->requests_done)
complete(ctx->requests_done);
@@ -758,6 +759,7 @@ static void kill_ioctx(struct mm_struct *mm,
struct kioctx *ctx,
vm_munmap(ctx->mmap_base, ctx->mmap_size);
ctx->requests_done = requests_done;
+ wmb();
percpu_ref_kill(&ctx->users);
} else {
if (requests_done)
* Re: hanging aio process
2014-05-20 16:12 ` Anatol Pomozov
@ 2014-05-20 18:09 ` Sebastian Ott
2014-05-20 18:39 ` Anatol Pomozov
0 siblings, 1 reply; 23+ messages in thread
From: Sebastian Ott @ 2014-05-20 18:09 UTC (permalink / raw)
To: Anatol Pomozov; +Cc: Benjamin LaHaise, linux-aio, LKML
Hi,
On Tue, 20 May 2014, Anatol Pomozov wrote:
> On Tue, May 20, 2014 at 6:16 AM, Sebastian Ott
> <sebott@linux.vnet.ibm.com> wrote:
> > [...]
>
>
> Looking at the second trace: the first 3 io_destroy() calls cause
> free_ioctx_reqs(), but the last one does not call free_ioctx_reqs().
> Do you have more logs after the last line?
Nope that was all.
> If there is no more
> free_ioctx_reqs() then it means something keeps ctx->reqs refcounter.
> I suggest to add some logging to kernel to figure out what is the
> refcount value at this moment.
Yep, I did that this morning (via atomic_read(&ctx->reqs.count) in
free_ioctx_users, before percpu_ref_kill(&ctx->reqs) is called), and
the value was always the same:
1 + (1UL << 31)
even for the free_ioctx_users invocations that were not followed by
free_ioctx_reqs.
>
> But *if* there was one more free_ioctx_reqs() that does not
> complete(ctx->requests_done) then it means the thread does not see
> value of requests_done field. It would make me believe there is a
> problem with memory reordering. I do not know whether s390 reorders
> stores, but anyway, could you please check this patch?
Nope, s390 doesn't reorder stores - I tried the patch anyway, but it
didn't help.
Regards,
Sebastian
> [...]
* Re: hanging aio process
2014-05-20 18:09 ` Sebastian Ott
@ 2014-05-20 18:39 ` Anatol Pomozov
2014-05-21 8:07 ` Sebastian Ott
0 siblings, 1 reply; 23+ messages in thread
From: Anatol Pomozov @ 2014-05-20 18:39 UTC (permalink / raw)
To: Sebastian Ott; +Cc: Benjamin LaHaise, linux-aio, LKML
Hi
On Tue, May 20, 2014 at 11:09 AM, Sebastian Ott
<sebott@linux.vnet.ibm.com> wrote:
> Hi,
>
> On Tue, 20 May 2014, Anatol Pomozov wrote:
>> [...]
>> Looking at the second trace: the first 3 io_destroy() calls cause
>> free_ioctx_reqs(), but the last one does not call free_ioctx_reqs().
>> Do you have more logs after the last line?
>
> Nope that was all.
>
>> If there is no more
>> free_ioctx_reqs() then it means something keeps ctx->reqs refcounter.
>> I suggest to add some logging to kernel to figure out what is the
>> refcount value at this moment.
>
> Jep, I did that this morning (via atomic_read(&ctx->reqs.count) in
> free_ioctx_users before percpu_ref_kill(&ctx->reqs); is called) and
> the value was always the same
> 1 + (1UL << 31)
> even for the free_ioctx_users invocations that were not followed by
> free_ioctx_reqs.
Could you add atomic_read(&ctx->reqs.count) *after* the
percpu_ref_kill(&ctx->reqs)?
Before percpu_ref_kill() the refcounter is a per-CPU one, and to
correctly calculate the real value you need to take &ctx->reqs.pcpu_count
into account. But after percpu_ref_kill(), reqs.count becomes a normal
shared refcounter.
>
>>
>> But *if* there was one more free_ioctx_reqs() that does not
>> complete(ctx->requests_done) then it means the thread does not see
>> value of requests_done field. It would make me believe there is a
>> problem with memory reordering. I do not know whether s390 reorders
>> stores, but anyway, could you please check this patch?
>
> Nope s390 doesn't reorder stores - I tried the patch anyway, but it didn't
> help.
>
> Regards,
> Sebastian
>>
>> anatol:linux $ git diff
>> diff --git a/fs/aio.c b/fs/aio.c
>> index a0ed6c7..6e3c0bb 100644
>> --- a/fs/aio.c
>> +++ b/fs/aio.c
>> @@ -514,6 +514,7 @@ static void free_ioctx_reqs(struct percpu_ref *ref)
>> struct kioctx *ctx = container_of(ref, struct kioctx, reqs);
>>
>> /* At this point we know that there are no any in-flight requests */
>> + rmb();
>> if (ctx->requests_done)
>> complete(ctx->requests_done);
>>
>> @@ -758,6 +759,7 @@ static void kill_ioctx(struct mm_struct *mm,
>> struct kioctx *ctx,
>> vm_munmap(ctx->mmap_base, ctx->mmap_size);
>>
>> ctx->requests_done = requests_done;
>> + wmb();
>> percpu_ref_kill(&ctx->users);
>> } else {
>> if (requests_done)
>>
>>
>
* Re: hanging aio process
2014-05-20 18:39 ` Anatol Pomozov
@ 2014-05-21 8:07 ` Sebastian Ott
2014-05-21 8:48 ` Sebastian Ott
0 siblings, 1 reply; 23+ messages in thread
From: Sebastian Ott @ 2014-05-21 8:07 UTC (permalink / raw)
To: Anatol Pomozov; +Cc: Benjamin LaHaise, linux-aio, LKML
On Tue, 20 May 2014, Anatol Pomozov wrote:
> On Tue, May 20, 2014 at 11:09 AM, Sebastian Ott
> <sebott@linux.vnet.ibm.com> wrote:
> > On Tue, 20 May 2014, Anatol Pomozov wrote:
> >> On Tue, May 20, 2014 at 6:16 AM, Sebastian Ott
> >> <sebott@linux.vnet.ibm.com> wrote:
> >> > On Tue, 20 May 2014, Sebastian Ott wrote:
> >> >> On Mon, 19 May 2014, Benjamin LaHaise wrote:
> >> >> > It is entirely possible the bug isn't
> >> >> > caused by the referenced commit, as the commit you're pointing to merely
> >> >> > makes io_destroy() syscall wait for all aio outstanding to complete
> >> >> > before returning.
> >> >>
> >> >> I cannot reproduce this when I revert said commit (on top of 14186fe). If
> >> >> that matters - the arch is s390.
> >> >
> >> > Hm, ok - maybe that commit is really just highlighting a refcounting bug.
> >> > I just compared traces for a good and a few bad cases. The good case:
> >> > # tracer: function
> >> > #
> >> > # entries-in-buffer/entries-written: 16/16 #P:4
> >> > #
> >> > # _-----=> irqs-off
> >> > # / _----=> need-resched
> >> > # | / _---=> hardirq/softirq
> >> > # || / _--=> preempt-depth
> >> > # ||| / delay
> >> > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> >> > # | | | |||| | |
> >> > fio-732 [003] .... 17.989315: kill_ioctx <-SyS_io_destroy
> >> > fio-739 [003] .... 18.000563: kill_ioctx <-SyS_io_destroy
> >> > ksoftirqd/3-19 [003] ..s. 18.031673: free_ioctx_users <-percpu_ref_kill_rcu
> >> > ksoftirqd/3-19 [003] ..s. 18.031679: free_ioctx_users <-percpu_ref_kill_rcu
> >> > fio-737 [003] .... 18.038765: kill_ioctx <-SyS_io_destroy
> >> > ksoftirqd/3-19 [003] ..s. 18.062488: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > ksoftirqd/3-19 [003] ..s. 18.062494: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > kworker/3:1-57 [003] .... 18.062499: free_ioctx <-process_one_work
> >> > kworker/3:1-57 [003] .... 18.062506: free_ioctx <-process_one_work
> >> > ksoftirqd/3-19 [003] ..s. 18.072275: free_ioctx_users <-percpu_ref_kill_rcu
> >> > fio-738 [003] .... 18.102419: kill_ioctx <-SyS_io_destroy
> >> > <idle>-0 [003] .ns. 18.111668: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > kworker/3:1-57 [003] .... 18.111675: free_ioctx <-process_one_work
> >> > ksoftirqd/3-19 [003] ..s. 18.138035: free_ioctx_users <-percpu_ref_kill_rcu
> >> > <idle>-0 [003] .ns. 18.191665: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > kworker/3:1-57 [003] .... 18.191671: free_ioctx <-process_one_work
> >> >
> >> > (4 fio workers, free_ioctx_reqs is called 4 times)
> >> >
> >> > One of the bad cases:
> >> > # tracer: function
> >> > #
> >> > # entries-in-buffer/entries-written: 14/14 #P:4
> >> > #
> >> > # _-----=> irqs-off
> >> > # / _----=> need-resched
> >> > # | / _---=> hardirq/softirq
> >> > # || / _--=> preempt-depth
> >> > # ||| / delay
> >> > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> >> > # | | | |||| | |
> >> > fio-834 [000] .... 51.127359: kill_ioctx <-SyS_io_destroy
> >> > <idle>-0 [000] ..s. 51.170237: free_ioctx_users <-percpu_ref_kill_rcu
> >> > fio-828 [001] .... 51.189717: kill_ioctx <-SyS_io_destroy
> >> > fio-833 [001] ..s. 51.220178: free_ioctx_users <-percpu_ref_kill_rcu
> >> > <idle>-0 [000] .ns. 51.220230: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > kworker/0:3-661 [000] .... 51.220238: free_ioctx <-process_one_work
> >> > <idle>-0 [001] .ns. 51.260188: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > kworker/1:2-103 [001] .... 51.260198: free_ioctx <-process_one_work
> >> > fio-833 [002] .... 51.287602: kill_ioctx <-SyS_io_destroy
> >> > udevd-868 [002] ..s1 51.332519: free_ioctx_users <-percpu_ref_kill_rcu
> >> > <idle>-0 [002] .ns. 51.450180: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > kworker/2:2-191 [002] .... 51.450191: free_ioctx <-process_one_work
> >> > fio-835 [003] .... 51.907530: kill_ioctx <-SyS_io_destroy
> >> > ksoftirqd/3-19 [003] ..s. 52.000232: free_ioctx_users <-percpu_ref_kill_rcu
> >> >
> >> > (1 fio worker in D state, free_ioctx_reqs is called 3 times)
> >>
> >>
> >> Looking at the second trace: the first 3 io_destroy() calls cause
> >> free_ioctx_reqs(), but the last one does not call free_ioctx_reqs().
> >> Do you have more logs after the last line?
> >
> > Nope that was all.
> >
> >> If there is no more
> >> free_ioctx_reqs() then it means something keeps ctx->reqs refcounter.
> >> I suggest to add some logging to kernel to figure out what is the
> >> refcount value at this moment.
> >
> > Jep, I did that this morning (via atomic_read(&ctx->reqs.count) in
> > free_ioctx_users before percpu_ref_kill(&ctx->reqs); is called) and
> > the value was always the same
> > 1 + (1UL << 31)
> > even for the free_ioctx_users invocations that were not followed by
> > free_ioctx_reqs.
>
> Could you add atomic_read(&ctx->reqs.count) *after* the
> percpu_ref_kill(&ctx->reqs)?
I already did that and it didn't change, always 1 + (1UL << 31) in all
cases, before and after percpu_ref_kill(&ctx->reqs). I'm not really
familiar with this percpu_ref stuff but it looks like the initial
reference is dropped asynchronously.
Regards,
Sebastian
>
> Before percpu_ref_kill() the refcounter is a per-CPU one, and to
> correctly calculate the real value you need to take &ctx->reqs.pcpu_count
> into account. But after percpu_ref_kill(), reqs.count becomes a normal
> shared refcounter.
>
> >
> >>
> >> But *if* there was one more free_ioctx_reqs() that does not
> >> complete(ctx->requests_done) then it means the thread does not see
> >> value of requests_done field. It would make me believe there is a
> >> problem with memory reordering. I do not know whether s390 reorders
> >> stores, but anyway, could you please check this patch?
> >
> > Nope s390 doesn't reorder stores - I tried the patch anyway, but it didn't
> > help.
> >
> > Regards,
> > Sebastian
> >>
> >> anatol:linux $ git diff
> >> diff --git a/fs/aio.c b/fs/aio.c
> >> index a0ed6c7..6e3c0bb 100644
> >> --- a/fs/aio.c
> >> +++ b/fs/aio.c
> >> @@ -514,6 +514,7 @@ static void free_ioctx_reqs(struct percpu_ref *ref)
> >> struct kioctx *ctx = container_of(ref, struct kioctx, reqs);
> >>
> >> /* At this point we know that there are no any in-flight requests */
> >> + rmb();
> >> if (ctx->requests_done)
> >> complete(ctx->requests_done);
> >>
> >> @@ -758,6 +759,7 @@ static void kill_ioctx(struct mm_struct *mm,
> >> struct kioctx *ctx,
> >> vm_munmap(ctx->mmap_base, ctx->mmap_size);
> >>
> >> ctx->requests_done = requests_done;
> >> + wmb();
> >> percpu_ref_kill(&ctx->users);
> >> } else {
> >> if (requests_done)
> >>
> >>
> >
>
>
* Re: hanging aio process
2014-05-21 8:07 ` Sebastian Ott
@ 2014-05-21 8:48 ` Sebastian Ott
2014-05-21 13:20 ` Benjamin LaHaise
2014-05-21 14:18 ` Anatol Pomozov
0 siblings, 2 replies; 23+ messages in thread
From: Sebastian Ott @ 2014-05-21 8:48 UTC (permalink / raw)
To: Anatol Pomozov; +Cc: Benjamin LaHaise, linux-aio, LKML
On Wed, 21 May 2014, Sebastian Ott wrote:
> On Tue, 20 May 2014, Anatol Pomozov wrote:
> > On Tue, May 20, 2014 at 11:09 AM, Sebastian Ott
> > <sebott@linux.vnet.ibm.com> wrote:
> > > On Tue, 20 May 2014, Anatol Pomozov wrote:
> > >> On Tue, May 20, 2014 at 6:16 AM, Sebastian Ott
> > >> <sebott@linux.vnet.ibm.com> wrote:
> > >> > On Tue, 20 May 2014, Sebastian Ott wrote:
> > >> >> On Mon, 19 May 2014, Benjamin LaHaise wrote:
> > >> >> > It is entirely possible the bug isn't
> > >> >> > caused by the referenced commit, as the commit you're pointing to merely
> > >> >> > makes io_destroy() syscall wait for all aio outstanding to complete
> > >> >> > before returning.
> > >> >>
> > >> >> I cannot reproduce this when I revert said commit (on top of 14186fe). If
> > >> >> that matters - the arch is s390.
> > >> >
> > >> > Hm, ok - maybe that commit is really just highlighting a refcounting bug.
> > >> > I just compared traces for a good and a few bad cases. The good case:
> > >> > # tracer: function
> > >> > #
> > >> > # entries-in-buffer/entries-written: 16/16 #P:4
> > >> > #
> > >> > # _-----=> irqs-off
> > >> > # / _----=> need-resched
> > >> > # | / _---=> hardirq/softirq
> > >> > # || / _--=> preempt-depth
> > >> > # ||| / delay
> > >> > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> > >> > # | | | |||| | |
> > >> > fio-732 [003] .... 17.989315: kill_ioctx <-SyS_io_destroy
> > >> > fio-739 [003] .... 18.000563: kill_ioctx <-SyS_io_destroy
> > >> > ksoftirqd/3-19 [003] ..s. 18.031673: free_ioctx_users <-percpu_ref_kill_rcu
> > >> > ksoftirqd/3-19 [003] ..s. 18.031679: free_ioctx_users <-percpu_ref_kill_rcu
> > >> > fio-737 [003] .... 18.038765: kill_ioctx <-SyS_io_destroy
> > >> > ksoftirqd/3-19 [003] ..s. 18.062488: free_ioctx_reqs <-percpu_ref_kill_rcu
> > >> > ksoftirqd/3-19 [003] ..s. 18.062494: free_ioctx_reqs <-percpu_ref_kill_rcu
> > >> > kworker/3:1-57 [003] .... 18.062499: free_ioctx <-process_one_work
> > >> > kworker/3:1-57 [003] .... 18.062506: free_ioctx <-process_one_work
> > >> > ksoftirqd/3-19 [003] ..s. 18.072275: free_ioctx_users <-percpu_ref_kill_rcu
> > >> > fio-738 [003] .... 18.102419: kill_ioctx <-SyS_io_destroy
> > >> > <idle>-0 [003] .ns. 18.111668: free_ioctx_reqs <-percpu_ref_kill_rcu
> > >> > kworker/3:1-57 [003] .... 18.111675: free_ioctx <-process_one_work
> > >> > ksoftirqd/3-19 [003] ..s. 18.138035: free_ioctx_users <-percpu_ref_kill_rcu
> > >> > <idle>-0 [003] .ns. 18.191665: free_ioctx_reqs <-percpu_ref_kill_rcu
> > >> > kworker/3:1-57 [003] .... 18.191671: free_ioctx <-process_one_work
> > >> >
> > >> > (4 fio workers, free_ioctx_reqs is called 4 times)
> > >> >
> > >> > One of the bad cases:
> > >> > # tracer: function
> > >> > #
> > >> > # entries-in-buffer/entries-written: 14/14 #P:4
> > >> > #
> > >> > # _-----=> irqs-off
> > >> > # / _----=> need-resched
> > >> > # | / _---=> hardirq/softirq
> > >> > # || / _--=> preempt-depth
> > >> > # ||| / delay
> > >> > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> > >> > # | | | |||| | |
> > >> > fio-834 [000] .... 51.127359: kill_ioctx <-SyS_io_destroy
> > >> > <idle>-0 [000] ..s. 51.170237: free_ioctx_users <-percpu_ref_kill_rcu
> > >> > fio-828 [001] .... 51.189717: kill_ioctx <-SyS_io_destroy
> > >> > fio-833 [001] ..s. 51.220178: free_ioctx_users <-percpu_ref_kill_rcu
> > >> > <idle>-0 [000] .ns. 51.220230: free_ioctx_reqs <-percpu_ref_kill_rcu
> > >> > kworker/0:3-661 [000] .... 51.220238: free_ioctx <-process_one_work
> > >> > <idle>-0 [001] .ns. 51.260188: free_ioctx_reqs <-percpu_ref_kill_rcu
> > >> > kworker/1:2-103 [001] .... 51.260198: free_ioctx <-process_one_work
> > >> > fio-833 [002] .... 51.287602: kill_ioctx <-SyS_io_destroy
> > >> > udevd-868 [002] ..s1 51.332519: free_ioctx_users <-percpu_ref_kill_rcu
> > >> > <idle>-0 [002] .ns. 51.450180: free_ioctx_reqs <-percpu_ref_kill_rcu
> > >> > kworker/2:2-191 [002] .... 51.450191: free_ioctx <-process_one_work
> > >> > fio-835 [003] .... 51.907530: kill_ioctx <-SyS_io_destroy
> > >> > ksoftirqd/3-19 [003] ..s. 52.000232: free_ioctx_users <-percpu_ref_kill_rcu
> > >> >
> > >> > (1 fio worker in D state, free_ioctx_reqs is called 3 times)
> > >>
> > >>
> > >> Looking at the second trace: the first 3 io_destroy() calls cause
> > >> free_ioctx_reqs(), but the last one does not call free_ioctx_reqs().
> > >> Do you have more logs after the last line?
> > >
> > > Nope that was all.
> > >
> > >> If there is no more
> > >> free_ioctx_reqs() then it means something keeps ctx->reqs refcounter.
> > >> I suggest to add some logging to kernel to figure out what is the
> > >> refcount value at this moment.
> > >
> > > Jep, I did that this morning (via atomic_read(&ctx->reqs.count) in
> > > free_ioctx_users before percpu_ref_kill(&ctx->reqs); is called) and
> > > the value was always the same
> > > 1 + (1UL << 31)
> > > even for the free_ioctx_users invocations that were not followed by
> > > free_ioctx_reqs.
> >
> > Could you add atomic_read(&ctx->reqs.count) *after* the
> > percpu_ref_kill(&ctx->reqs)?
>
> I already did that and it didn't change, always 1 + (1UL << 31) in all
> cases, before and after percpu_ref_kill(&ctx->reqs). I'm not really
> familiar with this percpu_ref stuff but it looks like the initial
> reference is dropped asynchronously.
cat /sys/kernel/debug/tracing/trace
# tracer: function
#
# entries-in-buffer/entries-written: 25/25 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
fio-856 [001] .... 160.876428: SyS_io_destroy: 0000000074a18000
fio-856 [001] .... 160.876430: kill_ioctx <-SyS_io_destroy
fio-855 [000] .... 160.887737: SyS_io_destroy: 0000000074f40600
fio-855 [000] .... 160.887738: kill_ioctx <-SyS_io_destroy
fio-849 [001] ..s. 160.911948: free_ioctx_users <-percpu_ref_kill_rcu
ksoftirqd/0-3 [000] ..s. 160.932488: free_ioctx_users <-percpu_ref_kill_rcu
fio-854 [001] .... 160.938881: SyS_io_destroy: 0000000074ac0600
fio-854 [001] .... 160.938881: kill_ioctx <-SyS_io_destroy
ksoftirqd/1-11 [001] ..s. 160.942016: aio_confirm_reqs: 0000000074a18000 reqs=1
ksoftirqd/1-11 [001] ..s. 160.942016: free_ioctx_reqs <-percpu_ref_kill_rcu
kworker/1:2-465 [001] .... 160.942021: free_ioctx <-process_one_work
fio-856 [002] .... 160.942033: SyS_io_destroy: 0000000074a18000 done
fio-849 [002] .... 160.942641: SyS_io_destroy: 0000000074f28600
fio-849 [002] .... 160.942641: kill_ioctx <-SyS_io_destroy
ksoftirqd/1-11 [001] ..s. 160.961981: free_ioctx_users <-percpu_ref_kill_rcu
<idle>-0 [000] .ns. 160.962010: aio_confirm_reqs: 0000000074f40600 reqs=1
<idle>-0 [000] .ns. 160.962011: free_ioctx_reqs <-percpu_ref_kill_rcu
kworker/0:0-4 [000] .... 160.962016: free_ioctx <-process_one_work
fio-855 [001] .... 160.962017: SyS_io_destroy: 0000000074f40600 done
ksoftirqd/2-15 [002] ..s. 160.971998: free_ioctx_users <-percpu_ref_kill_rcu
ksoftirqd/1-11 [001] ..s. 160.994552: aio_confirm_reqs: 0000000074ac0600 reqs=2
<idle>-0 [002] .ns. 161.061976: aio_confirm_reqs: 0000000074f28600 reqs=1
<idle>-0 [002] .ns. 161.061976: free_ioctx_reqs <-percpu_ref_kill_rcu
kworker/2:2-181 [002] .... 161.061980: free_ioctx <-process_one_work
fio-849 [003] .... 161.061983: SyS_io_destroy: 0000000074f28600 done
diff --git a/fs/aio.c b/fs/aio.c
index a0ed6c7..3b8e989 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -521,6 +521,13 @@ static void free_ioctx_reqs(struct percpu_ref *ref)
schedule_work(&ctx->free_work);
}
+void aio_confirm_reqs(struct percpu_ref *ref)
+{
+ struct kioctx *ctx = container_of(ref, struct kioctx, reqs);
+
+ trace_printk("%p reqs=%d\n", ctx, atomic_read(&ref->count));
+}
+
/*
* When this function runs, the kioctx has been removed from the "hash table"
* and ctx->users has dropped to 0, so we know no more kiocbs can be submitted -
@@ -543,7 +550,7 @@ static void free_ioctx_users(struct percpu_ref *ref)
spin_unlock_irq(&ctx->ctx_lock);
- percpu_ref_kill(&ctx->reqs);
+ percpu_ref_kill_and_confirm(&ctx->reqs, aio_confirm_reqs);
percpu_ref_put(&ctx->reqs);
}
@@ -1220,6 +1227,8 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
struct completion requests_done =
COMPLETION_INITIALIZER_ONSTACK(requests_done);
+ trace_printk("%p\n", ioctx);
+
/* Pass requests_done to kill_ioctx() where it can be set
* in a thread-safe way. If we try to set it here then we have
* a race condition if two io_destroy() called simultaneously.
@@ -1232,6 +1241,7 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
* is destroyed.
*/
wait_for_completion(&requests_done);
+ trace_printk("%p done\n", ioctx);
return 0;
}
* Re: hanging aio process
2014-05-21 8:48 ` Sebastian Ott
@ 2014-05-21 13:20 ` Benjamin LaHaise
2014-05-21 14:12 ` Sebastian Ott
2014-05-21 14:18 ` Anatol Pomozov
1 sibling, 1 reply; 23+ messages in thread
From: Benjamin LaHaise @ 2014-05-21 13:20 UTC (permalink / raw)
To: Sebastian Ott; +Cc: Anatol Pomozov, linux-aio, LKML
On Wed, May 21, 2014 at 10:48:55AM +0200, Sebastian Ott wrote:
> >
> > I already did that and it didn't change, always 1 + (1UL << 31) in all
> > cases, before and after percpu_ref_kill(&ctx->reqs). I'm not really
> > familiar with this percpu_ref stuff but it looks like the initial
> > reference is dropped asynchronously.
>
>
> cat /sys/kernel/debug/tracing/trace
Your trace still isn't monitoring aio_complete(). You need to check if
aio_complete() is getting called in order to determine if the bug is in
the core aio code or not.
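[Editor's note: extending the function tracer to cover aio_complete() is a
one-line change to the ftrace filter. This is a sketch, not taken from the
thread: it assumes tracefs is mounted at /sys/kernel/debug/tracing, a kernel
with CONFIG_FUNCTION_TRACER, and root privileges.]

```shell
# Trace the aio teardown path plus every completion.
# Function names must appear in available_filter_functions.
cd /sys/kernel/debug/tracing
echo 'kill_ioctx free_ioctx* aio_complete' > set_ftrace_filter
echo function > current_tracer
echo 1 > tracing_on
# ... run the fio job until a worker hangs ...
cat trace
```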
-ben
--
"Thought is the essence of where you are now."
* Re: hanging aio process
2014-05-21 13:20 ` Benjamin LaHaise
@ 2014-05-21 14:12 ` Sebastian Ott
0 siblings, 0 replies; 23+ messages in thread
From: Sebastian Ott @ 2014-05-21 14:12 UTC (permalink / raw)
To: Benjamin LaHaise; +Cc: Anatol Pomozov, linux-aio, LKML
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1155 bytes --]
Hi,
On Wed, 21 May 2014, Benjamin LaHaise wrote:
> On Wed, May 21, 2014 at 10:48:55AM +0200, Sebastian Ott wrote:
> > >
> > > I already did that and it didn't change, always 1 + (1UL << 31) in all
> > > cases, before and after percpu_ref_kill(&ctx->reqs). I'm not really
> > > familiar with this percpu_ref stuff but it looks like the initial
> > > reference is dropped asynchronously.
> >
> >
> > cat /sys/kernel/debug/tracing/trace
>
> Your trace still isn't monitoring aio_complete(). You need to check if
> aio_complete() is getting called in order to determine if the bug is in
> the core aio code or not.
Yes, sorry about that; there were just too many of them. But I was able to
reproduce the problem with fio writing a little less data. Sadly it's
still a lot of tracing data - a compressed archive is attached.
The number of aio_complete invocations is the same for the good and the
bad case:
for T in trace.bad trace.good ;do wc $T ;done
49156 294920 3735651 trace.bad
49159 294939 3735901 trace.good
for T in trace.bad trace.good ;do grep aio_complete $T | wc ;done
49120 294720 3733120
49120 294720 3733120
Regards,
Sebastian
[-- Attachment #2: Type: APPLICATION/x-xz, Size: 295128 bytes --]
* Re: hanging aio process
2014-05-21 8:48 ` Sebastian Ott
2014-05-21 13:20 ` Benjamin LaHaise
@ 2014-05-21 14:18 ` Anatol Pomozov
2014-05-21 14:57 ` Sebastian Ott
2014-06-04 13:58 ` [PATCH] percpu-refcount: fix usage of this_cpu_ops (was Re: hanging aio process) Sebastian Ott
1 sibling, 2 replies; 23+ messages in thread
From: Anatol Pomozov @ 2014-05-21 14:18 UTC (permalink / raw)
To: Sebastian Ott; +Cc: Benjamin LaHaise, linux-aio, LKML
Hi
On Wed, May 21, 2014 at 1:48 AM, Sebastian Ott
<sebott@linux.vnet.ibm.com> wrote:
> On Wed, 21 May 2014, Sebastian Ott wrote:
>> On Tue, 20 May 2014, Anatol Pomozov wrote:
>> > On Tue, May 20, 2014 at 11:09 AM, Sebastian Ott
>> > <sebott@linux.vnet.ibm.com> wrote:
>> > > On Tue, 20 May 2014, Anatol Pomozov wrote:
>> > >> On Tue, May 20, 2014 at 6:16 AM, Sebastian Ott
>> > >> <sebott@linux.vnet.ibm.com> wrote:
>> > >> > On Tue, 20 May 2014, Sebastian Ott wrote:
>> > >> >> On Mon, 19 May 2014, Benjamin LaHaise wrote:
>> > >> >> > It is entirely possible the bug isn't
>> > >> >> > caused by the referenced commit, as the commit you're pointing to merely
>> > >> >> > makes io_destroy() syscall wait for all aio outstanding to complete
>> > >> >> > before returning.
>> > >> >>
>> > >> >> I cannot reproduce this when I revert said commit (on top of 14186fe). If
>> > >> >> that matters - the arch is s390.
>> > >> >
>> > >> > Hm, ok - maybe that commit is really just highlighting a refcounting bug.
>> > >> > I just compared traces for a good and a few bad cases. The good case:
>> > >> > # tracer: function
>> > >> > #
>> > >> > # entries-in-buffer/entries-written: 16/16 #P:4
>> > >> > #
>> > >> > # _-----=> irqs-off
>> > >> > # / _----=> need-resched
>> > >> > # | / _---=> hardirq/softirq
>> > >> > # || / _--=> preempt-depth
>> > >> > # ||| / delay
>> > >> > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
>> > >> > # | | | |||| | |
>> > >> > fio-732 [003] .... 17.989315: kill_ioctx <-SyS_io_destroy
>> > >> > fio-739 [003] .... 18.000563: kill_ioctx <-SyS_io_destroy
>> > >> > ksoftirqd/3-19 [003] ..s. 18.031673: free_ioctx_users <-percpu_ref_kill_rcu
>> > >> > ksoftirqd/3-19 [003] ..s. 18.031679: free_ioctx_users <-percpu_ref_kill_rcu
>> > >> > fio-737 [003] .... 18.038765: kill_ioctx <-SyS_io_destroy
>> > >> > ksoftirqd/3-19 [003] ..s. 18.062488: free_ioctx_reqs <-percpu_ref_kill_rcu
>> > >> > ksoftirqd/3-19 [003] ..s. 18.062494: free_ioctx_reqs <-percpu_ref_kill_rcu
>> > >> > kworker/3:1-57 [003] .... 18.062499: free_ioctx <-process_one_work
>> > >> > kworker/3:1-57 [003] .... 18.062506: free_ioctx <-process_one_work
>> > >> > ksoftirqd/3-19 [003] ..s. 18.072275: free_ioctx_users <-percpu_ref_kill_rcu
>> > >> > fio-738 [003] .... 18.102419: kill_ioctx <-SyS_io_destroy
>> > >> > <idle>-0 [003] .ns. 18.111668: free_ioctx_reqs <-percpu_ref_kill_rcu
>> > >> > kworker/3:1-57 [003] .... 18.111675: free_ioctx <-process_one_work
>> > >> > ksoftirqd/3-19 [003] ..s. 18.138035: free_ioctx_users <-percpu_ref_kill_rcu
>> > >> > <idle>-0 [003] .ns. 18.191665: free_ioctx_reqs <-percpu_ref_kill_rcu
>> > >> > kworker/3:1-57 [003] .... 18.191671: free_ioctx <-process_one_work
>> > >> >
>> > >> > (4 fio workers, free_ioctx_reqs is called 4 times)
>> > >> >
>> > >> > One of the bad cases:
>> > >> > # tracer: function
>> > >> > #
>> > >> > # entries-in-buffer/entries-written: 14/14 #P:4
>> > >> > #
>> > >> > # _-----=> irqs-off
>> > >> > # / _----=> need-resched
>> > >> > # | / _---=> hardirq/softirq
>> > >> > # || / _--=> preempt-depth
>> > >> > # ||| / delay
>> > >> > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
>> > >> > # | | | |||| | |
>> > >> > fio-834 [000] .... 51.127359: kill_ioctx <-SyS_io_destroy
>> > >> > <idle>-0 [000] ..s. 51.170237: free_ioctx_users <-percpu_ref_kill_rcu
>> > >> > fio-828 [001] .... 51.189717: kill_ioctx <-SyS_io_destroy
>> > >> > fio-833 [001] ..s. 51.220178: free_ioctx_users <-percpu_ref_kill_rcu
>> > >> > <idle>-0 [000] .ns. 51.220230: free_ioctx_reqs <-percpu_ref_kill_rcu
>> > >> > kworker/0:3-661 [000] .... 51.220238: free_ioctx <-process_one_work
>> > >> > <idle>-0 [001] .ns. 51.260188: free_ioctx_reqs <-percpu_ref_kill_rcu
>> > >> > kworker/1:2-103 [001] .... 51.260198: free_ioctx <-process_one_work
>> > >> > fio-833 [002] .... 51.287602: kill_ioctx <-SyS_io_destroy
>> > >> > udevd-868 [002] ..s1 51.332519: free_ioctx_users <-percpu_ref_kill_rcu
>> > >> > <idle>-0 [002] .ns. 51.450180: free_ioctx_reqs <-percpu_ref_kill_rcu
>> > >> > kworker/2:2-191 [002] .... 51.450191: free_ioctx <-process_one_work
>> > >> > fio-835 [003] .... 51.907530: kill_ioctx <-SyS_io_destroy
>> > >> > ksoftirqd/3-19 [003] ..s. 52.000232: free_ioctx_users <-percpu_ref_kill_rcu
>> > >> >
>> > >> > (1 fio worker in D state, free_ioctx_reqs is called 3 times)
>> > >>
>> > >>
>> > >> Looking at the second trace: the first 3 io_destroy() calls cause
>> > >> free_ioctx_reqs(), but the last one does not call free_ioctx_reqs().
>> > >> Do you have more logs after the last line?
>> > >
>> > > Nope that was all.
>> > >
>> > >> If there is no more
>> > >> free_ioctx_reqs() then it means something keeps ctx->reqs refcounter.
>> > >> I suggest to add some logging to kernel to figure out what is the
>> > >> refcount value at this moment.
>> > >
>> > > Jep, I did that this morning (via atomic_read(&ctx->reqs.count) in
>> > > free_ioctx_users before percpu_ref_kill(&ctx->reqs); is called) and
>> > > the value was always the same
>> > > 1 + (1UL << 31)
>> > > even for the free_ioctx_users invocations that were not followed by
>> > > free_ioctx_reqs.
>> >
>> > Could you add atomic_read(&ctx->reqs.count) *after* the
>> > percpu_ref_kill(&ctx->reqs)?
>>
>> I already did that and it didn't change, always 1 + (1UL << 31) in all
>> cases, before and after percpu_ref_kill(&ctx->reqs). I'm not really
>> familiar with this percpu_ref stuff but it looks like the initial
>> reference is dropped asynchronously.
>
>
> cat /sys/kernel/debug/tracing/trace
> # tracer: function
> #
> # entries-in-buffer/entries-written: 25/25 #P:4
> #
> # _-----=> irqs-off
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| / delay
> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> # | | | |||| | |
> fio-856 [001] .... 160.876428: SyS_io_destroy: 0000000074a18000
> fio-856 [001] .... 160.876430: kill_ioctx <-SyS_io_destroy
> fio-855 [000] .... 160.887737: SyS_io_destroy: 0000000074f40600
> fio-855 [000] .... 160.887738: kill_ioctx <-SyS_io_destroy
> fio-849 [001] ..s. 160.911948: free_ioctx_users <-percpu_ref_kill_rcu
> ksoftirqd/0-3 [000] ..s. 160.932488: free_ioctx_users <-percpu_ref_kill_rcu
> fio-854 [001] .... 160.938881: SyS_io_destroy: 0000000074ac0600
> fio-854 [001] .... 160.938881: kill_ioctx <-SyS_io_destroy
> ksoftirqd/1-11 [001] ..s. 160.942016: aio_confirm_reqs: 0000000074a18000 reqs=1
> ksoftirqd/1-11 [001] ..s. 160.942016: free_ioctx_reqs <-percpu_ref_kill_rcu
> kworker/1:2-465 [001] .... 160.942021: free_ioctx <-process_one_work
> fio-856 [002] .... 160.942033: SyS_io_destroy: 0000000074a18000 done
> fio-849 [002] .... 160.942641: SyS_io_destroy: 0000000074f28600
> fio-849 [002] .... 160.942641: kill_ioctx <-SyS_io_destroy
> ksoftirqd/1-11 [001] ..s. 160.961981: free_ioctx_users <-percpu_ref_kill_rcu
> <idle>-0 [000] .ns. 160.962010: aio_confirm_reqs: 0000000074f40600 reqs=1
> <idle>-0 [000] .ns. 160.962011: free_ioctx_reqs <-percpu_ref_kill_rcu
> kworker/0:0-4 [000] .... 160.962016: free_ioctx <-process_one_work
> fio-855 [001] .... 160.962017: SyS_io_destroy: 0000000074f40600 done
> ksoftirqd/2-15 [002] ..s. 160.971998: free_ioctx_users <-percpu_ref_kill_rcu
> ksoftirqd/1-11 [001] ..s. 160.994552: aio_confirm_reqs: 0000000074ac0600 reqs=2
Here it is: aio context 0000000074ac0600 has refcount == 2 (one for the
initial reference and one grabbed by someone). You need to find out who
grabbed that extra reference and figure out why it is never dropped.
> <idle>-0 [002] .ns. 161.061976: aio_confirm_reqs: 0000000074f28600 reqs=1
> <idle>-0 [002] .ns. 161.061976: free_ioctx_reqs <-percpu_ref_kill_rcu
> kworker/2:2-181 [002] .... 161.061980: free_ioctx <-process_one_work
> fio-849 [003] .... 161.061983: SyS_io_destroy: 0000000074f28600 done
>
>
> diff --git a/fs/aio.c b/fs/aio.c
> index a0ed6c7..3b8e989 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -521,6 +521,13 @@ static void free_ioctx_reqs(struct percpu_ref *ref)
> schedule_work(&ctx->free_work);
> }
>
> +void aio_confirm_reqs(struct percpu_ref *ref)
> +{
> + struct kioctx *ctx = container_of(ref, struct kioctx, reqs);
> +
> + trace_printk("%p reqs=%d\n", ctx, atomic_read(&ref->count));
> +}
> +
> /*
> * When this function runs, the kioctx has been removed from the "hash table"
> * and ctx->users has dropped to 0, so we know no more kiocbs can be submitted -
> @@ -543,7 +550,7 @@ static void free_ioctx_users(struct percpu_ref *ref)
>
> spin_unlock_irq(&ctx->ctx_lock);
>
> - percpu_ref_kill(&ctx->reqs);
> + percpu_ref_kill_and_confirm(&ctx->reqs, aio_confirm_reqs);
> percpu_ref_put(&ctx->reqs);
> }
>
> @@ -1220,6 +1227,8 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
> struct completion requests_done =
> COMPLETION_INITIALIZER_ONSTACK(requests_done);
>
> + trace_printk("%p\n", ioctx);
> +
> /* Pass requests_done to kill_ioctx() where it can be set
> * in a thread-safe way. If we try to set it here then we have
> * a race condition if two io_destroy() called simultaneously.
> @@ -1232,6 +1241,7 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
> * is destroyed.
> */
> wait_for_completion(&requests_done);
> + trace_printk("%p done\n", ioctx);
>
> return 0;
> }
>
* Re: hanging aio process
2014-05-21 14:18 ` Anatol Pomozov
@ 2014-05-21 14:57 ` Sebastian Ott
2014-06-04 5:24 ` Gu Zheng
2014-06-04 13:58 ` [PATCH] percpu-refcount: fix usage of this_cpu_ops (was Re: hanging aio process) Sebastian Ott
1 sibling, 1 reply; 23+ messages in thread
From: Sebastian Ott @ 2014-05-21 14:57 UTC (permalink / raw)
To: Anatol Pomozov; +Cc: Benjamin LaHaise, linux-aio, LKML
On Wed, 21 May 2014, Anatol Pomozov wrote:
> On Wed, May 21, 2014 at 1:48 AM, Sebastian Ott
> <sebott@linux.vnet.ibm.com> wrote:
> > On Wed, 21 May 2014, Sebastian Ott wrote:
> >> On Tue, 20 May 2014, Anatol Pomozov wrote:
> >> > On Tue, May 20, 2014 at 11:09 AM, Sebastian Ott
> >> > <sebott@linux.vnet.ibm.com> wrote:
> >> > > On Tue, 20 May 2014, Anatol Pomozov wrote:
> >> > >> On Tue, May 20, 2014 at 6:16 AM, Sebastian Ott
> >> > >> <sebott@linux.vnet.ibm.com> wrote:
> >> > >> > On Tue, 20 May 2014, Sebastian Ott wrote:
> >> > >> >> On Mon, 19 May 2014, Benjamin LaHaise wrote:
> >> > >> >> > It is entirely possible the bug isn't
> >> > >> >> > caused by the referenced commit, as the commit you're pointing to merely
> >> > >> >> > makes io_destroy() syscall wait for all aio outstanding to complete
> >> > >> >> > before returning.
> >> > >> >>
> >> > >> >> I cannot reproduce this when I revert said commit (on top of 14186fe). If
> >> > >> >> that matters - the arch is s390.
> >> > >> >
> >> > >> > Hm, ok - maybe that commit is really just highlighting a refcounting bug.
> >> > >> > I just compared traces for a good and a few bad cases. The good case:
> >> > >> > # tracer: function
> >> > >> > #
> >> > >> > # entries-in-buffer/entries-written: 16/16 #P:4
> >> > >> > #
> >> > >> > # _-----=> irqs-off
> >> > >> > # / _----=> need-resched
> >> > >> > # | / _---=> hardirq/softirq
> >> > >> > # || / _--=> preempt-depth
> >> > >> > # ||| / delay
> >> > >> > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> >> > >> > # | | | |||| | |
> >> > >> > fio-732 [003] .... 17.989315: kill_ioctx <-SyS_io_destroy
> >> > >> > fio-739 [003] .... 18.000563: kill_ioctx <-SyS_io_destroy
> >> > >> > ksoftirqd/3-19 [003] ..s. 18.031673: free_ioctx_users <-percpu_ref_kill_rcu
> >> > >> > ksoftirqd/3-19 [003] ..s. 18.031679: free_ioctx_users <-percpu_ref_kill_rcu
> >> > >> > fio-737 [003] .... 18.038765: kill_ioctx <-SyS_io_destroy
> >> > >> > ksoftirqd/3-19 [003] ..s. 18.062488: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > >> > ksoftirqd/3-19 [003] ..s. 18.062494: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > >> > kworker/3:1-57 [003] .... 18.062499: free_ioctx <-process_one_work
> >> > >> > kworker/3:1-57 [003] .... 18.062506: free_ioctx <-process_one_work
> >> > >> > ksoftirqd/3-19 [003] ..s. 18.072275: free_ioctx_users <-percpu_ref_kill_rcu
> >> > >> > fio-738 [003] .... 18.102419: kill_ioctx <-SyS_io_destroy
> >> > >> > <idle>-0 [003] .ns. 18.111668: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > >> > kworker/3:1-57 [003] .... 18.111675: free_ioctx <-process_one_work
> >> > >> > ksoftirqd/3-19 [003] ..s. 18.138035: free_ioctx_users <-percpu_ref_kill_rcu
> >> > >> > <idle>-0 [003] .ns. 18.191665: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > >> > kworker/3:1-57 [003] .... 18.191671: free_ioctx <-process_one_work
> >> > >> >
> >> > >> > (4 fio workers, free_ioctx_reqs is called 4 times)
> >> > >> >
> >> > >> > One of the bad cases:
> >> > >> > # tracer: function
> >> > >> > #
> >> > >> > # entries-in-buffer/entries-written: 14/14 #P:4
> >> > >> > #
> >> > >> > # _-----=> irqs-off
> >> > >> > # / _----=> need-resched
> >> > >> > # | / _---=> hardirq/softirq
> >> > >> > # || / _--=> preempt-depth
> >> > >> > # ||| / delay
> >> > >> > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> >> > >> > # | | | |||| | |
> >> > >> > fio-834 [000] .... 51.127359: kill_ioctx <-SyS_io_destroy
> >> > >> > <idle>-0 [000] ..s. 51.170237: free_ioctx_users <-percpu_ref_kill_rcu
> >> > >> > fio-828 [001] .... 51.189717: kill_ioctx <-SyS_io_destroy
> >> > >> > fio-833 [001] ..s. 51.220178: free_ioctx_users <-percpu_ref_kill_rcu
> >> > >> > <idle>-0 [000] .ns. 51.220230: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > >> > kworker/0:3-661 [000] .... 51.220238: free_ioctx <-process_one_work
> >> > >> > <idle>-0 [001] .ns. 51.260188: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > >> > kworker/1:2-103 [001] .... 51.260198: free_ioctx <-process_one_work
> >> > >> > fio-833 [002] .... 51.287602: kill_ioctx <-SyS_io_destroy
> >> > >> > udevd-868 [002] ..s1 51.332519: free_ioctx_users <-percpu_ref_kill_rcu
> >> > >> > <idle>-0 [002] .ns. 51.450180: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > >> > kworker/2:2-191 [002] .... 51.450191: free_ioctx <-process_one_work
> >> > >> > fio-835 [003] .... 51.907530: kill_ioctx <-SyS_io_destroy
> >> > >> > ksoftirqd/3-19 [003] ..s. 52.000232: free_ioctx_users <-percpu_ref_kill_rcu
> >> > >> >
> >> > >> > (1 fio worker in D state, free_ioctx_reqs is called 3 times)
> >> > >>
> >> > >>
> >> > >> Looking at the second trace: the first 3 io_destroy() calls cause
> >> > >> free_ioctx_reqs(), but the last one does not call free_ioctx_reqs().
> >> > >> Do you have more logs after the last line?
> >> > >
> >> > > Nope that was all.
> >> > >
> >> > >> If there is no more
> >> > >> free_ioctx_reqs() then it means something keeps ctx->reqs refcounter.
> >> > >> I suggest to add some logging to kernel to figure out what is the
> >> > >> refcount value at this moment.
> >> > >
> >> > > Jep, I did that this morning (via atomic_read(&ctx->reqs.count) in
> >> > > free_ioctx_users before percpu_ref_kill(&ctx->reqs); is called) and
> >> > > the value was always the same
> >> > > 1 + (1UL << 31)
> >> > > even for the free_ioctx_users invocations that were not followed by
> >> > > free_ioctx_reqs.
> >> >
> >> > Could you add atomic_read(&ctx->reqs.count) *after* the
> >> > percpu_ref_kill(&ctx->reqs)?
> >>
> >> I already did that and it didn't change, always 1 + (1UL << 31) in all
> >> cases, before and after percpu_ref_kill(&ctx->reqs). I'm not really
> >> familiar with this percpu_ref stuff but it looks like the initial
> >> reference is dropped asynchronous.
> >
> >
> > cat /sys/kernel/debug/tracing/trace
> > # tracer: function
> > #
> > # entries-in-buffer/entries-written: 25/25 #P:4
> > #
> > # _-----=> irqs-off
> > # / _----=> need-resched
> > # | / _---=> hardirq/softirq
> > # || / _--=> preempt-depth
> > # ||| / delay
> > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> > # | | | |||| | |
> > fio-856 [001] .... 160.876428: SyS_io_destroy: 0000000074a18000
> > fio-856 [001] .... 160.876430: kill_ioctx <-SyS_io_destroy
> > fio-855 [000] .... 160.887737: SyS_io_destroy: 0000000074f40600
> > fio-855 [000] .... 160.887738: kill_ioctx <-SyS_io_destroy
> > fio-849 [001] ..s. 160.911948: free_ioctx_users <-percpu_ref_kill_rcu
> > ksoftirqd/0-3 [000] ..s. 160.932488: free_ioctx_users <-percpu_ref_kill_rcu
> > fio-854 [001] .... 160.938881: SyS_io_destroy: 0000000074ac0600
> > fio-854 [001] .... 160.938881: kill_ioctx <-SyS_io_destroy
> > ksoftirqd/1-11 [001] ..s. 160.942016: aio_confirm_reqs: 0000000074a18000 reqs=1
> > ksoftirqd/1-11 [001] ..s. 160.942016: free_ioctx_reqs <-percpu_ref_kill_rcu
> > kworker/1:2-465 [001] .... 160.942021: free_ioctx <-process_one_work
> > fio-856 [002] .... 160.942033: SyS_io_destroy: 0000000074a18000 done
> > fio-849 [002] .... 160.942641: SyS_io_destroy: 0000000074f28600
> > fio-849 [002] .... 160.942641: kill_ioctx <-SyS_io_destroy
> > ksoftirqd/1-11 [001] ..s. 160.961981: free_ioctx_users <-percpu_ref_kill_rcu
> > <idle>-0 [000] .ns. 160.962010: aio_confirm_reqs: 0000000074f40600 reqs=1
> > <idle>-0 [000] .ns. 160.962011: free_ioctx_reqs <-percpu_ref_kill_rcu
> > kworker/0:0-4 [000] .... 160.962016: free_ioctx <-process_one_work
> > fio-855 [001] .... 160.962017: SyS_io_destroy: 0000000074f40600 done
> > ksoftirqd/2-15 [002] ..s. 160.971998: free_ioctx_users <-percpu_ref_kill_rcu
> > ksoftirqd/1-11 [001] ..s. 160.994552: aio_confirm_reqs: 0000000074ac0600 reqs=2
>
> Here it is. aio context 0000000074ac0600 has refcount == 2 (one for
> initial refcount and one grabbed by someone). You need to find the one
> who grabbed the refcount and figure out why it does not drop it.
I (hopefully) traced all gets and puts (of ctx->reqs). Hm, gets and puts
are equal in number:
cat /sys/kernel/debug/tracing/trace | grep aio_complete | wc
49120 294720 3487520
cat /sys/kernel/debug/tracing/trace | grep do_io_submit | wc
49120 294720 3487520
cat /sys/kernel/debug/tracing/trace | grep -v aio_complete | grep -v do_io_submit
# tracer: function
#
# entries-in-buffer/entries-written: 98273/98273 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
fio-793 [002] .... 86.697232: SyS_io_setup: get_reqs
fio-794 [000] .... 86.697232: SyS_io_setup: get_reqs
fio-788 [003] .... 86.697235: SyS_io_setup: get_reqs
fio-795 [001] .... 86.697237: SyS_io_setup: get_reqs
fio-793 [000] .... 87.217093: SyS_io_destroy: 00000000796d0000
fio-793 [000] .... 87.217095: kill_ioctx <-SyS_io_destroy
fio-795 [003] .... 87.221408: SyS_io_destroy: 000000007a038000
fio-795 [003] .... 87.221409: kill_ioctx <-SyS_io_destroy
fio-788 [003] .... 87.226725: SyS_io_destroy: 00000000796c8000
fio-788 [003] .... 87.226726: kill_ioctx <-SyS_io_destroy
fio-794 [000] .... 87.227840: SyS_io_destroy: 0000000079420000
fio-794 [000] .... 87.227841: kill_ioctx <-SyS_io_destroy
ksoftirqd/3-19 [003] ..s. 87.244201: free_ioctx_users <-percpu_ref_kill_rcu
ksoftirqd/3-19 [003] ..s. 87.244201: free_ioctx_users: put_reqs
ksoftirqd/3-19 [003] ..s. 87.244205: free_ioctx_users <-percpu_ref_kill_rcu
ksoftirqd/3-19 [003] ..s. 87.244205: free_ioctx_users: put_reqs
ksoftirqd/0-3 [000] ..s. 87.251898: free_ioctx_users <-percpu_ref_kill_rcu
ksoftirqd/0-3 [000] ..s. 87.251898: free_ioctx_users: put_reqs
ksoftirqd/0-3 [000] ..s. 87.251903: free_ioctx_users <-percpu_ref_kill_rcu
ksoftirqd/0-3 [000] ..s. 87.251903: free_ioctx_users: put_reqs
ksoftirqd/0-3 [000] ..s. 87.271868: aio_confirm_reqs: 00000000796d0000 reqs=1
ksoftirqd/0-3 [000] ..s. 87.271868: free_ioctx_reqs <-percpu_ref_kill_rcu
ksoftirqd/0-3 [000] ..s. 87.271874: aio_confirm_reqs: 0000000079420000 reqs=6
kworker/0:1-53 [000] .... 87.271875: free_ioctx <-process_one_work
fio-793 [001] .... 87.271879: SyS_io_destroy: 00000000796d0000 done
ksoftirqd/3-19 [003] ..s. 87.271899: aio_confirm_reqs: 000000007a038000 reqs=1
ksoftirqd/3-19 [003] ..s. 87.271899: free_ioctx_reqs <-percpu_ref_kill_rcu
fio-795 [000] .... 87.271904: SyS_io_destroy: 000000007a038000 done
ksoftirqd/3-19 [003] ..s. 87.271905: aio_confirm_reqs: 00000000796c8000 reqs=1
ksoftirqd/3-19 [003] ..s. 87.271905: free_ioctx_reqs <-percpu_ref_kill_rcu
kworker/3:2-222 [003] .... 87.271909: free_ioctx <-process_one_work
kworker/3:2-222 [003] .... 87.271912: free_ioctx <-process_one_work
fio-788 [003] .... 87.271920: SyS_io_destroy: 00000000796c8000 done
Hm.. this time there are 5 additional refs. Again, I've never worked with these
percpu refs, but would it make sense to look for gets after kills (or is
that valid)?
diff --git a/fs/aio.c b/fs/aio.c
index a0ed6c7..6f83c11 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -521,6 +521,13 @@ static void free_ioctx_reqs(struct percpu_ref *ref)
schedule_work(&ctx->free_work);
}
+void aio_confirm_reqs(struct percpu_ref *ref)
+{
+ struct kioctx *ctx = container_of(ref, struct kioctx, reqs);
+
+ trace_printk("%p reqs=%d\n", ctx, atomic_read(&ref->count));
+}
+
/*
* When this function runs, the kioctx has been removed from the "hash table"
* and ctx->users has dropped to 0, so we know no more kiocbs can be submitted -
@@ -543,7 +550,8 @@ static void free_ioctx_users(struct percpu_ref *ref)
spin_unlock_irq(&ctx->ctx_lock);
- percpu_ref_kill(&ctx->reqs);
+ percpu_ref_kill_and_confirm(&ctx->reqs, aio_confirm_reqs);
+ trace_printk("put_reqs\n");
percpu_ref_put(&ctx->reqs);
}
@@ -695,6 +703,7 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
spin_unlock(&aio_nr_lock);
percpu_ref_get(&ctx->users); /* io_setup() will drop this ref */
+ trace_printk("get_reqs\n");
percpu_ref_get(&ctx->reqs); /* free_ioctx_users() will drop this */
err = ioctx_add_table(ctx, mm);
@@ -888,6 +897,7 @@ static inline struct kiocb *aio_get_req(struct kioctx *ctx)
if (unlikely(!req))
goto out_put;
+ trace_printk("get_reqs\n");
percpu_ref_get(&ctx->reqs);
req->ki_ctx = ctx;
@@ -1033,6 +1043,7 @@ void aio_complete(struct kiocb *iocb, long res, long res2)
if (waitqueue_active(&ctx->wait))
wake_up(&ctx->wait);
+ trace_printk("put_reqs\n");
percpu_ref_put(&ctx->reqs);
}
EXPORT_SYMBOL(aio_complete);
@@ -1220,6 +1231,8 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
struct completion requests_done =
COMPLETION_INITIALIZER_ONSTACK(requests_done);
+ trace_printk("%p\n", ioctx);
+
/* Pass requests_done to kill_ioctx() where it can be set
* in a thread-safe way. If we try to set it here then we have
* a race condition if two io_destroy() called simultaneously.
@@ -1232,6 +1245,7 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
* is destroyed.
*/
wait_for_completion(&requests_done);
+ trace_printk("%p done\n", ioctx);
return 0;
}
@@ -1457,6 +1471,7 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
return 0;
out_put_req:
put_reqs_available(ctx, 1);
+ trace_printk("put_reqs\n");
percpu_ref_put(&ctx->reqs);
kiocb_free(req);
return ret;
@@ -1512,6 +1527,10 @@ long do_io_submit(aio_context_t ctx_id, long nr,
blk_finish_plug(&plug);
percpu_ref_put(&ctx->users);
+
+ if (i != nr)
+ pr_warn("nr=%d i=%d\n", nr, i);
+
return i ? i : ret;
}
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: hanging aio process
2014-05-21 14:57 ` Sebastian Ott
@ 2014-06-04 5:24 ` Gu Zheng
2014-06-11 8:47 ` Ming Lei
0 siblings, 1 reply; 23+ messages in thread
From: Gu Zheng @ 2014-06-04 5:24 UTC (permalink / raw)
To: Sebastian Ott; +Cc: Anatol Pomozov, Benjamin LaHaise, linux-aio, LKML
Hi Guys,
What's the status of this issue? Has it been fixed?
Thanks,
Gu
On 05/21/2014 10:57 PM, Sebastian Ott wrote:
> On Wed, 21 May 2014, Anatol Pomozov wrote:
>> On Wed, May 21, 2014 at 1:48 AM, Sebastian Ott
>> <sebott@linux.vnet.ibm.com> wrote:
>>> On Wed, 21 May 2014, Sebastian Ott wrote:
>>>> On Tue, 20 May 2014, Anatol Pomozov wrote:
>>>>> On Tue, May 20, 2014 at 11:09 AM, Sebastian Ott
>>>>> <sebott@linux.vnet.ibm.com> wrote:
>>>>>> On Tue, 20 May 2014, Anatol Pomozov wrote:
>>>>>>> On Tue, May 20, 2014 at 6:16 AM, Sebastian Ott
>>>>>>> <sebott@linux.vnet.ibm.com> wrote:
>>>>>>>> On Tue, 20 May 2014, Sebastian Ott wrote:
>>>>>>>>> On Mon, 19 May 2014, Benjamin LaHaise wrote:
>>>>>>>>>> It is entirely possible the bug isn't
>>>>>>>>>> caused by the referenced commit, as the commit you're pointing to merely
>>>>>>>>>> makes io_destroy() syscall wait for all aio outstanding to complete
>>>>>>>>>> before returning.
>>>>>>>>>
>>>>>>>>> I cannot reproduce this when I revert said commit (on top of 14186fe). If
>>>>>>>>> that matters - the arch is s390.
>>>>>>>>
>>>>>>>> Hm, ok - maybe that commit is really just highlighting a refcounting bug.
>>>>>>>> I just compared traces for a good and a few bad cases. The good case:
>>>>>>>> # tracer: function
>>>>>>>> #
>>>>>>>> # entries-in-buffer/entries-written: 16/16 #P:4
>>>>>>>> #
>>>>>>>> # _-----=> irqs-off
>>>>>>>> # / _----=> need-resched
>>>>>>>> # | / _---=> hardirq/softirq
>>>>>>>> # || / _--=> preempt-depth
>>>>>>>> # ||| / delay
>>>>>>>> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
>>>>>>>> # | | | |||| | |
>>>>>>>> fio-732 [003] .... 17.989315: kill_ioctx <-SyS_io_destroy
>>>>>>>> fio-739 [003] .... 18.000563: kill_ioctx <-SyS_io_destroy
>>>>>>>> ksoftirqd/3-19 [003] ..s. 18.031673: free_ioctx_users <-percpu_ref_kill_rcu
>>>>>>>> ksoftirqd/3-19 [003] ..s. 18.031679: free_ioctx_users <-percpu_ref_kill_rcu
>>>>>>>> fio-737 [003] .... 18.038765: kill_ioctx <-SyS_io_destroy
>>>>>>>> ksoftirqd/3-19 [003] ..s. 18.062488: free_ioctx_reqs <-percpu_ref_kill_rcu
>>>>>>>> ksoftirqd/3-19 [003] ..s. 18.062494: free_ioctx_reqs <-percpu_ref_kill_rcu
>>>>>>>> kworker/3:1-57 [003] .... 18.062499: free_ioctx <-process_one_work
>>>>>>>> kworker/3:1-57 [003] .... 18.062506: free_ioctx <-process_one_work
>>>>>>>> ksoftirqd/3-19 [003] ..s. 18.072275: free_ioctx_users <-percpu_ref_kill_rcu
>>>>>>>> fio-738 [003] .... 18.102419: kill_ioctx <-SyS_io_destroy
>>>>>>>> <idle>-0 [003] .ns. 18.111668: free_ioctx_reqs <-percpu_ref_kill_rcu
>>>>>>>> kworker/3:1-57 [003] .... 18.111675: free_ioctx <-process_one_work
>>>>>>>> ksoftirqd/3-19 [003] ..s. 18.138035: free_ioctx_users <-percpu_ref_kill_rcu
>>>>>>>> <idle>-0 [003] .ns. 18.191665: free_ioctx_reqs <-percpu_ref_kill_rcu
>>>>>>>> kworker/3:1-57 [003] .... 18.191671: free_ioctx <-process_one_work
>>>>>>>>
>>>>>>>> (4 fio workers, free_ioctx_reqs is called 4 times)
>>>>>>>>
>>>>>>>> One of the bad cases:
>>>>>>>> # tracer: function
>>>>>>>> #
>>>>>>>> # entries-in-buffer/entries-written: 14/14 #P:4
>>>>>>>> #
>>>>>>>> # _-----=> irqs-off
>>>>>>>> # / _----=> need-resched
>>>>>>>> # | / _---=> hardirq/softirq
>>>>>>>> # || / _--=> preempt-depth
>>>>>>>> # ||| / delay
>>>>>>>> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
>>>>>>>> # | | | |||| | |
>>>>>>>> fio-834 [000] .... 51.127359: kill_ioctx <-SyS_io_destroy
>>>>>>>> <idle>-0 [000] ..s. 51.170237: free_ioctx_users <-percpu_ref_kill_rcu
>>>>>>>> fio-828 [001] .... 51.189717: kill_ioctx <-SyS_io_destroy
>>>>>>>> fio-833 [001] ..s. 51.220178: free_ioctx_users <-percpu_ref_kill_rcu
>>>>>>>> <idle>-0 [000] .ns. 51.220230: free_ioctx_reqs <-percpu_ref_kill_rcu
>>>>>>>> kworker/0:3-661 [000] .... 51.220238: free_ioctx <-process_one_work
>>>>>>>> <idle>-0 [001] .ns. 51.260188: free_ioctx_reqs <-percpu_ref_kill_rcu
>>>>>>>> kworker/1:2-103 [001] .... 51.260198: free_ioctx <-process_one_work
>>>>>>>> fio-833 [002] .... 51.287602: kill_ioctx <-SyS_io_destroy
>>>>>>>> udevd-868 [002] ..s1 51.332519: free_ioctx_users <-percpu_ref_kill_rcu
>>>>>>>> <idle>-0 [002] .ns. 51.450180: free_ioctx_reqs <-percpu_ref_kill_rcu
>>>>>>>> kworker/2:2-191 [002] .... 51.450191: free_ioctx <-process_one_work
>>>>>>>> fio-835 [003] .... 51.907530: kill_ioctx <-SyS_io_destroy
>>>>>>>> ksoftirqd/3-19 [003] ..s. 52.000232: free_ioctx_users <-percpu_ref_kill_rcu
>>>>>>>>
>>>>>>>> (1 fio worker in D state, free_ioctx_reqs is called 3 times)
>>>>>>>
>>>>>>>
>>>>>>> Looking at the second trace: the first 3 io_destroy() calls cause
>>>>>>> free_ioctx_reqs(), but the last one does not call free_ioctx_reqs().
>>>>>>> Do you have more logs after the last line?
>>>>>>
>>>>>> Nope that was all.
>>>>>>
>>>>>>> If there is no more
>>>>>>> free_ioctx_reqs() then it means something keeps ctx->reqs refcounter.
>>>>>>> I suggest to add some logging to kernel to figure out what is the
>>>>>>> refcount value at this moment.
>>>>>>
>>>>>> Jep, I did that this morning (via atomic_read(&ctx->reqs.count) in
>>>>>> free_ioctx_users before percpu_ref_kill(&ctx->reqs); is called) and
>>>>>> the value was always the same
>>>>>> 1 + (1UL << 31)
>>>>>> even for the free_ioctx_users invocations that were not followed by
>>>>>> free_ioctx_reqs.
>>>>>
>>>>> Could you add atomic_read(&ctx->reqs.count) *after* the
>>>>> percpu_ref_kill(&ctx->reqs)?
>>>>
>>>> I already did that and it didn't change, always 1 + (1UL << 31) in all
>>>> cases, before and after percpu_ref_kill(&ctx->reqs). I'm not really
>>>> familiar with this percpu_ref stuff but it looks like the initial
>>>> reference is dropped asynchronous.
>>>
>>>
>>> cat /sys/kernel/debug/tracing/trace
>>> # tracer: function
>>> #
>>> # entries-in-buffer/entries-written: 25/25 #P:4
>>> #
>>> # _-----=> irqs-off
>>> # / _----=> need-resched
>>> # | / _---=> hardirq/softirq
>>> # || / _--=> preempt-depth
>>> # ||| / delay
>>> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
>>> # | | | |||| | |
>>> fio-856 [001] .... 160.876428: SyS_io_destroy: 0000000074a18000
>>> fio-856 [001] .... 160.876430: kill_ioctx <-SyS_io_destroy
>>> fio-855 [000] .... 160.887737: SyS_io_destroy: 0000000074f40600
>>> fio-855 [000] .... 160.887738: kill_ioctx <-SyS_io_destroy
>>> fio-849 [001] ..s. 160.911948: free_ioctx_users <-percpu_ref_kill_rcu
>>> ksoftirqd/0-3 [000] ..s. 160.932488: free_ioctx_users <-percpu_ref_kill_rcu
>>> fio-854 [001] .... 160.938881: SyS_io_destroy: 0000000074ac0600
>>> fio-854 [001] .... 160.938881: kill_ioctx <-SyS_io_destroy
>>> ksoftirqd/1-11 [001] ..s. 160.942016: aio_confirm_reqs: 0000000074a18000 reqs=1
>>> ksoftirqd/1-11 [001] ..s. 160.942016: free_ioctx_reqs <-percpu_ref_kill_rcu
>>> kworker/1:2-465 [001] .... 160.942021: free_ioctx <-process_one_work
>>> fio-856 [002] .... 160.942033: SyS_io_destroy: 0000000074a18000 done
>>> fio-849 [002] .... 160.942641: SyS_io_destroy: 0000000074f28600
>>> fio-849 [002] .... 160.942641: kill_ioctx <-SyS_io_destroy
>>> ksoftirqd/1-11 [001] ..s. 160.961981: free_ioctx_users <-percpu_ref_kill_rcu
>>> <idle>-0 [000] .ns. 160.962010: aio_confirm_reqs: 0000000074f40600 reqs=1
>>> <idle>-0 [000] .ns. 160.962011: free_ioctx_reqs <-percpu_ref_kill_rcu
>>> kworker/0:0-4 [000] .... 160.962016: free_ioctx <-process_one_work
>>> fio-855 [001] .... 160.962017: SyS_io_destroy: 0000000074f40600 done
>>> ksoftirqd/2-15 [002] ..s. 160.971998: free_ioctx_users <-percpu_ref_kill_rcu
>>> ksoftirqd/1-11 [001] ..s. 160.994552: aio_confirm_reqs: 0000000074ac0600 reqs=2
>>
>> Here it is. aio context 0000000074ac0600 has refcount == 2 (one for
>> initial refcount and one grabbed by someone). You need to find the one
>> who grabbed the refcount and figure out why it does not drop it.
>
> I (hopefully) traced all gets and puts (of ctx->reqs). Hm gets and puts
> are equal in numbers:
>
> cat /sys/kernel/debug/tracing/trace | grep aio_complete | wc
> 49120 294720 3487520
> cat /sys/kernel/debug/tracing/trace | grep do_io_submit | wc
> 49120 294720 3487520
>
> cat /sys/kernel/debug/tracing/trace | grep -v aio_complete | grep -v do_io_submit
> # tracer: function
> #
> # entries-in-buffer/entries-written: 98273/98273 #P:4
> #
> # _-----=> irqs-off
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| / delay
> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> # | | | |||| | |
> fio-793 [002] .... 86.697232: SyS_io_setup: get_reqs
> fio-794 [000] .... 86.697232: SyS_io_setup: get_reqs
> fio-788 [003] .... 86.697235: SyS_io_setup: get_reqs
> fio-795 [001] .... 86.697237: SyS_io_setup: get_reqs
> fio-793 [000] .... 87.217093: SyS_io_destroy: 00000000796d0000
> fio-793 [000] .... 87.217095: kill_ioctx <-SyS_io_destroy
> fio-795 [003] .... 87.221408: SyS_io_destroy: 000000007a038000
> fio-795 [003] .... 87.221409: kill_ioctx <-SyS_io_destroy
> fio-788 [003] .... 87.226725: SyS_io_destroy: 00000000796c8000
> fio-788 [003] .... 87.226726: kill_ioctx <-SyS_io_destroy
> fio-794 [000] .... 87.227840: SyS_io_destroy: 0000000079420000
> fio-794 [000] .... 87.227841: kill_ioctx <-SyS_io_destroy
> ksoftirqd/3-19 [003] ..s. 87.244201: free_ioctx_users <-percpu_ref_kill_rcu
> ksoftirqd/3-19 [003] ..s. 87.244201: free_ioctx_users: put_reqs
> ksoftirqd/3-19 [003] ..s. 87.244205: free_ioctx_users <-percpu_ref_kill_rcu
> ksoftirqd/3-19 [003] ..s. 87.244205: free_ioctx_users: put_reqs
> ksoftirqd/0-3 [000] ..s. 87.251898: free_ioctx_users <-percpu_ref_kill_rcu
> ksoftirqd/0-3 [000] ..s. 87.251898: free_ioctx_users: put_reqs
> ksoftirqd/0-3 [000] ..s. 87.251903: free_ioctx_users <-percpu_ref_kill_rcu
> ksoftirqd/0-3 [000] ..s. 87.251903: free_ioctx_users: put_reqs
> ksoftirqd/0-3 [000] ..s. 87.271868: aio_confirm_reqs: 00000000796d0000 reqs=1
> ksoftirqd/0-3 [000] ..s. 87.271868: free_ioctx_reqs <-percpu_ref_kill_rcu
> ksoftirqd/0-3 [000] ..s. 87.271874: aio_confirm_reqs: 0000000079420000 reqs=6
> kworker/0:1-53 [000] .... 87.271875: free_ioctx <-process_one_work
> fio-793 [001] .... 87.271879: SyS_io_destroy: 00000000796d0000 done
> ksoftirqd/3-19 [003] ..s. 87.271899: aio_confirm_reqs: 000000007a038000 reqs=1
> ksoftirqd/3-19 [003] ..s. 87.271899: free_ioctx_reqs <-percpu_ref_kill_rcu
> fio-795 [000] .... 87.271904: SyS_io_destroy: 000000007a038000 done
> ksoftirqd/3-19 [003] ..s. 87.271905: aio_confirm_reqs: 00000000796c8000 reqs=1
> ksoftirqd/3-19 [003] ..s. 87.271905: free_ioctx_reqs <-percpu_ref_kill_rcu
> kworker/3:2-222 [003] .... 87.271909: free_ioctx <-process_one_work
> kworker/3:2-222 [003] .... 87.271912: free_ioctx <-process_one_work
> fio-788 [003] .... 87.271920: SyS_io_destroy: 00000000796c8000 done
>
> Hm..this time 5 additional refs. Again I've never worked with these percpu
> refs but would it make sense to look for gets after kills (or is this
> valid)?
>
> diff --git a/fs/aio.c b/fs/aio.c
> index a0ed6c7..6f83c11 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -521,6 +521,13 @@ static void free_ioctx_reqs(struct percpu_ref *ref)
> schedule_work(&ctx->free_work);
> }
>
> +void aio_confirm_reqs(struct percpu_ref *ref)
> +{
> + struct kioctx *ctx = container_of(ref, struct kioctx, reqs);
> +
> + trace_printk("%p reqs=%d\n", ctx, atomic_read(&ref->count));
> +}
> +
> /*
> * When this function runs, the kioctx has been removed from the "hash table"
> * and ctx->users has dropped to 0, so we know no more kiocbs can be submitted -
> @@ -543,7 +550,8 @@ static void free_ioctx_users(struct percpu_ref *ref)
>
> spin_unlock_irq(&ctx->ctx_lock);
>
> - percpu_ref_kill(&ctx->reqs);
> + percpu_ref_kill_and_confirm(&ctx->reqs, aio_confirm_reqs);
> + trace_printk("put_reqs\n");
> percpu_ref_put(&ctx->reqs);
> }
>
> @@ -695,6 +703,7 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
> spin_unlock(&aio_nr_lock);
>
> percpu_ref_get(&ctx->users); /* io_setup() will drop this ref */
> + trace_printk("get_reqs\n");
> percpu_ref_get(&ctx->reqs); /* free_ioctx_users() will drop this */
>
> err = ioctx_add_table(ctx, mm);
> @@ -888,6 +897,7 @@ static inline struct kiocb *aio_get_req(struct kioctx *ctx)
> if (unlikely(!req))
> goto out_put;
>
> + trace_printk("get_reqs\n");
> percpu_ref_get(&ctx->reqs);
>
> req->ki_ctx = ctx;
> @@ -1033,6 +1043,7 @@ void aio_complete(struct kiocb *iocb, long res, long res2)
> if (waitqueue_active(&ctx->wait))
> wake_up(&ctx->wait);
>
> + trace_printk("put_reqs\n");
> percpu_ref_put(&ctx->reqs);
> }
> EXPORT_SYMBOL(aio_complete);
> @@ -1220,6 +1231,8 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
> struct completion requests_done =
> COMPLETION_INITIALIZER_ONSTACK(requests_done);
>
> + trace_printk("%p\n", ioctx);
> +
> /* Pass requests_done to kill_ioctx() where it can be set
> * in a thread-safe way. If we try to set it here then we have
> * a race condition if two io_destroy() called simultaneously.
> @@ -1232,6 +1245,7 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
> * is destroyed.
> */
> wait_for_completion(&requests_done);
> + trace_printk("%p done\n", ioctx);
>
> return 0;
> }
> @@ -1457,6 +1471,7 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
> return 0;
> out_put_req:
> put_reqs_available(ctx, 1);
> + trace_printk("put_reqs\n");
> percpu_ref_put(&ctx->reqs);
> kiocb_free(req);
> return ret;
> @@ -1512,6 +1527,10 @@ long do_io_submit(aio_context_t ctx_id, long nr,
> blk_finish_plug(&plug);
>
> percpu_ref_put(&ctx->users);
> +
> + if (i != nr)
> + pr_warn("nr=%d i=%d\n", nr, i);
> +
> return i ? i : ret;
> }
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-aio' in
> the body to majordomo@kvack.org. For more info on Linux AIO,
> see: http://www.kvack.org/aio/
> Don't email: aart@kvack.org
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] percpu-refcount: fix usage of this_cpu_ops (was Re: hanging aio process)
2014-05-21 14:18 ` Anatol Pomozov
2014-05-21 14:57 ` Sebastian Ott
@ 2014-06-04 13:58 ` Sebastian Ott
2014-06-04 16:11 ` Tejun Heo
2014-06-04 16:41 ` Tejun Heo
1 sibling, 2 replies; 23+ messages in thread
From: Sebastian Ott @ 2014-06-04 13:58 UTC (permalink / raw)
To: Anatol Pomozov
Cc: Benjamin LaHaise, linux-aio, Kent Overstreet, Tejun Heo, Gu Zheng,
Heiko Carstens, LKML
Hello,
On Wed, 21 May 2014, Anatol Pomozov wrote:
> Hi
>
> On Wed, May 21, 2014 at 1:48 AM, Sebastian Ott
> <sebott@linux.vnet.ibm.com> wrote:
> > On Wed, 21 May 2014, Sebastian Ott wrote:
> >> On Tue, 20 May 2014, Anatol Pomozov wrote:
> >> > On Tue, May 20, 2014 at 11:09 AM, Sebastian Ott
> >> > <sebott@linux.vnet.ibm.com> wrote:
> >> > > On Tue, 20 May 2014, Anatol Pomozov wrote:
> >> > >> On Tue, May 20, 2014 at 6:16 AM, Sebastian Ott
> >> > >> <sebott@linux.vnet.ibm.com> wrote:
> >> > >> > On Tue, 20 May 2014, Sebastian Ott wrote:
> >> > >> >> On Mon, 19 May 2014, Benjamin LaHaise wrote:
> >> > >> >> > It is entirely possible the bug isn't
> >> > >> >> > caused by the referenced commit, as the commit you're pointing to merely
> >> > >> >> > makes io_destroy() syscall wait for all aio outstanding to complete
> >> > >> >> > before returning.
> >> > >> >>
> >> > >> >> I cannot reproduce this when I revert said commit (on top of 14186fe). If
> >> > >> >> that matters - the arch is s390.
> >> > >> >
> >> > >> > Hm, ok - maybe that commit is really just highlighting a refcounting bug.
> >> > >> > I just compared traces for a good and a few bad cases. The good case:
> >> > >> > # tracer: function
> >> > >> > #
> >> > >> > # entries-in-buffer/entries-written: 16/16 #P:4
> >> > >> > #
> >> > >> > # _-----=> irqs-off
> >> > >> > # / _----=> need-resched
> >> > >> > # | / _---=> hardirq/softirq
> >> > >> > # || / _--=> preempt-depth
> >> > >> > # ||| / delay
> >> > >> > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> >> > >> > # | | | |||| | |
> >> > >> > fio-732 [003] .... 17.989315: kill_ioctx <-SyS_io_destroy
> >> > >> > fio-739 [003] .... 18.000563: kill_ioctx <-SyS_io_destroy
> >> > >> > ksoftirqd/3-19 [003] ..s. 18.031673: free_ioctx_users <-percpu_ref_kill_rcu
> >> > >> > ksoftirqd/3-19 [003] ..s. 18.031679: free_ioctx_users <-percpu_ref_kill_rcu
> >> > >> > fio-737 [003] .... 18.038765: kill_ioctx <-SyS_io_destroy
> >> > >> > ksoftirqd/3-19 [003] ..s. 18.062488: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > >> > ksoftirqd/3-19 [003] ..s. 18.062494: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > >> > kworker/3:1-57 [003] .... 18.062499: free_ioctx <-process_one_work
> >> > >> > kworker/3:1-57 [003] .... 18.062506: free_ioctx <-process_one_work
> >> > >> > ksoftirqd/3-19 [003] ..s. 18.072275: free_ioctx_users <-percpu_ref_kill_rcu
> >> > >> > fio-738 [003] .... 18.102419: kill_ioctx <-SyS_io_destroy
> >> > >> > <idle>-0 [003] .ns. 18.111668: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > >> > kworker/3:1-57 [003] .... 18.111675: free_ioctx <-process_one_work
> >> > >> > ksoftirqd/3-19 [003] ..s. 18.138035: free_ioctx_users <-percpu_ref_kill_rcu
> >> > >> > <idle>-0 [003] .ns. 18.191665: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > >> > kworker/3:1-57 [003] .... 18.191671: free_ioctx <-process_one_work
> >> > >> >
> >> > >> > (4 fio workers, free_ioctx_reqs is called 4 times)
> >> > >> >
> >> > >> > One of the bad cases:
> >> > >> > # tracer: function
> >> > >> > #
> >> > >> > # entries-in-buffer/entries-written: 14/14 #P:4
> >> > >> > #
> >> > >> > # _-----=> irqs-off
> >> > >> > # / _----=> need-resched
> >> > >> > # | / _---=> hardirq/softirq
> >> > >> > # || / _--=> preempt-depth
> >> > >> > # ||| / delay
> >> > >> > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> >> > >> > # | | | |||| | |
> >> > >> > fio-834 [000] .... 51.127359: kill_ioctx <-SyS_io_destroy
> >> > >> > <idle>-0 [000] ..s. 51.170237: free_ioctx_users <-percpu_ref_kill_rcu
> >> > >> > fio-828 [001] .... 51.189717: kill_ioctx <-SyS_io_destroy
> >> > >> > fio-833 [001] ..s. 51.220178: free_ioctx_users <-percpu_ref_kill_rcu
> >> > >> > <idle>-0 [000] .ns. 51.220230: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > >> > kworker/0:3-661 [000] .... 51.220238: free_ioctx <-process_one_work
> >> > >> > <idle>-0 [001] .ns. 51.260188: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > >> > kworker/1:2-103 [001] .... 51.260198: free_ioctx <-process_one_work
> >> > >> > fio-833 [002] .... 51.287602: kill_ioctx <-SyS_io_destroy
> >> > >> > udevd-868 [002] ..s1 51.332519: free_ioctx_users <-percpu_ref_kill_rcu
> >> > >> > <idle>-0 [002] .ns. 51.450180: free_ioctx_reqs <-percpu_ref_kill_rcu
> >> > >> > kworker/2:2-191 [002] .... 51.450191: free_ioctx <-process_one_work
> >> > >> > fio-835 [003] .... 51.907530: kill_ioctx <-SyS_io_destroy
> >> > >> > ksoftirqd/3-19 [003] ..s. 52.000232: free_ioctx_users <-percpu_ref_kill_rcu
> >> > >> >
> >> > >> > (1 fio worker in D state, free_ioctx_reqs is called 3 times)
> >> > >>
> >> > >>
> >> > >> Looking at the second trace: the first 3 io_destroy() calls cause
> >> > >> free_ioctx_reqs(), but the last one does not call free_ioctx_reqs().
> >> > >> Do you have more logs after the last line?
> >> > >
> >> > > Nope that was all.
> >> > >
> >> > >> If there is no more
> >> > >> free_ioctx_reqs() then it means something keeps ctx->reqs refcounter.
> >> > >> I suggest to add some logging to kernel to figure out what is the
> >> > >> refcount value at this moment.
> >> > >
> >> > > Jep, I did that this morning (via atomic_read(&ctx->reqs.count) in
> >> > > free_ioctx_users before percpu_ref_kill(&ctx->reqs); is called) and
> >> > > the value was always the same
> >> > > 1 + (1UL << 31)
> >> > > even for the free_ioctx_users invocations that were not followed by
> >> > > free_ioctx_reqs.
> >> >
> >> > Could you add atomic_read(&ctx->reqs.count) *after* the
> >> > percpu_ref_kill(&ctx->reqs)?
> >>
> >> I already did that and it didn't change: always 1 + (1UL << 31) in all
> >> cases, before and after percpu_ref_kill(&ctx->reqs). I'm not really
> >> familiar with this percpu_ref stuff, but it looks like the initial
> >> reference is dropped asynchronously.
> >
> >
> > cat /sys/kernel/debug/tracing/trace
> > # tracer: function
> > #
> > # entries-in-buffer/entries-written: 25/25 #P:4
> > #
> > # _-----=> irqs-off
> > # / _----=> need-resched
> > # | / _---=> hardirq/softirq
> > # || / _--=> preempt-depth
> > # ||| / delay
> > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> > # | | | |||| | |
> > fio-856 [001] .... 160.876428: SyS_io_destroy: 0000000074a18000
> > fio-856 [001] .... 160.876430: kill_ioctx <-SyS_io_destroy
> > fio-855 [000] .... 160.887737: SyS_io_destroy: 0000000074f40600
> > fio-855 [000] .... 160.887738: kill_ioctx <-SyS_io_destroy
> > fio-849 [001] ..s. 160.911948: free_ioctx_users <-percpu_ref_kill_rcu
> > ksoftirqd/0-3 [000] ..s. 160.932488: free_ioctx_users <-percpu_ref_kill_rcu
> > fio-854 [001] .... 160.938881: SyS_io_destroy: 0000000074ac0600
> > fio-854 [001] .... 160.938881: kill_ioctx <-SyS_io_destroy
> > ksoftirqd/1-11 [001] ..s. 160.942016: aio_confirm_reqs: 0000000074a18000 reqs=1
> > ksoftirqd/1-11 [001] ..s. 160.942016: free_ioctx_reqs <-percpu_ref_kill_rcu
> > kworker/1:2-465 [001] .... 160.942021: free_ioctx <-process_one_work
> > fio-856 [002] .... 160.942033: SyS_io_destroy: 0000000074a18000 done
> > fio-849 [002] .... 160.942641: SyS_io_destroy: 0000000074f28600
> > fio-849 [002] .... 160.942641: kill_ioctx <-SyS_io_destroy
> > ksoftirqd/1-11 [001] ..s. 160.961981: free_ioctx_users <-percpu_ref_kill_rcu
> > <idle>-0 [000] .ns. 160.962010: aio_confirm_reqs: 0000000074f40600 reqs=1
> > <idle>-0 [000] .ns. 160.962011: free_ioctx_reqs <-percpu_ref_kill_rcu
> > kworker/0:0-4 [000] .... 160.962016: free_ioctx <-process_one_work
> > fio-855 [001] .... 160.962017: SyS_io_destroy: 0000000074f40600 done
> > ksoftirqd/2-15 [002] ..s. 160.971998: free_ioctx_users <-percpu_ref_kill_rcu
> > ksoftirqd/1-11 [001] ..s. 160.994552: aio_confirm_reqs: 0000000074ac0600 reqs=2
>
> Here it is. aio context 0000000074ac0600 has refcount == 2 (one for
> the initial reference and one grabbed by someone). You need to find out
> who grabbed the reference and figure out why it is not dropped.
I did some more debugging with a minimal testcase using just one cpu. The
traces showed some totally wrong values in the percpu refcounter when
switches between softirq and process context happened. Heiko found and
fixed the bug that caused this.
Regards,
Sebastian
From 82295633cad58c7d6b9af4e470e3168ed43a6779 Mon Sep 17 00:00:00 2001
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Date: Wed, 4 Jun 2014 12:53:19 +0200
Subject: [PATCH] percpu-refcount: fix usage of this_cpu_ops
The percpu-refcount infrastructure uses the underscore variants of
the this_cpu ops (e.g. __this_cpu_inc()) to modify percpu reference
counters.

However, the underscore variants do not atomically update the percpu
variable; instead, they may be implemented as a read-modify-write
sequence of more than one instruction. Therefore it is only safe to
use an underscore variant if the context is always the same (process,
softirq, or hardirq). Otherwise updates may be lost.

Sebastian hit this problem in the aio subsystem, which uses percpu
refcounters in both process and softirq context, leading to reference
counts that never dropped to zero even though the number of "get" and
"put" calls matched.

Fix this by using the non-underscore this_cpu ops variants, which
provide per-cpu atomic semantics and fix the corrupted reference
counts.
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org> # v3.11+
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
include/linux/percpu-refcount.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index 95961f0bf62d..0afb48fd449d 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -110,7 +110,7 @@ static inline void percpu_ref_get(struct percpu_ref *ref)
 	pcpu_count = ACCESS_ONCE(ref->pcpu_count);
 
 	if (likely(REF_STATUS(pcpu_count) == PCPU_REF_PTR))
-		__this_cpu_inc(*pcpu_count);
+		this_cpu_inc(*pcpu_count);
 	else
 		atomic_inc(&ref->count);
@@ -139,7 +139,7 @@ static inline bool percpu_ref_tryget(struct percpu_ref *ref)
 	pcpu_count = ACCESS_ONCE(ref->pcpu_count);
 
 	if (likely(REF_STATUS(pcpu_count) == PCPU_REF_PTR)) {
-		__this_cpu_inc(*pcpu_count);
+		this_cpu_inc(*pcpu_count);
 		ret = true;
 	}
@@ -164,7 +164,7 @@ static inline void percpu_ref_put(struct percpu_ref *ref)
 	pcpu_count = ACCESS_ONCE(ref->pcpu_count);
 
 	if (likely(REF_STATUS(pcpu_count) == PCPU_REF_PTR))
-		__this_cpu_dec(*pcpu_count);
+		this_cpu_dec(*pcpu_count);
 	else if (unlikely(atomic_dec_and_test(&ref->count)))
 		ref->release(ref);
--
1.8.5.5
* Re: [PATCH] percpu-refcount: fix usage of this_cpu_ops (was Re: hanging aio process)
2014-06-04 13:58 ` [PATCH] percpu-refcount: fix usage of this_cpu_ops (was Re: hanging aio process) Sebastian Ott
@ 2014-06-04 16:11 ` Tejun Heo
2014-06-04 16:56 ` Sebastian Ott
2014-06-04 16:41 ` Tejun Heo
1 sibling, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2014-06-04 16:11 UTC (permalink / raw)
To: Sebastian Ott
Cc: Anatol Pomozov, Benjamin LaHaise, linux-aio, Kent Overstreet,
Gu Zheng, Heiko Carstens, LKML
On Wed, Jun 04, 2014 at 03:58:24PM +0200, Sebastian Ott wrote:
> From 82295633cad58c7d6b9af4e470e3168ed43a6779 Mon Sep 17 00:00:00 2001
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
> Date: Wed, 4 Jun 2014 12:53:19 +0200
> Subject: [PATCH] percpu-refcount: fix usage of this_cpu_ops
>
> The percpu-refcount infrastructure uses the underscore variants of
> the this_cpu ops (e.g. __this_cpu_inc()) to modify percpu reference
> counters.
>
> However, the underscore variants do not atomically update the percpu
> variable; instead, they may be implemented as a read-modify-write
> sequence of more than one instruction. Therefore it is only safe to
> use an underscore variant if the context is always the same (process,
> softirq, or hardirq). Otherwise updates may be lost.
>
> Sebastian hit this problem in the aio subsystem, which uses percpu
> refcounters in both process and softirq context, leading to reference
> counts that never dropped to zero even though the number of "get" and
> "put" calls matched.
>
> Fix this by using the non-underscore this_cpu ops variants, which
> provide per-cpu atomic semantics and fix the corrupted reference
> counts.
>
> Cc: Kent Overstreet <kmo@daterainc.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: <stable@vger.kernel.org> # v3.11+
> Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Bah... should have caught it way back. Sorry about that. Applying to
percpu/for-3.15-fixes with stable cc'd.
Thanks.
--
tejun
* Re: [PATCH] percpu-refcount: fix usage of this_cpu_ops (was Re: hanging aio process)
2014-06-04 13:58 ` [PATCH] percpu-refcount: fix usage of this_cpu_ops (was Re: hanging aio process) Sebastian Ott
2014-06-04 16:11 ` Tejun Heo
@ 2014-06-04 16:41 ` Tejun Heo
2014-06-04 17:17 ` Christoph Lameter
1 sibling, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2014-06-04 16:41 UTC (permalink / raw)
To: Sebastian Ott
Cc: Anatol Pomozov, Benjamin LaHaise, linux-aio, Kent Overstreet,
Gu Zheng, Heiko Carstens, LKML, Christoph Lameter
cc'ing Christoph. Hey!
On Wed, Jun 04, 2014 at 03:58:24PM +0200, Sebastian Ott wrote:
> From 82295633cad58c7d6b9af4e470e3168ed43a6779 Mon Sep 17 00:00:00 2001
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
> Date: Wed, 4 Jun 2014 12:53:19 +0200
> Subject: [PATCH] percpu-refcount: fix usage of this_cpu_ops
>
> The percpu-refcount infrastructure uses the underscore variants of
> the this_cpu ops (e.g. __this_cpu_inc()) to modify percpu reference
> counters.
>
> However, the underscore variants do not atomically update the percpu
> variable; instead, they may be implemented as a read-modify-write
> sequence of more than one instruction. Therefore it is only safe to
> use an underscore variant if the context is always the same (process,
> softirq, or hardirq). Otherwise updates may be lost.
>
> Sebastian hit this problem in the aio subsystem, which uses percpu
> refcounters in both process and softirq context, leading to reference
> counts that never dropped to zero even though the number of "get" and
> "put" calls matched.
>
> Fix this by using the non-underscore this_cpu ops variants, which
> provide per-cpu atomic semantics and fix the corrupted reference
> counts.
Christoph, percpu-refcount misused __this_cpu_*() and subtly broke
s390, which uses the stock read-modify-write implementation. It should
be possible to annotate __this_cpu_*() so that lockdep warns if it's
used from different contexts, right? Hmm, now that I think about
it, there's nothing to attach a lockdep context to. :(
Urgh... I really don't like the subtleties around __this_cpu_*().
It's too easy to get it wrong and fail to notice it. :(
Thanks.
--
tejun
* Re: [PATCH] percpu-refcount: fix usage of this_cpu_ops (was Re: hanging aio process)
2014-06-04 16:11 ` Tejun Heo
@ 2014-06-04 16:56 ` Sebastian Ott
2014-06-04 17:02 ` Tejun Heo
0 siblings, 1 reply; 23+ messages in thread
From: Sebastian Ott @ 2014-06-04 16:56 UTC (permalink / raw)
To: Tejun Heo
Cc: Anatol Pomozov, Benjamin LaHaise, linux-aio, Kent Overstreet,
Gu Zheng, Heiko Carstens, LKML
Hi,
On Wed, 4 Jun 2014, Tejun Heo wrote:
> On Wed, Jun 04, 2014 at 03:58:24PM +0200, Sebastian Ott wrote:
> > From 82295633cad58c7d6b9af4e470e3168ed43a6779 Mon Sep 17 00:00:00 2001
> > From: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Date: Wed, 4 Jun 2014 12:53:19 +0200
> > Subject: [PATCH] percpu-refcount: fix usage of this_cpu_ops
> >
> > The percpu-refcount infrastructure uses the underscore variants of
> > the this_cpu ops (e.g. __this_cpu_inc()) to modify percpu reference
> > counters.
> >
> > However, the underscore variants do not atomically update the percpu
> > variable; instead, they may be implemented as a read-modify-write
> > sequence of more than one instruction. Therefore it is only safe to
> > use an underscore variant if the context is always the same (process,
> > softirq, or hardirq). Otherwise updates may be lost.
> >
> > Sebastian hit this problem in the aio subsystem, which uses percpu
> > refcounters in both process and softirq context, leading to reference
> > counts that never dropped to zero even though the number of "get" and
> > "put" calls matched.
> >
> > Fix this by using the non-underscore this_cpu ops variants, which
> > provide per-cpu atomic semantics and fix the corrupted reference
> > counts.
> >
> > Cc: Kent Overstreet <kmo@daterainc.com>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: <stable@vger.kernel.org> # v3.11+
> > Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
> > Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
>
> Bah... should have caught it way back. Sorry about that. Applying to
> percpu/for-3.15-fixes with stable cc'd.
>
In your git tree you've used me as the patch author, but that should be
Heiko. Sorry if I didn't annotate this correctly; I just inlined the
git format-patch output.
Thanks,
Sebastian
* Re: [PATCH] percpu-refcount: fix usage of this_cpu_ops (was Re: hanging aio process)
2014-06-04 16:56 ` Sebastian Ott
@ 2014-06-04 17:02 ` Tejun Heo
0 siblings, 0 replies; 23+ messages in thread
From: Tejun Heo @ 2014-06-04 17:02 UTC (permalink / raw)
To: Sebastian Ott
Cc: Anatol Pomozov, Benjamin LaHaise, linux-aio, Kent Overstreet,
Gu Zheng, Heiko Carstens, LKML
On Wed, Jun 04, 2014 at 06:56:14PM +0200, Sebastian Ott wrote:
> In your git tree you've used me as the patch author but that should be
> Heiko. Sorry if I didn't annotate this correctly - I just inlined the
> git format-patch output..
Oops, updated accordingly.
Thanks.
--
tejun
* Re: [PATCH] percpu-refcount: fix usage of this_cpu_ops (was Re: hanging aio process)
2014-06-04 16:41 ` Tejun Heo
@ 2014-06-04 17:17 ` Christoph Lameter
0 siblings, 0 replies; 23+ messages in thread
From: Christoph Lameter @ 2014-06-04 17:17 UTC (permalink / raw)
To: Tejun Heo
Cc: Sebastian Ott, Anatol Pomozov, Benjamin LaHaise, linux-aio,
Kent Overstreet, Gu Zheng, Heiko Carstens, LKML
On Wed, 4 Jun 2014, Tejun Heo wrote:
> Urgh... I really don't like the subtleties around __this_cpu_*().
> It's too easy to get it wrong and fail to notice it. :(
That's why we added the preemption check logic recently. I wonder why it
did not trigger in this case.
* Re: hanging aio process
2014-06-04 5:24 ` Gu Zheng
@ 2014-06-11 8:47 ` Ming Lei
0 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2014-06-11 8:47 UTC (permalink / raw)
To: Gu Zheng; +Cc: Sebastian Ott, Anatol Pomozov, Benjamin LaHaise, linux-aio, LKML
On Wed, Jun 4, 2014 at 1:24 PM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
> Hi Guys,
> What's the status of this issue? Has it been fixed?
Yes, it was fixed by the percpu-refcount patch, which is already merged into 3.15.
Thanks,
--
Ming Lei
end of thread, other threads: [~2014-06-11 8:48 UTC | newest]
Thread overview: 23+ messages
2014-05-19 17:38 hanging aio process Sebastian Ott
2014-05-19 18:08 ` Benjamin LaHaise
2014-05-20 8:08 ` Sebastian Ott
2014-05-20 13:16 ` Sebastian Ott
2014-05-20 14:26 ` Benjamin LaHaise
2014-05-20 16:12 ` Anatol Pomozov
2014-05-20 18:09 ` Sebastian Ott
2014-05-20 18:39 ` Anatol Pomozov
2014-05-21 8:07 ` Sebastian Ott
2014-05-21 8:48 ` Sebastian Ott
2014-05-21 13:20 ` Benjamin LaHaise
2014-05-21 14:12 ` Sebastian Ott
2014-05-21 14:18 ` Anatol Pomozov
2014-05-21 14:57 ` Sebastian Ott
2014-06-04 5:24 ` Gu Zheng
2014-06-11 8:47 ` Ming Lei
2014-06-04 13:58 ` [PATCH] percpu-refcount: fix usage of this_cpu_ops (was Re: hanging aio process) Sebastian Ott
2014-06-04 16:11 ` Tejun Heo
2014-06-04 16:56 ` Sebastian Ott
2014-06-04 17:02 ` Tejun Heo
2014-06-04 16:41 ` Tejun Heo
2014-06-04 17:17 ` Christoph Lameter
2014-05-19 18:31 ` hanging aio process Anatol Pomozov