diff for duplicates of <4679671.be02cq1OFG@merkaba> diff --git a/a/1.txt b/N1/1.txt index 8d51eba..34c0d4d 100644 --- a/a/1.txt +++ b/N1/1.txt @@ -2,85 +2,68 @@ Hello Oleksandr, Oleksandr Natalenko - 26.08.17, 12:48: > Quick update: reproduced on both v4.12.7 and v4.13.0-rc6. ->=20 +> > On sobota 26. srpna 2017 12:37:29 CEST Oleksandr Natalenko wrote: -[=E2=80=A6] +[…] > > I've re-checked this issue with 4.12.9, and it is still there. -[=E2=80=A6] -> > On =C3=BAter=C3=BD 22. srpna 2017 13:45:43 CEST Oleksandr Natalenko wro= -te: +[…] +> > On úterý 22. srpna 2017 13:45:43 CEST Oleksandr Natalenko wrote: > > > Hi. -> > >=20 +> > > > > > v4.12.8 kernel hangs in I/O path after resuming from suspend-to-ram. I > > > have -> > > blk-mq enabled, tried both BFQ and mq-deadline schedulers with the sa= -me +> > > blk-mq enabled, tried both BFQ and mq-deadline schedulers with the same > > > result. Soft lockup happens showing stacktraces I'm pasting below. -I did have occassional hangs on resuming from suspend-to-ram with 4.12, but= -=E2=80=A6 I=20 +I did have occassional hangs on resuming from suspend-to-ram with 4.12, but… I am not certain that its related to I/O issues. -The hangs were gone as I tried with a kernel config from a friend that stil= -l=20 -uses CFQ, I then adapted it to use 1000HZ, low-latency desktop and blk-mq a= -s I=20 -used before and also enabled optimization for the processor type in this=20 -ThinkPad T520, and got hangs on resuming from suspend-to-ram again.=20 +The hangs were gone as I tried with a kernel config from a friend that still +uses CFQ, I then adapted it to use 1000HZ, low-latency desktop and blk-mq as I +used before and also enabled optimization for the processor type in this +ThinkPad T520, and got hangs on resuming from suspend-to-ram again. 
-As backing out the change from 250 HZ to 1000 HZ, and from low-latency desk= -top=20 -to the lesser low latency option did not help, I am now currently re-using = -the=20 -config from my friend minus quite some drivers and unneeded kernel features= -,=20 -but otherwise almost unchanged. I.e. back with CFQ as well. So far so good,= -=20 +As backing out the change from 250 HZ to 1000 HZ, and from low-latency desktop +to the lesser low latency option did not help, I am now currently re-using the +config from my friend minus quite some drivers and unneeded kernel features, +but otherwise almost unchanged. I.e. back with CFQ as well. So far so good, but it needs at least 4-5 days of additional testing to be sure. -Also=E2=80=A6 when a hang happened the mouse pointer was frozen, Ctrl-Alt-F= -1 didn=C2=B4t=20 -work and so on=E2=80=A6 so it may easily be a completely different issue. +Also… when a hang happened the mouse pointer was frozen, Ctrl-Alt-F1 didn´t +work and so on… so it may easily be a completely different issue. -I did not see much point in reporting it so far=E2=80=A6 as I have no idea = -on how to=20 -reliably pin-point the issue. It happens once every few days, so a bisect=20 -again is out of questions =E2=80=93 (it is anyway for a production machine = -for me) =E2=80=93,=20 -it appears to be a hard freeze, so no debug data=E2=80=A6 its one of these = -"you don=C2=B4t=20 -get to debug me" hangs again. I really have no idea how to a get hold on su= -ch=20 -complexity. I am hoping to at least pin-point the exact kernel option that= -=20 -triggers this issue, but it may take weeks to do so. I=C2=B4d really love a= - way for=20 -the kernel to at least to write out debug data before doing hanging=20 +I did not see much point in reporting it so far… as I have no idea on how to +reliably pin-point the issue. 
It happens once every few days, so a bisect +again is out of questions – (it is anyway for a production machine for me) –, +it appears to be a hard freeze, so no debug data… its one of these "you don´t +get to debug me" hangs again. I really have no idea how to a get hold on such +complexity. I am hoping to at least pin-point the exact kernel option that +triggers this issue, but it may take weeks to do so. I´d really love a way for +the kernel to at least to write out debug data before doing hanging completely. Thanks, Martin -> > >=20 +> > > > > > Stacktrace shows that I/O hangs in md_super_wait(), and it means it > > > waits -> > > for "all superblock writes that were scheduled to complete". Since th= -ere +> > > for "all superblock writes that were scheduled to complete". Since there > > > is > > > "scheduled" word, should I also try "none" scheduler with blk-mq > > > enabled? -> > >=20 +> > > > > > While I'm trying to reproduce it on a VM without much luck (it happens > > > on > > > my laptop rarely, like 1 out of 10 suspend-resume cycles), and also > > > re-checking it with blk-mq disabled, by any chance is this something > > > already known? -> > >=20 +> > > > > > Ideally, I'd like to reprduce it in a VM and capture vmcore. -> > >=20 +> > > > > > Any suggestions are welcome. Thanks. -> > >=20 -> > > =3D=3D=3D +> > > +> > > === > > > [ 9460.165958] INFO: task md0_raid10:225 blocked for more than 120 > > > seconds. > > > [ 9460.165983] Not tainted 4.12.0-pf7 #1 @@ -167,8 +150,7 @@ ere > > > 0000000000000000 > > > [ 9460.167034] R13: 00000000013b4cef R14: 0000000000000400 R15: > > > 0000000000000001 -> > > [ 9460.167075] INFO: task akonadi_imap_re:7363 blocked for more than = -120 +> > > [ 9460.167075] INFO: task akonadi_imap_re:7363 blocked for more than 120 > > > seconds. 
> > > [ 9460.167080] Not tainted 4.12.0-pf7 #1 > > > [ 9460.167084] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" @@ -203,8 +185,7 @@ ere > > > 0000007b8c7b88f0 > > > [ 9460.167376] R13: 0000000000000000 R14: 0000000000000010 R15: > > > 0000007b8c7b66b0 -> > > [ 9460.167387] INFO: task akonadi_maildis:7366 blocked for more than = -120 +> > > [ 9460.167387] INFO: task akonadi_maildis:7366 blocked for more than 120 > > > seconds. > > > [ 9460.167492] Not tainted 4.12.0-pf7 #1 > > > [ 9460.167496] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" @@ -274,8 +255,7 @@ ere > > > 0000000000000001 > > > [ 9460.168000] R13: 0000000000000005 R14: 00007ffdfd4ad908 R15: > > > 0000000000000000 -> > > [ 9460.168014] INFO: task BrowserBlocking:7639 blocked for more than = -120 +> > > [ 9460.168014] INFO: task BrowserBlocking:7639 blocked for more than 120 > > > seconds. > > > [ 9460.168019] Not tainted 4.12.0-pf7 #1 > > > [ 9460.168024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" @@ -310,8 +290,7 @@ ere > > > 00007f808c8b77d8 > > > [ 9460.168257] R13: 0000398ceb6cf820 R14: 00000000000019bc R15: > > > 0000398cef877800 -> > > [ 9460.168267] INFO: task Chrome_SyncThre:7867 blocked for more than = -120 +> > > [ 9460.168267] INFO: task Chrome_SyncThre:7867 blocked for more than 120 > > > seconds. > > > [ 9460.168272] Not tainted 4.12.0-pf7 #1 > > > [ 9460.168277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" @@ -349,8 +328,7 @@ ere > > > 00000000004b0000 > > > [ 9460.168845] R13: 0000398ceac2ac78 R14: 0000398ce9e3b038 R15: > > > 0000398ceb9a1048 -> > > [ 9460.168854] INFO: task BrowserBlocking:8011 blocked for more than = -120 +> > > [ 9460.168854] INFO: task BrowserBlocking:8011 blocked for more than 120 > > > seconds. 
> > > [ 9460.168858] Not tainted 4.12.0-pf7 #1 > > > [ 9460.168863] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" @@ -385,8 +363,7 @@ ere > > > 00007f807d5f17d8 > > > [ 9460.169088] R13: 0000398ceafb9160 R14: 000000000001f95c R15: > > > 0000398cece75000 -> > > [ 9460.169098] INFO: task TaskSchedulerBa:9603 blocked for more than = -120 +> > > [ 9460.169098] INFO: task TaskSchedulerBa:9603 blocked for more than 120 > > > seconds. > > > [ 9460.169103] Not tainted 4.12.0-pf7 #1 > > > [ 9460.169108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" @@ -427,8 +404,8 @@ ere > > > 0000000000000001 > > > [ 9460.169562] R13: 0000000000000015 R14: 0000000000000200 R15: > > > 0000398ce9b50d80 -> > > =3D=3D=3D +> > > === -=2D-=20 +-- Martin diff --git a/a/content_digest b/N1/content_digest index 4574988..527e1db 100644 --- a/a/content_digest +++ b/N1/content_digest @@ -18,85 +18,68 @@ "\n" "Oleksandr Natalenko - 26.08.17, 12:48:\n" "> Quick update: reproduced on both v4.12.7 and v4.13.0-rc6.\n" - ">=20\n" + "> \n" "> On sobota 26. srpna 2017 12:37:29 CEST Oleksandr Natalenko wrote:\n" - "[=E2=80=A6]\n" + "[\342\200\246]\n" "> > I've re-checked this issue with 4.12.9, and it is still there.\n" - "[=E2=80=A6]\n" - "> > On =C3=BAter=C3=BD 22. srpna 2017 13:45:43 CEST Oleksandr Natalenko wro=\n" - "te:\n" + "[\342\200\246]\n" + "> > On \303\272ter\303\275 22. srpna 2017 13:45:43 CEST Oleksandr Natalenko wrote:\n" "> > > Hi.\n" - "> > >=20\n" + "> > > \n" "> > > v4.12.8 kernel hangs in I/O path after resuming from suspend-to-ram. I\n" "> > > have\n" - "> > > blk-mq enabled, tried both BFQ and mq-deadline schedulers with the sa=\n" - "me\n" + "> > > blk-mq enabled, tried both BFQ and mq-deadline schedulers with the same\n" "> > > result. 
Soft lockup happens showing stacktraces I'm pasting below.\n" "\n" - "I did have occassional hangs on resuming from suspend-to-ram with 4.12, but=\n" - "=E2=80=A6 I=20\n" + "I did have occassional hangs on resuming from suspend-to-ram with 4.12, but\342\200\246 I \n" "am not certain that its related to I/O issues.\n" "\n" - "The hangs were gone as I tried with a kernel config from a friend that stil=\n" - "l=20\n" - "uses CFQ, I then adapted it to use 1000HZ, low-latency desktop and blk-mq a=\n" - "s I=20\n" - "used before and also enabled optimization for the processor type in this=20\n" - "ThinkPad T520, and got hangs on resuming from suspend-to-ram again.=20\n" + "The hangs were gone as I tried with a kernel config from a friend that still \n" + "uses CFQ, I then adapted it to use 1000HZ, low-latency desktop and blk-mq as I \n" + "used before and also enabled optimization for the processor type in this \n" + "ThinkPad T520, and got hangs on resuming from suspend-to-ram again. \n" "\n" - "As backing out the change from 250 HZ to 1000 HZ, and from low-latency desk=\n" - "top=20\n" - "to the lesser low latency option did not help, I am now currently re-using =\n" - "the=20\n" - "config from my friend minus quite some drivers and unneeded kernel features=\n" - ",=20\n" - "but otherwise almost unchanged. I.e. back with CFQ as well. So far so good,=\n" - "=20\n" + "As backing out the change from 250 HZ to 1000 HZ, and from low-latency desktop \n" + "to the lesser low latency option did not help, I am now currently re-using the \n" + "config from my friend minus quite some drivers and unneeded kernel features, \n" + "but otherwise almost unchanged. I.e. back with CFQ as well. 
So far so good, \n" "but it needs at least 4-5 days of additional testing to be sure.\n" "\n" - "Also=E2=80=A6 when a hang happened the mouse pointer was frozen, Ctrl-Alt-F=\n" - "1 didn=C2=B4t=20\n" - "work and so on=E2=80=A6 so it may easily be a completely different issue.\n" + "Also\342\200\246 when a hang happened the mouse pointer was frozen, Ctrl-Alt-F1 didn\302\264t \n" + "work and so on\342\200\246 so it may easily be a completely different issue.\n" "\n" - "I did not see much point in reporting it so far=E2=80=A6 as I have no idea =\n" - "on how to=20\n" - "reliably pin-point the issue. It happens once every few days, so a bisect=20\n" - "again is out of questions =E2=80=93 (it is anyway for a production machine =\n" - "for me) =E2=80=93,=20\n" - "it appears to be a hard freeze, so no debug data=E2=80=A6 its one of these =\n" - "\"you don=C2=B4t=20\n" - "get to debug me\" hangs again. I really have no idea how to a get hold on su=\n" - "ch=20\n" - "complexity. I am hoping to at least pin-point the exact kernel option that=\n" - "=20\n" - "triggers this issue, but it may take weeks to do so. I=C2=B4d really love a=\n" - " way for=20\n" - "the kernel to at least to write out debug data before doing hanging=20\n" + "I did not see much point in reporting it so far\342\200\246 as I have no idea on how to \n" + "reliably pin-point the issue. It happens once every few days, so a bisect \n" + "again is out of questions \342\200\223 (it is anyway for a production machine for me) \342\200\223, \n" + "it appears to be a hard freeze, so no debug data\342\200\246 its one of these \"you don\302\264t \n" + "get to debug me\" hangs again. I really have no idea how to a get hold on such \n" + "complexity. I am hoping to at least pin-point the exact kernel option that \n" + "triggers this issue, but it may take weeks to do so. 
I\302\264d really love a way for \n" + "the kernel to at least to write out debug data before doing hanging \n" "completely.\n" "\n" "Thanks,\n" "Martin\n" "\n" - "> > >=20\n" + "> > > \n" "> > > Stacktrace shows that I/O hangs in md_super_wait(), and it means it\n" "> > > waits\n" - "> > > for \"all superblock writes that were scheduled to complete\". Since th=\n" - "ere\n" + "> > > for \"all superblock writes that were scheduled to complete\". Since there\n" "> > > is\n" "> > > \"scheduled\" word, should I also try \"none\" scheduler with blk-mq\n" "> > > enabled?\n" - "> > >=20\n" + "> > > \n" "> > > While I'm trying to reproduce it on a VM without much luck (it happens\n" "> > > on\n" "> > > my laptop rarely, like 1 out of 10 suspend-resume cycles), and also\n" "> > > re-checking it with blk-mq disabled, by any chance is this something\n" "> > > already known?\n" - "> > >=20\n" + "> > > \n" "> > > Ideally, I'd like to reprduce it in a VM and capture vmcore.\n" - "> > >=20\n" + "> > > \n" "> > > Any suggestions are welcome. 
Thanks.\n" - "> > >=20\n" - "> > > =3D=3D=3D\n" + "> > > \n" + "> > > ===\n" "> > > [ 9460.165958] INFO: task md0_raid10:225 blocked for more than 120\n" "> > > seconds.\n" "> > > [ 9460.165983] Not tainted 4.12.0-pf7 #1\n" @@ -183,8 +166,7 @@ "> > > 0000000000000000\n" "> > > [ 9460.167034] R13: 00000000013b4cef R14: 0000000000000400 R15:\n" "> > > 0000000000000001\n" - "> > > [ 9460.167075] INFO: task akonadi_imap_re:7363 blocked for more than =\n" - "120\n" + "> > > [ 9460.167075] INFO: task akonadi_imap_re:7363 blocked for more than 120\n" "> > > seconds.\n" "> > > [ 9460.167080] Not tainted 4.12.0-pf7 #1\n" "> > > [ 9460.167084] \"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\"\n" @@ -219,8 +201,7 @@ "> > > 0000007b8c7b88f0\n" "> > > [ 9460.167376] R13: 0000000000000000 R14: 0000000000000010 R15:\n" "> > > 0000007b8c7b66b0\n" - "> > > [ 9460.167387] INFO: task akonadi_maildis:7366 blocked for more than =\n" - "120\n" + "> > > [ 9460.167387] INFO: task akonadi_maildis:7366 blocked for more than 120\n" "> > > seconds.\n" "> > > [ 9460.167492] Not tainted 4.12.0-pf7 #1\n" "> > > [ 9460.167496] \"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\"\n" @@ -290,8 +271,7 @@ "> > > 0000000000000001\n" "> > > [ 9460.168000] R13: 0000000000000005 R14: 00007ffdfd4ad908 R15:\n" "> > > 0000000000000000\n" - "> > > [ 9460.168014] INFO: task BrowserBlocking:7639 blocked for more than =\n" - "120\n" + "> > > [ 9460.168014] INFO: task BrowserBlocking:7639 blocked for more than 120\n" "> > > seconds.\n" "> > > [ 9460.168019] Not tainted 4.12.0-pf7 #1\n" "> > > [ 9460.168024] \"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\"\n" @@ -326,8 +306,7 @@ "> > > 00007f808c8b77d8\n" "> > > [ 9460.168257] R13: 0000398ceb6cf820 R14: 00000000000019bc R15:\n" "> > > 0000398cef877800\n" - "> > > [ 9460.168267] INFO: task Chrome_SyncThre:7867 blocked for more than =\n" - "120\n" + "> > > [ 9460.168267] INFO: task Chrome_SyncThre:7867 blocked for more than 120\n" "> > > seconds.\n" "> > > [ 
9460.168272] Not tainted 4.12.0-pf7 #1\n" "> > > [ 9460.168277] \"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\"\n" @@ -365,8 +344,7 @@ "> > > 00000000004b0000\n" "> > > [ 9460.168845] R13: 0000398ceac2ac78 R14: 0000398ce9e3b038 R15:\n" "> > > 0000398ceb9a1048\n" - "> > > [ 9460.168854] INFO: task BrowserBlocking:8011 blocked for more than =\n" - "120\n" + "> > > [ 9460.168854] INFO: task BrowserBlocking:8011 blocked for more than 120\n" "> > > seconds.\n" "> > > [ 9460.168858] Not tainted 4.12.0-pf7 #1\n" "> > > [ 9460.168863] \"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\"\n" @@ -401,8 +379,7 @@ "> > > 00007f807d5f17d8\n" "> > > [ 9460.169088] R13: 0000398ceafb9160 R14: 000000000001f95c R15:\n" "> > > 0000398cece75000\n" - "> > > [ 9460.169098] INFO: task TaskSchedulerBa:9603 blocked for more than =\n" - "120\n" + "> > > [ 9460.169098] INFO: task TaskSchedulerBa:9603 blocked for more than 120\n" "> > > seconds.\n" "> > > [ 9460.169103] Not tainted 4.12.0-pf7 #1\n" "> > > [ 9460.169108] \"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\"\n" @@ -443,10 +420,10 @@ "> > > 0000000000000001\n" "> > > [ 9460.169562] R13: 0000000000000015 R14: 0000000000000200 R15:\n" "> > > 0000398ce9b50d80\n" - "> > > =3D=3D=3D\n" + "> > > ===\n" "\n" "\n" - "=2D-=20\n" + "-- \n" Martin -a8c72429a69c304e3b41ed436110e240e3f91f4fe4cc231dec48b82cbeb1e18b +4934b11d35a9dfa9e18d9d931e12618288ce51be16affd954bc487114d08472c