* Hibernate resume bug around 3.18-rc2 - Full PAT support
@ 2015-11-18 21:43 Vassilis Virvilis
2015-11-19 5:39 ` Juergen Gross
0 siblings, 1 reply; 22+ messages in thread
From: Vassilis Virvilis @ 2015-11-18 21:43 UTC
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 8068 bytes --]
Hi,
I have been hit by a hibernate/resume bug. Other people may have been too; the following reports are consistent with my observations:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1490494
https://bugs.archlinux.org/task/44807
Some observations:
1) The first few rapid hibernation / resume cycles do not fail.
2) Heavy load (eclipse + chromium + firefox/iceweasel + thunderbird/icedove + Konsole) makes the lockup during resume easier to reproduce.
3) Long hibernation periods (overnight) also make the lockup during resume easier to reproduce.
4) With the bad commits (those where the lockup during resume occurs), loading the image during resume is significantly faster: it loads quickly and then locks up.
How I hit the problem and what I have done:
I am running Debian unstable.
When Debian went from 3.16 to 3.19, the problem raised its ugly head. I have diligently upgraded up to 4.2.6, and the problem persists.
I added no_console_suspend and initcall_debug to the kernel command line; see the attached image of the lockup.
I also added drm.debug=0xe, but it didn't produce anything interesting (OK, who am I to judge?), and the test runs did not use it, so I took it out again.
I reproduced the problem when resuming back to KDE as well as when resuming back to a text console.
I switched to the VGA console, and the resume problem persists.
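For reference, one common way to make debug parameters like these persistent on Debian is the GRUB_CMDLINE_LINUX_DEFAULT variable in /etc/default/grub, followed by running update-grub as root. The fragment below is only a sketch. It assumes GRUB is the bootloader and that the previous value was Debian's stock "quiet". Dropping quiet is deliberate: it lowers the console log level and may hide some of the initcall_debug output. For a single test boot, the same parameters can instead be typed in after pressing e at the GRUB menu.

```shell
# /etc/default/grub (excerpt). Assumes GRUB on Debian; the stock value was "quiet".
# Apply with update-grub (as root), reboot, then check /proc/cmdline.
GRUB_CMDLINE_LINUX_DEFAULT="no_console_suspend initcall_debug"
```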
I started kernel bisection from 3.16 to 3.19 following https://wiki.debian.org/DebianKernel/GitBisect
One month and 25 kernels later, the bisect log is below.
I hit some untestable kernels that would not boot: they hung at "Loading ramdisk..." before printing any kernel message.
It looks like the first bad/untestable commit is the one from Juergen Gross / Thomas Gleixner: Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip [full PAT support]
Full disclaimer: I may have messed up the bisection. Finding bad commits was semi-easy, but confirming a good commit needs 2-3 days of runtime.
I would really appreciate some help and directions to nail this down.
Regards
Vassilis Virvilis
bill@localhost:~/Downloads/linux$ git bisect log
git bisect start
# good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
git bisect good 19583ca584d6f574384e17fe7613dfaeadcdc4a6
# bad: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19
git bisect bad bfa76d49576599a4b9f9b7a71f23d73d6dcff735
# good: [754c780953397dd5ee5191b7b3ca67e09088ce7a] Merge branch 'for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
git bisect good 754c780953397dd5ee5191b7b3ca67e09088ce7a
# bad: [7ef58b32f571bffb7763c6252ad7527562081f34] Merge tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux
git bisect bad 7ef58b32f571bffb7763c6252ad7527562081f34
# good: [53429290a054b30e4683297409fc4627b2592315] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
git bisect good 53429290a054b30e4683297409fc4627b2592315
# good: [3a647c1d7ab08145cee4b650f5e797d168846c51] Merge tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 3a647c1d7ab08145cee4b650f5e797d168846c51
# bad: [1366f5d3129f2abde606214de7afc3dd61781fa3] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
git bisect bad 1366f5d3129f2abde606214de7afc3dd61781fa3
# good: [151cd97630f87451cab412e40750d0e5f7581c98] Merge tag 'defconfig-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 151cd97630f87451cab412e40750d0e5f7581c98
# good: [ecb50f0afd35a51ef487e8a54b976052eb03d729] Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good ecb50f0afd35a51ef487e8a54b976052eb03d729
# bad: [3a5dc1fafb016560315fe45bb4ef8bde259dd1bc] Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 3a5dc1fafb016560315fe45bb4ef8bde259dd1bc
# good: [b6444bd0a18eb47343e16749ce80a6ebd521f124] Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good b6444bd0a18eb47343e16749ce80a6ebd521f124
# bad: [a023748d53c10850650fe86b1c4a7d421d576451] Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad a023748d53c10850650fe86b1c4a7d421d576451
# good: [773fed910d41e443e495a6bfa9ab1c2b7b13e012] Merge branches 'x86-platform-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 773fed910d41e443e495a6bfa9ab1c2b7b13e012
# good: [49a3b3cbdf1621678a39bd95a3e67c0f858539c7] x86: Use new cache mode type in mm/iomap_32.c
git bisect good 49a3b3cbdf1621678a39bd95a3e67c0f858539c7
# skip: [87ad0b713b1034b6caf559976c35ce47f6d1d1e9] x86: Clean up pgtable_types.h
git bisect skip 87ad0b713b1034b6caf559976c35ce47f6d1d1e9
# skip: [c06814d8419a74528500f85faf5fc01f67f8e7e6] x86: Use new cache mode type in setting page attributes
git bisect skip c06814d8419a74528500f85faf5fc01f67f8e7e6
# skip: [e00c8cc93c1ac01ecd5049929a50fb47b62bb041] x86: Use new cache mode type in memtype related functions
git bisect skip e00c8cc93c1ac01ecd5049929a50fb47b62bb041
# skip: [bd809af16e3ab1f8d55b3e2928c47c67e2a865d2] x86: Enable PAT to use cache mode translation tables
git bisect skip bd809af16e3ab1f8d55b3e2928c47c67e2a865d2
# skip: [f439c429c320981943f8b64b2a4049d946cb492b] x86: Support PAT bit in pagetable dump for lower levels
git bisect skip f439c429c320981943f8b64b2a4049d946cb492b
# skip: [47591df505129c9774af6cca2debf283a6e56ed7] xen: Support Xen pv-domains using PAT
git bisect skip 47591df505129c9774af6cca2debf283a6e56ed7
# skip: [b14097bd911c2554b0b5271b3a6b2d84044d1843] x86: Use new cache mode type in mm/ioremap.c
git bisect skip b14097bd911c2554b0b5271b3a6b2d84044d1843
# skip: [102e19e1955d85f31475416b1ee22980c6462cf8] x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
git bisect skip 102e19e1955d85f31475416b1ee22980c6462cf8
# skip: [f5b2831d654167d77da8afbef4d2584897b12d0c] x86: Respect PAT bit when copying pte values between large and normal pages
git bisect skip f5b2831d654167d77da8afbef4d2584897b12d0c
# skip: [0dbcae884779fdf7e2239a97ac7488877f0693d9] x86: mm: Move PAT only functions to mm/pat.c
git bisect skip 0dbcae884779fdf7e2239a97ac7488877f0693d9
# skip: [2a3746984c98b17b565e6a2c2bbaaaef757db1b4] x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
git bisect skip 2a3746984c98b17b565e6a2c2bbaaaef757db1b4
# only skipped commits left to test
# possible first bad commit: [a023748d53c10850650fe86b1c4a7d421d576451] Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
# possible first bad commit: [0dbcae884779fdf7e2239a97ac7488877f0693d9] x86: mm: Move PAT only functions to mm/pat.c
# possible first bad commit: [47591df505129c9774af6cca2debf283a6e56ed7] xen: Support Xen pv-domains using PAT
# possible first bad commit: [bd809af16e3ab1f8d55b3e2928c47c67e2a865d2] x86: Enable PAT to use cache mode translation tables
# possible first bad commit: [f5b2831d654167d77da8afbef4d2584897b12d0c] x86: Respect PAT bit when copying pte values between large and normal pages
# possible first bad commit: [f439c429c320981943f8b64b2a4049d946cb492b] x86: Support PAT bit in pagetable dump for lower levels
# possible first bad commit: [87ad0b713b1034b6caf559976c35ce47f6d1d1e9] x86: Clean up pgtable_types.h
# possible first bad commit: [e00c8cc93c1ac01ecd5049929a50fb47b62bb041] x86: Use new cache mode type in memtype related functions
# possible first bad commit: [b14097bd911c2554b0b5271b3a6b2d84044d1843] x86: Use new cache mode type in mm/ioremap.c
# possible first bad commit: [c06814d8419a74528500f85faf5fc01f67f8e7e6] x86: Use new cache mode type in setting page attributes
# possible first bad commit: [102e19e1955d85f31475416b1ee22980c6462cf8] x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
# possible first bad commit: [2a3746984c98b17b565e6a2c2bbaaaef757db1b4] x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
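Because the non-booting kernels had to be skipped, the bisection above ended with a list of "possible first bad" commits rather than a single one. The following is a minimal sketch of the same mechanics on a hypothetical eight-commit toy repository, not the kernel tree. Commit 3 stands in for an untestable kernel and commit 5 for the first bad one; both are assumptions chosen for illustration. The sketch drives the bisection with git bisect run, where exit status 0 means good, 125 means "skip", and any other value from 1 to 127 means bad. With a 2-3 day test cycle the real bisection has to stay manual, but the good/bad/skip bookkeeping is the same.

```shell
#!/bin/sh
# Toy repository (hypothetical): file "n" holds the commit number.
# Assumptions for the demo: commit 5 introduces the bug, commit 3 "does not boot".
tmp=$(mktemp -d)
repo="$tmp/repo"
mkdir "$repo"
cd "$repo" || exit 1
git init -q
git config user.email "tester@example.com"
git config user.name "Tester"

for i in 1 2 3 4 5 6 7 8; do
    echo "$i" > n
    git add n
    git commit -q -m "commit $i"
done

# bad = HEAD (commit 8), good = HEAD~7 (commit 1)
git bisect start HEAD HEAD~7 >/dev/null

# Exit 125 = cannot test (skip), 0 = good, 1 = bad.
git bisect run sh -c '
    n=$(cat n)
    [ "$n" -eq 3 ] && exit 125   # untestable, like a kernel stuck at "Loading ramdisk..."
    [ "$n" -lt 5 ]               # good before commit 5, bad from commit 5 on
' >/dev/null

# When bisection converges, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $first_bad"

git bisect log > "$tmp/bisect.log"   # the same kind of log as shown above
git bisect reset >/dev/null 2>&1
cd / && rm -rf "$tmp"
```

Here the untestable commit does not sit next to the good/bad boundary, so the bisection still converges and the script prints "first bad commit: commit 5". When skipped commits do sit on the boundary, as in the log above, git can only report the list of candidates. If a verdict later turns out to be wrong, the git-bisect documentation describes a recovery path. First save the log with git bisect log > file and delete the wrong entries. Then run git bisect reset, followed by git bisect replay file.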
[-- Attachment #2: IMG_20150916_201816.jpg --]
[-- Type: image/jpeg, Size: 242949 bytes --]
* Re: Hibernate resume bug around 3.18-rc2 - Full PAT support
From: Juergen Gross @ 2015-11-19 5:39 UTC
To: vasvir, linux-kernel; +Cc: Toshi Kani, Luis R. Rodriguez

On 18/11/15 22:43, Vassilis Virvilis wrote:
> Hi,
>
> I have been hit by a hibernate/resume bug. Other people may have too:
> The following links are consistent with my observations
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1490494
> https://bugs.archlinux.org/task/44807
>
> [...]
>
> Debian went from 3.16 to 3.19 - hence the problem raised its ugly head.
> I upgraded diligently up to 4.2.6 - The problem persists

Could you please try the most recent 4.3 kernel? There has been some
work related to this topic after 4.2 (large page PAT handling done by
Toshi Kani and MTRR/PAT handling by Luis Rodriguez).

Another useful piece of information would be the exact hardware you are
using. Maybe we can see some similarities between yours and the other
two cases you referenced above.

> [...]
>
> One month and 25 kernels later see below for the bisect log

Wow! Thanks for doing this work!

Juergen
* Re: Hibernate resume bug around 3.18-rc2 - Full PAT support
From: vasvir @ 2015-11-19 7:50 UTC
To: Juergen Gross; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez

Hi,

Thanks for the quick answer.

>
> Could you please try the most recent 4.3 kernel? There has been some
> work related to this topic after 4.2 (large page pat handling done by
> Toshi Kani and mtrr/pat handling by Luis Rodriguez).

That means I will reset the bisection, right? Is there any other
information we can extract from it?

So do you want me to test 4.3, or a 4.4 pre-release/rc/the latest Linus
tree? I assume 4.3 for now.

I will do it later tonight. It will take at least 2 days to report back.

>
> Another interesting information would be the exact hardware you are
> using. Maybe we can see some similarities between yours and the other
> two cases you referenced above.
>

It is an i7:
Motherboard: ASROCK H97 PRO4 RETAIL
CPU: INTEL CORE I7-4790 3.60GHZ LGA1150 - BOX
It has 16GB of RAM, one SSD and one HDD.
I have NO external graphics card.

Do you want me to run something like lspci or lsusb on it?

I upgraded the motherboard BIOS to the latest version. This is not the
problem, though: I upgraded after the problem occurred, as a
countermeasure in case I had been hit by a buggy BIOS and Linux had
changed its behavior to be stricter.

I experimented with ACPI compilers/decompilers and was tempted to fix my
ACPI tables, but I didn't.

I saw the kernel command line option acpi_osi=!Windows2013 but I didn't
try it. Do you think I should try it?

> Wow! Thanks for doing this work!
>

I would like this to be fixed, so I am willing to do the testing.

Vassilis
* Re: Hibernate resume bug around 3.18-rc2 - Full PAT support
From: Juergen Gross @ 2015-11-19 9:10 UTC
To: vasvir; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez

On 19/11/15 08:50, vasvir@iit.demokritos.gr wrote:
> Hi,
>
> Thanks for the quick answer
>
>>
>> Could you please try the most recent 4.3 kernel? There has been some
>> work related to this topic after 4.2 (large page pat handling done by
>> Toshi Kani and mtrr/pat handling by Luis Rodriguez).
>
> That means I will reset the bisection. Right? Is there any other info we
> can extract from there?

I don't see what else could be specific to that patch, other than the
information that the issue occurred due to that patch. All further
diagnostic information should be obtainable with a newer kernel, too.

> So do you want me to test 4.3 or 4.4-pre/rc*/latest linus tree. I assume
> 4.3 for now.

I think 4.3 is okay.

> I will do it later tonight. It will take 2 days at least to report back

Okay, thank you for your effort!

>
>>
>> Another interesting information would be the exact hardware you are
>> using. Maybe we can see some similarities between yours and the other
>> two cases you referenced above.
>>
>
> It is an i7
> Motherboard: ASROCK H97 PRO4 RETAIL
> CPU INTEL CORE I7-4790 3.60GHZ LGA1150 - BOX
> It has 16GB of RAM, one SSD and one HDD
> I have NO external graphics card
>
> Do you want me to run something on this like lspci, lsusb

Yes, please post the output of both.

> I upgraded the BIOS of the motherboard to the latest. This is not the
> problem though because I upgraded after the problem occurred as a counter
> measure in case I was hit by a buggy BIOS and linux had changed its
> behavior to be stricter.

BIOS was my first guess, but if the other two reports are really due to
the same problem, I doubt the BIOS is to blame (one is a Lenovo and one a
Sony laptop).

> I experimented with ACPI compilers/decompilers and I was tempted to fix my
> ACPI tables but I didn't.
>
> I saw the kernel command line option acpi_osi=!Windows2013 but I didn't try
> it. Do you think I should try it?

You could try "nopat" as a command line option.

>
>> Wow! Thanks for doing this work!
>>
>
> I would like this to be fixed so I am willing to do the testing.

I appreciate this spirit. :-)

Juergen
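Before reading anything into a "nopat" test run, it helps to confirm that the option actually reached the running kernel. /proc/cmdline shows the command line the kernel was booted with. The helper below is a hypothetical sketch: has_kernel_param is not an existing tool. It matches a parameter either on its own or in name=value form, and then reports the options discussed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if PARAM ($2) appears in CMDLINE ($1),
# either bare ("nopat") or with a value ("drm.debug=0xe").
has_kernel_param() {
    case " $1 " in
        *" $2 "* | *" $2="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report the options from this thread for the currently running kernel.
if [ -r /proc/cmdline ]; then
    cmdline=$(cat /proc/cmdline)
    for p in nopat no_console_suspend initcall_debug; do
        if has_kernel_param "$cmdline" "$p"; then
            echo "$p: set"
        else
            echo "$p: not set"
        fi
    done
fi
```

On a kernel booted with nopat, the loop prints "nopat: set". Matching on whole space-separated words keeps a similarly named parameter from counting as a hit.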
* Re: Hibernate resume bug around 3.18-rc2 - Full PAT support
From: Vassilis Virvilis @ 2015-11-19 20:35 UTC
To: Juergen Gross; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez

[-- Attachment #1: Type: text/plain, Size: 686 bytes --]

On 11/19/2015 11:10 AM, Juergen Gross wrote:
>> So Do you want me to test 4.3 or 4.4-pre/rc*/latest linus tree. I assume
>> 4.3 for now.
>
> I think 4.3 is okay.
>
>> I will do it later tonight. It will take 2 days at least to report back

I have compiled 4.3 and am running it right now. If it fails, I will try
with the nopat option. If that fails too, I will try 3.18-rc2 + nopat to
see whether that fails as well.

>>
>> Do you want me to run something on this like lspci, lsusb
>
> Yes, please post the output of both.

Here they are; see the attachments.

>
>> I would like this to be fixed so I am willing to do the testing.
>
> I appreciate this spirit. :-)
>

I appreciate the guidance. :-)

Vassilis

[-- Attachment #2: lsusb.txt --]
[-- Type: text/plain, Size: 34976 bytes --]

Bus 004 Device 002: ID 8087:8001 Intel Corp.
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 002: ID 8087:8009 Intel Corp.
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 046d:089d Logitech, Inc. QuickCam E2500 series
Bus 001 Device 003: ID 045e:0745 Microsoft Corp. Nano Transceiver v1.0 for Bluetooth
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 004 Device 002: ID 8087:8001 Intel Corp.
Device Descriptor:
  bLength             18
  bDescriptorType     1
  bcdUSB              2.00
  bDeviceClass        9 Hub
  bDeviceSubClass     0 Unused
  bDeviceProtocol     1 Single TT
  bMaxPacketSize0     64
  idVendor            0x8087 Intel Corp.
  idProduct           0x8001
  bcdDevice           0.00
  iManufacturer       0
  iProduct            0
  iSerial             0
  bNumConfigurations  1
  Configuration Descriptor:
    bLength             9
    bDescriptorType     2
    wTotalLength        25
    bNumInterfaces      1
    bConfigurationValue 1
    iConfiguration      0
    bmAttributes        0xe0
      Self Powered
      Remote Wakeup
    MaxPower            0mA
    Interface Descriptor:
      bLength             9
      bDescriptorType     4
      bInterfaceNumber    0
      bAlternateSetting   0
      bNumEndpoints       1
      bInterfaceClass     9 Hub
      bInterfaceSubClass  0 Unused
      bInterfaceProtocol  0 Full speed (or root) hub
      iInterface          0
      Endpoint Descriptor:
        bLength             7
        bDescriptorType     5
        bEndpointAddress    0x81  EP 1 IN
        bmAttributes        3
          Transfer Type       Interrupt
          Synch Type          None
          Usage Type          Data
        wMaxPacketSize      0x0002  1x 2 bytes
        bInterval           12
Hub Descriptor:
  bLength             11
  bDescriptorType     41
  nNbrPorts           8
  wHubCharacteristic  0x0009
    Per-port power switching
    Per-port overcurrent protection
    TT think time 8 FS bits
  bPwrOn2PwrGood      0 * 2 milli seconds
  bHubContrCurrent    0 milli Ampere
  DeviceRemovable     0x00 0x00
  PortPwrCtrlMask     0xff 0xff
 Hub Port Status:
   Port 1: 0000.0100 power
   Port 2: 0000.0100 power
   Port 3: 0000.0100 power
   Port 4: 0000.0100 power
   Port 5: 0000.0100 power
   Port 6: 0000.0100 power
   Port 7: 0000.0100 power
   Port 8: 0000.0100 power
Device Qualifier (for other device speed):
  bLength             10
  bDescriptorType     6
  bcdUSB              2.00
  bDeviceClass        9 Hub
  bDeviceSubClass     0 Unused
  bDeviceProtocol     0 Full speed (or root) hub
  bMaxPacketSize0     64
  bNumConfigurations  1
Device Status:     0x0001
  Self Powered

Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Device Descriptor:
  bLength             18
  bDescriptorType     1
  bcdUSB              2.00
  bDeviceClass        9 Hub
  bDeviceSubClass     0 Unused
  bDeviceProtocol     0 Full speed (or root) hub
  bMaxPacketSize0     64
  idVendor            0x1d6b Linux Foundation
  idProduct           0x0002 2.0 root hub
  bcdDevice           4.03
  iManufacturer       3 Linux 4.3.0+ ehci_hcd
  iProduct            2 EHCI Host Controller
  iSerial             1 0000:00:1d.0
  bNumConfigurations  1
  Configuration Descriptor:
    bLength             9
    bDescriptorType     2
    wTotalLength        25
    bNumInterfaces      1
    bConfigurationValue 1
    iConfiguration      0
    bmAttributes        0xe0
      Self Powered
      Remote Wakeup
    MaxPower            0mA
    Interface Descriptor:
      bLength             9
      bDescriptorType     4
      bInterfaceNumber    0
      bAlternateSetting   0
      bNumEndpoints       1
      bInterfaceClass     9 Hub
      bInterfaceSubClass  0 Unused
      bInterfaceProtocol  0 Full speed (or root) hub
      iInterface          0
      Endpoint Descriptor:
        bLength             7
        bDescriptorType     5
        bEndpointAddress    0x81  EP 1 IN
        bmAttributes        3
          Transfer Type       Interrupt
          Synch Type          None
          Usage Type          Data
        wMaxPacketSize      0x0004  1x 4 bytes
        bInterval           12
Hub Descriptor:
  bLength             9
  bDescriptorType     41
  nNbrPorts           2
  wHubCharacteristic  0x000a
    No power switching (usb 1.0)
    Per-port overcurrent protection
  bPwrOn2PwrGood      10 * 2 milli seconds
  bHubContrCurrent    0 milli Ampere
  DeviceRemovable     0x02
  PortPwrCtrlMask     0xff
 Hub Port Status:
   Port 1: 0000.0507 highspeed power suspend enable connect
   Port 2: 0000.0100 power
Device Status:     0x0001
  Self Powered

Bus 003 Device 002: ID 8087:8009 Intel Corp.
Device Descriptor:
  bLength             18
  bDescriptorType     1
  bcdUSB              2.00
  bDeviceClass        9 Hub
  bDeviceSubClass     0 Unused
  bDeviceProtocol     1 Single TT
  bMaxPacketSize0     64
  idVendor            0x8087 Intel Corp.
  idProduct           0x8009
  bcdDevice           0.00
  iManufacturer       0
  iProduct            0
  iSerial             0
  bNumConfigurations  1
  Configuration Descriptor:
    bLength             9
    bDescriptorType     2
    wTotalLength        25
    bNumInterfaces      1
    bConfigurationValue 1
    iConfiguration      0
    bmAttributes        0xe0
      Self Powered
      Remote Wakeup
    MaxPower            0mA
    Interface Descriptor:
      bLength             9
      bDescriptorType     4
      bInterfaceNumber    0
      bAlternateSetting   0
      bNumEndpoints       1
      bInterfaceClass     9 Hub
      bInterfaceSubClass  0 Unused
      bInterfaceProtocol  0 Full speed (or root) hub
      iInterface          0
      Endpoint Descriptor:
        bLength             7
        bDescriptorType     5
        bEndpointAddress    0x81  EP 1 IN
        bmAttributes        3
          Transfer Type       Interrupt
          Synch Type          None
          Usage Type          Data
        wMaxPacketSize      0x0001  1x 1 bytes
        bInterval           12
Hub Descriptor:
  bLength             9
  bDescriptorType     41
  nNbrPorts           6
  wHubCharacteristic  0x0009
    Per-port power switching
    Per-port overcurrent protection
    TT think time 8 FS bits
  bPwrOn2PwrGood      0 * 2 milli seconds
  bHubContrCurrent    0 milli Ampere
  DeviceRemovable     0x00
  PortPwrCtrlMask     0xff
 Hub Port Status:
   Port 1: 0000.0100 power
   Port 2: 0000.0100 power
   Port 3: 0000.0100 power
   Port 4: 0000.0100 power
   Port 5: 0000.0100 power
   Port 6: 0000.0100 power
Device Qualifier (for other device speed):
  bLength             10
  bDescriptorType     6
  bcdUSB              2.00
  bDeviceClass        9 Hub
  bDeviceSubClass     0 Unused
  bDeviceProtocol     0 Full speed (or root) hub
  bMaxPacketSize0     64
  bNumConfigurations  1
Device Status:     0x0001
  Self Powered

Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Device Descriptor:
  bLength             18
  bDescriptorType     1
  bcdUSB              2.00
  bDeviceClass        9 Hub
  bDeviceSubClass     0 Unused
  bDeviceProtocol     0 Full speed (or root) hub
  bMaxPacketSize0     64
  idVendor            0x1d6b Linux Foundation
  idProduct           0x0002 2.0 root hub
  bcdDevice           4.03
  iManufacturer       3 Linux 4.3.0+ ehci_hcd
  iProduct            2 EHCI Host Controller
  iSerial             1 0000:00:1a.0
  bNumConfigurations  1
  Configuration Descriptor:
    bLength             9
    bDescriptorType     2
    wTotalLength        25
    bNumInterfaces      1
    bConfigurationValue 1
    iConfiguration      0
    bmAttributes        0xe0
      Self Powered
      Remote Wakeup
    MaxPower            0mA
    Interface Descriptor:
      bLength             9
      bDescriptorType     4
      bInterfaceNumber    0
      bAlternateSetting   0
      bNumEndpoints       1
      bInterfaceClass     9 Hub
      bInterfaceSubClass  0 Unused
      bInterfaceProtocol  0 Full speed (or root) hub
      iInterface          0
      Endpoint Descriptor:
        bLength             7
        bDescriptorType     5
        bEndpointAddress    0x81  EP 1 IN
        bmAttributes        3
          Transfer Type       Interrupt
          Synch Type          None
          Usage Type          Data
        wMaxPacketSize      0x0004  1x 4 bytes
        bInterval           12
Hub Descriptor:
  bLength             9
  bDescriptorType     41
  nNbrPorts           2
  wHubCharacteristic  0x000a
    No power switching (usb 1.0)
    Per-port overcurrent protection
  bPwrOn2PwrGood      10 * 2 milli seconds
  bHubContrCurrent    0 milli Ampere
  DeviceRemovable     0x02
  PortPwrCtrlMask     0xff
 Hub Port Status:
   Port 1: 0000.0507 highspeed power suspend enable connect
   Port 2: 0000.0100 power
Device Status:     0x0001
  Self Powered

Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Device Descriptor:
  bLength             18
  bDescriptorType     1
  bcdUSB              3.00
  bDeviceClass        9 Hub
  bDeviceSubClass     0 Unused
  bDeviceProtocol     3
  bMaxPacketSize0     9
  idVendor            0x1d6b Linux Foundation
  idProduct           0x0003 3.0 root hub
  bcdDevice           4.03
  iManufacturer       3 Linux 4.3.0+ xhci-hcd
  iProduct            2 xHCI Host Controller
  iSerial             1 0000:00:14.0
  bNumConfigurations  1
  Configuration Descriptor:
    bLength             9
    bDescriptorType     2
    wTotalLength        31
    bNumInterfaces      1
    bConfigurationValue 1
    iConfiguration      0
    bmAttributes        0xe0
      Self Powered
      Remote Wakeup
    MaxPower            0mA
    Interface Descriptor:
      bLength             9
      bDescriptorType     4
      bInterfaceNumber    0
      bAlternateSetting   0
      bNumEndpoints       1
      bInterfaceClass     9 Hub
      bInterfaceSubClass  0 Unused
      bInterfaceProtocol  0 Full speed (or root) hub
      iInterface          0
      Endpoint Descriptor:
        bLength             7
        bDescriptorType     5
        bEndpointAddress    0x81  EP 1 IN
        bmAttributes        3
          Transfer Type       Interrupt
          Synch Type          None
          Usage Type          Data
        wMaxPacketSize      0x0004  1x 4 bytes
        bInterval           12
        bMaxBurst           0
Hub Descriptor:
  bLength             12
  bDescriptorType     42
  nNbrPorts           6
  wHubCharacteristic  0x000a
    No power switching (usb 1.0)
    Per-port overcurrent protection
  bPwrOn2PwrGood      10 * 2 milli seconds
  bHubContrCurrent    0 milli Ampere
  bHubDecLat          0.0 micro seconds
  wHubDelay           0 nano seconds
  DeviceRemovable     0x00
 Hub Port Status:
   Port 1: 0000.02a0 5Gbps power Rx.Detect
   Port 2: 0000.02a0 5Gbps power Rx.Detect
   Port 3: 0000.02a0 5Gbps power Rx.Detect
   Port 4: 0000.02a0 5Gbps power Rx.Detect
   Port 5: 0000.02a0 5Gbps power Rx.Detect
   Port 6: 0000.02a0 5Gbps power Rx.Detect
Binary Object Store Descriptor:
  bLength             5
  bDescriptorType     15
  wTotalLength        15
  bNumDeviceCaps      1
  SuperSpeed USB Device Capability:
    bLength             10
    bDescriptorType     16
    bDevCapabilityType  3
    bmAttributes        0x02
      Latency Tolerance Messages (LTM) Supported
    wSpeedsSupported    0x0008
      Device can operate at SuperSpeed (5Gbps)
    bFunctionalitySupport 3
      Lowest fully-functional device speed is SuperSpeed (5Gbps)
    bU1DevExitLat       10 micro seconds
    bU2DevExitLat       512 micro seconds
Device Status:     0x0001
  Self Powered

Bus 001 Device 002: ID 046d:089d Logitech, Inc. QuickCam E2500 series
Device Descriptor:
  bLength             18
  bDescriptorType     1
  bcdUSB              1.10
  bDeviceClass        0 (Defined at Interface level)
  bDeviceSubClass     0
  bDeviceProtocol     0
  bMaxPacketSize0     8
  idVendor            0x046d Logitech, Inc.
  idProduct 0x089d QuickCam E2500 series
  bcdDevice 1.00
  iManufacturer 0
  iProduct 0
  iSerial 0
  bNumConfigurations 1
  Configuration Descriptor:
    bLength 9
    bDescriptorType 2
    wTotalLength 336
    bNumInterfaces 3
    bConfigurationValue 1
    iConfiguration 0
    bmAttributes 0xa0
      (Bus Powered)
      Remote Wakeup
    MaxPower 100mA
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 0
      bNumEndpoints 2
      bInterfaceClass 255 Vendor Specific Class
      bInterfaceSubClass 255 Vendor Specific Subclass
      bInterfaceProtocol 255 Vendor Specific Protocol
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 1
          Transfer Type Isochronous
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0000 1x 0 bytes
        bInterval 1
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0008 1x 8 bytes
        bInterval 10
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 1
      bNumEndpoints 2
      bInterfaceClass 255 Vendor Specific Class
      bInterfaceSubClass 255 Vendor Specific Subclass
      bInterfaceProtocol 255 Vendor Specific Protocol
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 1
          Transfer Type Isochronous
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0080 1x 128 bytes
        bInterval 1
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0008 1x 8 bytes
        bInterval 10
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 2
      bNumEndpoints 2
      bInterfaceClass 255 Vendor Specific Class
      bInterfaceSubClass 255 Vendor Specific Subclass
      bInterfaceProtocol 255 Vendor Specific Protocol
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 1
          Transfer Type Isochronous
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x00c0 1x 192 bytes
        bInterval 1
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0008 1x 8 bytes
        bInterval 10
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 3
      bNumEndpoints 2
      bInterfaceClass 255 Vendor Specific Class
      bInterfaceSubClass 255 Vendor Specific Subclass
      bInterfaceProtocol 255 Vendor Specific Protocol
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 1
          Transfer Type Isochronous
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0100 1x 256 bytes
        bInterval 1
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0008 1x 8 bytes
        bInterval 10
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 4
      bNumEndpoints 2
      bInterfaceClass 255 Vendor Specific Class
      bInterfaceSubClass 255 Vendor Specific Subclass
      bInterfaceProtocol 255 Vendor Specific Protocol
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 1
          Transfer Type Isochronous
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0180 1x 384 bytes
        bInterval 1
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0008 1x 8 bytes
        bInterval 10
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 5
      bNumEndpoints 2
      bInterfaceClass 255 Vendor Specific Class
      bInterfaceSubClass 255 Vendor Specific Subclass
      bInterfaceProtocol 255 Vendor Specific Protocol
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 1
          Transfer Type Isochronous
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0200 1x 512 bytes
        bInterval 1
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0008 1x 8 bytes
        bInterval 10
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 6
      bNumEndpoints 2
      bInterfaceClass 255 Vendor Specific Class
      bInterfaceSubClass 255 Vendor Specific Subclass
      bInterfaceProtocol 255 Vendor Specific Protocol
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 1
          Transfer Type Isochronous
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0300 1x 768 bytes
        bInterval 1
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0008 1x 8 bytes
        bInterval 10
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 7
      bNumEndpoints 2
      bInterfaceClass 255 Vendor Specific Class
      bInterfaceSubClass 255 Vendor Specific Subclass
      bInterfaceProtocol 255 Vendor Specific Protocol
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 1
          Transfer Type Isochronous
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x03ff 1x 1023 bytes
        bInterval 1
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0008 1x 8 bytes
        bInterval 10
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 1
      bAlternateSetting 0
      bNumEndpoints 0
      bInterfaceClass 1 Audio
      bInterfaceSubClass 1 Control Device
      bInterfaceProtocol 0
      iInterface 0
      AudioControl Interface Descriptor:
        bLength 9
        bDescriptorType 36
        bDescriptorSubtype 1 (HEADER)
        bcdADC 1.00
        wTotalLength 39
        bInCollection 1
        baInterfaceNr( 0) 2
      AudioControl Interface Descriptor:
        bLength 12
        bDescriptorType 36
        bDescriptorSubtype 2 (INPUT_TERMINAL)
        bTerminalID 1
        wTerminalType 0x0201 Microphone
        bAssocTerminal 0
        bNrChannels 1
        wChannelConfig 0x0000
        iChannelNames 0
        iTerminal 0
      AudioControl Interface Descriptor:
        bLength 9
        bDescriptorType 36
        bDescriptorSubtype 6 (FEATURE_UNIT)
        bUnitID 2
        bSourceID 1
        bControlSize 2
        bmaControls( 0) 0x43
        bmaControls( 0) 0x00
          Mute Control
          Volume Control
          Automatic Gain Control
        iFeature 0
      AudioControl Interface Descriptor:
        bLength 9
        bDescriptorType 36
        bDescriptorSubtype 3 (OUTPUT_TERMINAL)
        bTerminalID 3
        wTerminalType 0x0101 USB Streaming
        bAssocTerminal 0
        bSourceID 2
        iTerminal 0
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 2
      bAlternateSetting 0
      bNumEndpoints 0
      bInterfaceClass 1 Audio
      bInterfaceSubClass 2 Streaming
      bInterfaceProtocol 0
      iInterface 0
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 2
      bAlternateSetting 1
      bNumEndpoints 1
      bInterfaceClass 1 Audio
      bInterfaceSubClass 2 Streaming
      bInterfaceProtocol 0
      iInterface 0
      AudioStreaming Interface Descriptor:
        bLength 7
        bDescriptorType 36
        bDescriptorSubtype 1 (AS_GENERAL)
        bTerminalLink 3
        bDelay 1 frames
        wFormatTag 1 PCM
      AudioStreaming Interface Descriptor:
        bLength 11
        bDescriptorType 36
        bDescriptorSubtype 2 (FORMAT_TYPE)
        bFormatType 1 (FORMAT_TYPE_I)
        bNrChannels 1
        bSubframeSize 2
        bBitResolution 16
        bSamFreqType 1 Discrete
        tSamFreq[ 0] 8000
      Endpoint Descriptor:
        bLength 9
        bDescriptorType 5
        bEndpointAddress 0x83 EP 3 IN
        bmAttributes 1
          Transfer Type Isochronous
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0010 1x 16 bytes
        bInterval 1
        bRefresh 0
        bSynchAddress 0
        AudioControl Endpoint Descriptor:
          bLength 7
          bDescriptorType 37
          bDescriptorSubtype 1 (EP_GENERAL)
          bmAttributes 0x00
          bLockDelayUnits 0 Undefined
          wLockDelay 0 Undefined
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 2
      bAlternateSetting 2
      bNumEndpoints 1
      bInterfaceClass 1 Audio
      bInterfaceSubClass 2 Streaming
      bInterfaceProtocol 0
      iInterface 0
      AudioStreaming Interface Descriptor:
        bLength 7
        bDescriptorType 36
        bDescriptorSubtype 1 (AS_GENERAL)
        bTerminalLink 3
        bDelay 1 frames
        wFormatTag 1 PCM
      AudioStreaming Interface Descriptor:
        bLength 11
        bDescriptorType 36
        bDescriptorSubtype 2 (FORMAT_TYPE)
        bFormatType 1 (FORMAT_TYPE_I)
        bNrChannels 1
        bSubframeSize 2
        bBitResolution 16
        bSamFreqType 1 Discrete
        tSamFreq[ 0] 16000
      Endpoint Descriptor:
        bLength 9
        bDescriptorType 5
        bEndpointAddress 0x83 EP 3 IN
        bmAttributes 1
          Transfer Type Isochronous
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0020 1x 32 bytes
        bInterval 1
        bRefresh 0
        bSynchAddress 0
        AudioControl Endpoint Descriptor:
          bLength 7
          bDescriptorType 37
          bDescriptorSubtype 1 (EP_GENERAL)
          bmAttributes 0x00
          bLockDelayUnits 0 Undefined
          wLockDelay 0 Undefined
Device Status: 0x0000
  (Bus Powered)
Bus 001 Device 003: ID 045e:0745 Microsoft Corp. Nano Transceiver v1.0 for Bluetooth
Device Descriptor:
  bLength 18
  bDescriptorType 1
  bcdUSB 2.00
  bDeviceClass 0 (Defined at Interface level)
  bDeviceSubClass 0
  bDeviceProtocol 0
  bMaxPacketSize0 64
  idVendor 0x045e Microsoft Corp.
  idProduct 0x0745 Nano Transceiver v1.0 for Bluetooth
  bcdDevice 6.56
  iManufacturer 1 Microsoft
  iProduct 2 Microsoft® 2.4GHz Transceiver v8.0
  iSerial 0
  bNumConfigurations 1
  Configuration Descriptor:
    bLength 9
    bDescriptorType 2
    wTotalLength 84
    bNumInterfaces 3
    bConfigurationValue 1
    iConfiguration 0
    bmAttributes 0xa0
      (Bus Powered)
      Remote Wakeup
    MaxPower 100mA
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 0
      bNumEndpoints 1
      bInterfaceClass 3 Human Interface Device
      bInterfaceSubClass 1 Boot Interface Subclass
      bInterfaceProtocol 1 Keyboard
      iInterface 0
        HID Device Descriptor:
          bLength 9
          bDescriptorType 33
          bcdHID 1.11
          bCountryCode 0 Not supported
          bNumDescriptors 1
          bDescriptorType 34 Report
          wDescriptorLength 57
         Report Descriptors:
           ** UNAVAILABLE **
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0008 1x 8 bytes
        bInterval 4
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 1
      bAlternateSetting 0
      bNumEndpoints 1
      bInterfaceClass 3 Human Interface Device
      bInterfaceSubClass 1 Boot Interface Subclass
      bInterfaceProtocol 2 Mouse
      iInterface 0
        HID Device Descriptor:
          bLength 9
          bDescriptorType 33
          bcdHID 1.11
          bCountryCode 0 Not supported
          bNumDescriptors 1
          bDescriptorType 34 Report
          wDescriptorLength 295
         Report Descriptors:
           ** UNAVAILABLE **
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x000a 1x 10 bytes
        bInterval 1
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 2
      bAlternateSetting 0
      bNumEndpoints 1
      bInterfaceClass 3 Human Interface Device
      bInterfaceSubClass 0 No Subclass
      bInterfaceProtocol 0 None
      iInterface 0
        HID Device Descriptor:
          bLength 9
          bDescriptorType 33
          bcdHID 1.11
          bCountryCode 0 Not supported
          bNumDescriptors 1
          bDescriptorType 34 Report
          wDescriptorLength 319
         Report Descriptors:
           ** UNAVAILABLE **
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x83 EP 3 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0020 1x 32 bytes
        bInterval 1
Device Status: 0x0000
  (Bus Powered)
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Device Descriptor:
  bLength 18
  bDescriptorType 1
  bcdUSB 2.00
  bDeviceClass 9 Hub
  bDeviceSubClass 0 Unused
  bDeviceProtocol 1 Single TT
  bMaxPacketSize0 64
  idVendor 0x1d6b Linux Foundation
  idProduct 0x0002 2.0 root hub
  bcdDevice 4.03
  iManufacturer 3 Linux 4.3.0+ xhci-hcd
  iProduct 2 xHCI Host Controller
  iSerial 1 0000:00:14.0
  bNumConfigurations 1
  Configuration Descriptor:
    bLength 9
    bDescriptorType 2
    wTotalLength 25
    bNumInterfaces 1
    bConfigurationValue 1
    iConfiguration 0
    bmAttributes 0xe0
      Self Powered
      Remote Wakeup
    MaxPower 0mA
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 0
      bNumEndpoints 1
      bInterfaceClass 9 Hub
      bInterfaceSubClass 0 Unused
      bInterfaceProtocol 0 Full speed (or root) hub
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0004 1x 4 bytes
        bInterval 12
Hub Descriptor:
  bLength 11
  bDescriptorType 41
  nNbrPorts 14
  wHubCharacteristic 0x000a
    No power switching (usb 1.0)
    Per-port overcurrent protection
    TT think time 8 FS bits
  bPwrOn2PwrGood 10 * 2 milli seconds
  bHubContrCurrent 0 milli Ampere
  DeviceRemovable 0x00 0x00
  PortPwrCtrlMask 0xff 0xff
 Hub Port Status:
   Port 1: 0000.0100 power
   Port 2: 0000.0100 power
   Port 3: 0000.0103 power enable connect
   Port 4: 0000.0100 power
   Port 5: 0000.0100 power
   Port 6: 0000.0100 power
   Port 7: 0000.0100 power
   Port 8: 0000.0100 power
   Port 9: 0000.0100 power
   Port 10: 0000.0103 power enable connect
   Port 11: 0000.0100 power
   Port 12: 0000.0100 power
   Port 13: 0000.0100 power
   Port 14: 0000.0100 power
Device Status: 0x0001
  Self Powered
[-- Attachment #3: lspci.txt --]
[-- Type: text/plain, Size: 7362 bytes --]
00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB xHCI Controller
00:16.0 Communication controller: Intel Corporation 9 Series Chipset Family ME Interface #1
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V
00:1a.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2
00:1b.0 Audio device: Intel Corporation 9 Series Chipset Family HD Audio Controller
00:1c.0 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 (rev d0)
00:1c.6 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 7 (rev d0)
00:1d.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1
00:1f.0 ISA bridge: Intel Corporation 9 Series Chipset Family H97 Controller
00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode]
00:1f.3 SMBus: Intel Corporation 9 Series Chipset Family SMBus Controller
00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
	Subsystem: ASRock Incorporation Device 0c00
	Flags: bus master, fast devsel, latency 0
	Capabilities: [e0] Vendor Specific Information: Len=0c <?>
	Kernel driver in use: hsw_uncore
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
	Subsystem: ASRock Incorporation Device 0412
	Flags: bus master, fast devsel, latency 0, IRQ 31
	Memory at f7800000 (64-bit, non-prefetchable) [size=4M]
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	I/O ports at f000 [size=64]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [d0] Power Management version 2
	Capabilities: [a4] PCI Advanced Features
	Kernel driver in use: i915
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
	Subsystem: ASRock Incorporation Device 0c0c
	Flags: bus master, fast devsel, latency 0, IRQ 32
	Memory at f7c34000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [50] Power Management version 2
	Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
	Kernel driver in use: snd_hda_intel
00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB xHCI Controller (prog-if 30 [XHCI])
	Subsystem: ASRock Incorporation Device 8cb1
	Flags: bus master, medium devsel, latency 0, IRQ 27
	Memory at f7c20000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [70] Power Management version 2
	Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
	Kernel driver in use: xhci_hcd
00:16.0 Communication controller: Intel Corporation 9 Series Chipset Family ME Interface #1
	Subsystem: ASRock Incorporation Device 8cba
	Flags: bus master, fast devsel, latency 0, IRQ 29
	Memory at f7c3f000 (64-bit, non-prefetchable) [size=16]
	Capabilities: [50] Power Management version 3
	Capabilities: [8c] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Kernel driver in use: mei_me
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V
	Subsystem: ASRock Incorporation Device 15a1
	Flags: bus master, fast devsel, latency 0, IRQ 26
	Memory at f7c00000 (32-bit, non-prefetchable) [size=128K]
	Memory at f7c3c000 (32-bit, non-prefetchable) [size=4K]
	I/O ports at f080 [size=32]
	Capabilities: [c8] Power Management version 2
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [e0] PCI Advanced Features
	Kernel driver in use: e1000e
00:1a.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2 (prog-if 20 [EHCI])
	Subsystem: ASRock Incorporation Device 8cad
	Flags: bus master, medium devsel, latency 0, IRQ 16
	Memory at f7c3b000 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Debug port: BAR=1 offset=00a0
	Capabilities: [98] PCI Advanced Features
	Kernel driver in use: ehci-pci
00:1b.0 Audio device: Intel Corporation 9 Series Chipset Family HD Audio Controller
	Subsystem: ASRock Incorporation Device d892
	Flags: bus master, fast devsel, latency 0, IRQ 30
	Memory at f7c30000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [50] Power Management version 2
	Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Kernel driver in use: snd_hda_intel
00:1c.0 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 (rev d0) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 24
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	Capabilities: [40] Express Root Port (Slot-), MSI 00
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [90] Subsystem: ASRock Incorporation Device 8c90
	Capabilities: [a0] Power Management version 3
	Kernel driver in use: pcieport
00:1c.6 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 7 (rev d0) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 25
	Bus: primary=00, secondary=02, subordinate=03, sec-latency=0
	Capabilities: [40] Express Root Port (Slot+), MSI 00
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [90] Subsystem: ASRock Incorporation Device 244e
	Capabilities: [a0] Power Management version 3
	Kernel driver in use: pcieport
00:1d.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1 (prog-if 20 [EHCI])
	Subsystem: ASRock Incorporation Device 8ca6
	Flags: bus master, medium devsel, latency 0, IRQ 23
	Memory at f7c3a000 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Debug port: BAR=1 offset=00a0
	Capabilities: [98] PCI Advanced Features
	Kernel driver in use: ehci-pci
00:1f.0 ISA bridge: Intel Corporation 9 Series Chipset Family H97 Controller
	Subsystem: ASRock Incorporation Device 8cc6
	Flags: bus master, medium devsel, latency 0
	Capabilities: [e0] Vendor Specific Information: Len=0c <?>
	Kernel driver in use: lpc_ich
00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode] (prog-if 01 [AHCI 1.0])
	Subsystem: ASRock Incorporation Device 8c82
	Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 28
	I/O ports at f0d0 [size=8]
	I/O ports at f0c0 [size=4]
	I/O ports at f0b0 [size=8]
	I/O ports at f0a0 [size=4]
	I/O ports at f060 [size=32]
	Memory at f7c39000 (32-bit, non-prefetchable) [size=2K]
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [70] Power Management version 3
	Capabilities: [a8] SATA HBA v1.0
	Kernel driver in use: ahci
00:1f.3 SMBus: Intel Corporation 9 Series Chipset Family SMBus Controller
	Subsystem: ASRock Incorporation Device 8ca2
	Flags: medium devsel
	Memory at f7c38000 (64-bit, non-prefetchable) [size=256]
	I/O ports at f040 [size=32]
* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
2015-11-19 20:35 ` Vassilis Virvilis
@ 2015-11-20 5:25 ` Vassilis Virvilis
2015-11-20 8:47 ` Juergen Gross
0 siblings, 1 reply; 22+ messages in thread
From: Vassilis Virvilis @ 2015-11-20 5:25 UTC
To: Juergen Gross; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez
On 11/19/2015 10:35 PM, Vassilis Virvilis wrote:
>
> I compiled and I am running 4.3 right now.
>
It failed this morning. Last night I did 3 hibernate / resume cycles. In the last one I also turned off the PSU (this seems to push it over the edge - but it may be random behavior) and it worked. This morning, 7h later, it failed to resume - but it didn't hang on _lapic_resume. This time it rebooted - and I seem to recall this behavior for 4.2+ kernels. I forgot to mention it because my testing with 4.x kernels was one month before.
So the 4.3 kernel reboots on resume after a long hibernation time.
I am testing with 4.3 and nopat right now.
Vassilis
* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
2015-11-20 5:25 ` Vassilis Virvilis
@ 2015-11-20 8:47 ` Juergen Gross
2015-11-20 10:04 ` vasvir
0 siblings, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2015-11-20 8:47 UTC
To: vasvir; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez
On 20/11/15 06:25, Vassilis Virvilis wrote:
> On 11/19/2015 10:35 PM, Vassilis Virvilis wrote:
>>
>> I compiled and I am running 4.3 right now.
>>
>
> It failed this morning. Last night I did 3 hibernate / resume cycles. In
> the last one I also turned off the PSU (this seems to push it over the
> edge - but it may be random behavior) and it worked. This morning, 7h
> later, it failed to resume - but it didn't hang on _lapic_resume. This time
> it rebooted - and I seem to recall this behavior for 4.2+ kernels. I
> forgot to mention it because my testing with 4.x kernels was one month
> before.
>
> So the 4.3 kernel reboots on resume after a long hibernation time.
>
> I am testing with 4.3 and nopat right now.
I've just found a potential issue: In case MTRR is disabled by the BIOS the PAT register of the boot processor won't be restored after resume.
Can you check whether pr_info("MTRR: Disabled\n") has been executed in early boot? If yes, this might be a BIOS option.
Juergen
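To make Juergen's hypothesis concrete: the IA32_PAT MSR holds eight one-byte entries whose low three bits select a memory type, and per the Intel SDM its power-on value is 0x0007040600070406 (WB, WT, UC-, UC, repeated). If the boot CPU came out of resume holding that value instead of the layout the kernel programmed at boot, every mapping that selects a changed PAT slot would silently get a different caching type. The user-space sketch below only decodes and compares two PAT values; it does not read the MSR, and the helper names (pat_entry, pat_diff_mask, pat_type_name) are hypothetical, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural power-on value: entries 0..7 = WB, WT, UC-, UC, WB, WT, UC-, UC. */
#define PAT_POWER_ON_DEFAULT 0x0007040600070406ULL

/* Memory-type encodings for one PAT entry (Intel SDM); 2 and 3 are reserved. */
const char *pat_type_name(unsigned int enc)
{
    switch (enc) {
    case 0x0: return "UC";
    case 0x1: return "WC";
    case 0x4: return "WT";
    case 0x5: return "WP";
    case 0x6: return "WB";
    case 0x7: return "UC-";
    default:  return "reserved";
    }
}

/* Entry idx (0..7) lives in byte idx; only its low 3 bits encode the type. */
unsigned int pat_entry(uint64_t pat, unsigned int idx)
{
    assert(idx < 8);
    return (unsigned int)((pat >> (idx * 8u)) & 0x7u);
}

/* Bit i of the result is set when entry i differs between the two values,
 * e.g. the PAT programmed before hibernation vs. the PAT found after resume. */
unsigned int pat_diff_mask(uint64_t before, uint64_t after)
{
    unsigned int mask = 0;
    for (unsigned int i = 0; i < 8; i++)
        if (pat_entry(before, i) != pat_entry(after, i))
            mask |= 1u << i;
    return mask;
}
```

For example, a hypothetical PAT of 0x0007040600070106 (entry 1 set to WC) compared against the power-on value differs only in entry 1, so a write-combining mapping using that slot would become write-through after such a reset.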
* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
2015-11-20 8:47 ` Juergen Gross
@ 2015-11-20 10:04 ` vasvir
2015-11-20 12:23 ` Juergen Gross
0 siblings, 1 reply; 22+ messages in thread
From: vasvir @ 2015-11-20 10:04 UTC
To: Juergen Gross; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez
> I've just found a potential issue: In case MTRR is disabled by the BIOS
> the PAT register of the boot processor won't be restored after resume.
>
> Can you check whether pr_info("MTRR: Disabled\n") has been executed in
> early boot? If yes, this might be a BIOS option.
>
I don't have access right now. I will test it later tonight (This is my home machine).
Would $dmesg | grep -i mtrr suffice or do I need to look for the mtrr somewhere else e.g. /proc /sys etc?
* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
2015-11-20 10:04 ` vasvir
@ 2015-11-20 12:23 ` Juergen Gross
2015-11-21 11:49 ` Vassilis Virvilis
0 siblings, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2015-11-20 12:23 UTC
To: vasvir; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez
On 20/11/15 11:04, vasvir@iit.demokritos.gr wrote:
>> I've just found a potential issue: In case MTRR is disabled by the BIOS
>> the PAT register of the boot processor won't be restored after resume.
>>
>> Can you check whether pr_info("MTRR: Disabled\n") has been executed in
>> early boot? If yes, this might be a BIOS option.
>>
>
> I don't have access right now. I will test it later tonight (This is my
> home machine).
>
> Would $dmesg | grep -i mtrr suffice or do I need to look for the mtrr
> somewhere else e.g. /proc /sys etc?
I think grepping for MTRR in dmesg should be enough.
Juergen
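The check Juergen describes is just a case-insensitive search of the boot log for the string that pr_info() call prints, which is exactly what dmesg | grep -i mtrr does. As a hedged sketch of the same test in C, assuming the log text has already been captured into a string (log_reports_mtrr_disabled is a hypothetical helper, and the case-insensitive match mirrors grep -i rather than any known prefix rule):

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive substring test, like grep -i (strcasestr is a GNU extension). */
static bool contains_nocase(const char *hay, const char *needle)
{
    assert(hay != NULL && needle != NULL);
    size_t n = strlen(needle);
    for (; *hay != '\0'; hay++) {
        size_t i = 0;
        while (i < n && hay[i] != '\0' &&
               tolower((unsigned char)hay[i]) == tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return true;
    }
    return n == 0;
}

/* Hypothetical helper: does the captured kernel log contain the
 * "MTRR: Disabled" message asked about in the thread? */
bool log_reports_mtrr_disabled(const char *log)
{
    return contains_nocase(log, "MTRR: Disabled");
}
```

Lines such as "calling mtrr_if_init+0x0/0x5f" do not match, because the helper looks for the full "MTRR: Disabled" phrase rather than any mention of MTRR.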
* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
2015-11-20 12:23 ` Juergen Gross
@ 2015-11-21 11:49 ` Vassilis Virvilis
2015-11-23 7:32 ` Juergen Gross
2015-11-23 18:56 ` Luis R. Rodriguez
0 siblings, 2 replies; 22+ messages in thread
From: Vassilis Virvilis @ 2015-11-21 11:49 UTC
To: Juergen Gross; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez
On 11/20/2015 02:23 PM, Juergen Gross wrote:
> On 20/11/15 11:04, vasvir@iit.demokritos.gr wrote:
>>> I've just found a potential issue: In case MTRR is disabled by the BIOS
>>> the PAT register of the boot processor won't be restored after resume.
>>>
>>> Can you check whether pr_info("MTRR: Disabled\n") has been executed in
>>> early boot? If yes, this might be a BIOS option.
>>>
>>
>> I don't have access right now. I will test it later tonight (This is my
>> home machine).
>>
>> Would $dmesg | grep -i mtrr suffice or do I need to look for the mtrr
>> somewhere else e.g. /proc /sys etc?
>
> I think grepping for MTRR in dmesg should be enough.
kernel 4.3 +nopat also died on the 4th or the 5th hibernate on the familiar (see previously attached image) "Calling lapic..." place.
$dmesg | grep -i mtr for 4.3 kernel with nopat
[ 0.189113] calling mtrr_if_init+0x0/0x5f @ 1
[ 0.189116] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
[ 0.189222] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
[ 0.189559] calling mtrr_init_finialize+0x0/0x3a @ 1
[ 0.189560] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
[ 8.994140] mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining
[ 8.994154] Failed to add WC MTRR for [00000000e0000000-00000000efffffff]; performance may suffer.
$dmesg | grep -i mtr for 4.3 kernel with default pat enabled
[ 0.189368] calling mtrr_if_init+0x0/0x5f @ 1
[ 0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
[ 0.189478] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
[ 0.189814] calling mtrr_init_finialize+0x0/0x3a @ 1
[ 0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
I also checked my BIOS. I found nothing about mtrr. My BIOS manual is ftp://europe.asrock.com/Manual/H97%20Pro4.pdf. Can you see any option about MTRR?
Question: If we assume your theory is correct about mtrr/pat, wouldn't it lock up/hang/reboot every time the system goes to hibernate/resume? Can this assumption explain why the first hibernation/resume cycles in rapid succession after system boot are working and the long ones fail somewhat more consistently?
Note: With PAT enabled the system boots up significantly faster.
In the weekend I will return to 3.18-rc2 and I will try to verify my bisection is correct. Second-guessing yourself is a terrible thing...
I will also try with nopat and I will run dmesg | grep -i mtr and post results
Unless you have any other suggestions...
Vassilis
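Two ranges in these logs can be cross-checked against the lspci attachment. The failed write-combining request "e0000000,10000000" reads as base,size in hex: it ends at 0xefffffff and spans 256 MiB, which is exactly the prefetchable BAR "Memory at e0000000 (64-bit, prefetchable) [size=256M]" of the i915 device at 00:02.0. The pmd_set_huge range 0xf8000000-0xf8200000 is 2 MiB, the size of one x86-64 PMD huge page. A trivial C sketch of that arithmetic, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive last address of the region [base, base + size). */
uint64_t region_last(uint64_t base, uint64_t size)
{
    assert(size > 0);
    return base + size - 1;
}

/* Size in MiB, for comparison with the "[size=256M]" notation used by lspci. */
uint64_t size_mib(uint64_t size)
{
    return size >> 20;
}
```

Note that the WC MTRR failure appears only in the nopat log; with PAT enabled, the kernel presumably obtains write-combining for that aperture through PAT instead of adding an MTRR over a range the MTRRs already mark write-back.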
* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
2015-11-21 11:49 ` Vassilis Virvilis
@ 2015-11-23 7:32 ` Juergen Gross
2015-11-23 14:11 ` vasvir
2015-11-23 18:56 ` Luis R. Rodriguez
1 sibling, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2015-11-23 7:32 UTC
To: vasvir; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez
On 21/11/15 12:49, Vassilis Virvilis wrote:
> On 11/20/2015 02:23 PM, Juergen Gross wrote:
>> On 20/11/15 11:04, vasvir@iit.demokritos.gr wrote:
>>>> I've just found a potential issue: In case MTRR is disabled by the BIOS
>>>> the PAT register of the boot processor won't be restored after resume.
>>>>
>>>> Can you check whether pr_info("MTRR: Disabled\n") has been executed in
>>>> early boot? If yes, this might be a BIOS option.
>>>>
>>>
>>> I don't have access right now. I will test it later tonight (This is my
>>> home machine).
>>>
>>> Would $dmesg | grep -i mtrr suffice or do I need to look for the mtrr
>>> somewhere else e.g. /proc /sys etc?
>>
>> I think grepping for MTRR in dmesg should be enough.
>
> kernel 4.3 +nopat also died on the 4th or the 5th hibernate on the
> familiar (see previously attached image) "Calling lapic..." place.
>
> $dmesg | grep -i mtr for 4.3 kernel with nopat
> [ 0.189113] calling mtrr_if_init+0x0/0x5f @ 1
> [ 0.189116] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [ 0.189222] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000]
> with a huge-page mapping due to MTRR override.
> [ 0.189559] calling mtrr_init_finialize+0x0/0x3a @ 1
> [ 0.189560] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0
> usecs
> [ 8.994140] mtrr: type mismatch for e0000000,10000000 old: write-back
> new: write-combining
> [ 8.994154] Failed to add WC MTRR for
> [00000000e0000000-00000000efffffff]; performance may suffer.
>
> $dmesg | grep -i mtr for 4.3 kernel with default pat enabled
> [ 0.189368] calling mtrr_if_init+0x0/0x5f @ 1
> [ 0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [ 0.189478] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000]
> with a huge-page mapping due to MTRR override.
> [ 0.189814] calling mtrr_init_finialize+0x0/0x3a @ 1
> [ 0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0
> usecs
>
>
> I also checked my BIOS. I found nothing about mtrr. My BIOS manual is
> ftp://europe.asrock.com/Manual/H97%20Pro4.pdf. Can you see any option
> about MTRR?
As the BIOS obviously isn't disabling MTRR I don't think we have to go that route any longer.
> Question: If we assume your theory is correct about mtrr/pat, wouldn't
> it lock up/hang/reboot every time the system goes to hibernate/resume? Can
> this assumption explain why the first hibernation/resume cycles in rapid
> succession after system boot are working and the long ones fail somewhat
> more consistently?
Hmm, I'm really not sure. It would depend on the usage of non-standard cache mode mappings. But as MTRR isn't disabled this theory won't apply to your problem.
> Note: With PAT enabled the system boots up significantly faster.
>
> In the weekend I will return to 3.18-rc2 and I will try to verify my
> bisection is correct. Second-guessing yourself is a terrible thing...
Thanks.
> I will also try with nopat and I will run dmesg | grep -i mtr and post
> results
>
> Unless you have any other suggestions...
I think we have to find out where the kernel is really hanging. Do you have any chance to trigger an NMI?
Looking into suspend/resume code I found a strange inconsistency for the lapic handling:
lapic_suspend()
{
    ...
#ifdef CONFIG_X86_THERMAL_VECTOR
    if (maxlvt >= 5)
        apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR);
#endif
    ...
}
lapic_resume()
{
    ...
#if defined(CONFIG_X86_MCE_INTEL)
    if (maxlvt >= 5)
        apic_write(APIC_LVTTHMR, apic_pm_state.apic_thmr);
#endif
    ...
}
and comparing that to:
clear_local_APIC()
{
    ...
#ifdef CONFIG_X86_THERMAL_VECTOR
    if (maxlvt >= 5) {
        v = apic_read(APIC_LVTTHMR);
        apic_write(APIC_LVTTHMR, v | APIC_LVT_MASKED);
    }
#endif
#ifdef CONFIG_X86_MCE_INTEL
    if (maxlvt >= 6) {
        v = apic_read(APIC_LVTCMCI);
        if (!(v & APIC_LVT_MASKED))
            apic_write(APIC_LVTCMCI, v | APIC_LVT_MASKED);
    }
#endif
    ...
}
I think it would be interesting to know your kernel config...
Juergen
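The inconsistency Juergen points out is that the thermal LVT entry is saved under CONFIG_X86_THERMAL_VECTOR but restored under CONFIG_X86_MCE_INTEL. A small user-space model makes the consequence visible: with the first option defined and the second left undefined, the value is saved at suspend and never written back at resume. Everything below (the variables standing in for the register and for apic_pm_state, and the model_* functions) is a hypothetical stand-in, not kernel code, and model_resume_consistent is only one way to make the guards agree, not necessarily the patch Juergen posted.

```c
#include <assert.h>
#include <stdint.h>

/* Model a .config in which the two options differ. */
#define CONFIG_X86_THERMAL_VECTOR 1
/* CONFIG_X86_MCE_INTEL deliberately left undefined */

uint32_t lvtthmr;     /* stands in for the APIC_LVTTHMR register */
uint32_t saved_thmr;  /* stands in for apic_pm_state.apic_thmr   */

/* Same guard as lapic_suspend(). */
void model_suspend(int maxlvt)
{
#ifdef CONFIG_X86_THERMAL_VECTOR
    if (maxlvt >= 5)
        saved_thmr = lvtthmr;
#endif
}

/* Same guard as lapic_resume(): a different config symbol. */
void model_resume(int maxlvt)
{
#if defined(CONFIG_X86_MCE_INTEL)
    if (maxlvt >= 5)
        lvtthmr = saved_thmr;
#endif
    (void)maxlvt;
}

/* Save and restore guarded by the same symbol. */
void model_resume_consistent(int maxlvt)
{
#ifdef CONFIG_X86_THERMAL_VECTOR
    if (maxlvt >= 5)
        lvtthmr = saved_thmr;
#endif
}
```

As Juergen notes later in the thread, the two options are in practice always either both set or both unset, so this mismatched combination models a latent inconsistency rather than a configuration a real build can produce.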
* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
2015-11-23 7:32 ` Juergen Gross
@ 2015-11-23 14:11 ` vasvir
2015-11-23 14:19 ` Juergen Gross
0 siblings, 1 reply; 22+ messages in thread
From: vasvir @ 2015-11-23 14:11 UTC
To: Juergen Gross; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez
On 11/20/2015 02:23 PM, Juergen Gross wrote:
>
> As the BIOS obviously isn't disabling MTRR I don't think we have
> to go that route any longer.
ok.
>>
>> In the weekend I will return to 3.18-rc2 and I will try to verify my
>> bisection is correct. Second-guessing yourself is a terrible thing...
>
> Thanks.
>
>> I will also try with nopat and I will run dmesg | grep -i mtr and post
>> results
>>
>> Unless you have any other suggestions...
>
I hit a very big problem here. I did
$git checkout 773fed910d41e443e495a6bfa9ab1c2b7b13e012
$make (with gcc 4.8 - as all my tests)
and the resulting kernel is unbootable, hanging at "Loading initial ramdisk..." (the second line of the kernel boot).
That means my bisection is not good, because this commit is marked as good. So now I am at a loss.
As I said, I followed https://wiki.debian.org/DebianKernel/GitBisect
I notice now that the article suggests a step
$make oldconfig
I did it once at the start of the bisection, answering the default (Enter) to all config questions.
> I think we have to find out where the kernel is really hanging. Do you
> have any chance to trigger an NMI?
I am googling about it.
>
> Looking into suspend/resume code I found a strange inconsistency for
> the lapic handling:
>
> lapic_suspend()
> {
>     ...
> #ifdef CONFIG_X86_THERMAL_VECTOR
>     if (maxlvt >= 5)
>         apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR);
> #endif
>     ...
> }
>
> lapic_resume()
> {
>     ...
> #if defined(CONFIG_X86_MCE_INTEL)
>     if (maxlvt >= 5)
>         apic_write(APIC_LVTTHMR, apic_pm_state.apic_thmr);
> #endif
>     ...
> }
>
> and comparing that to:
>
> clear_local_APIC()
> {
>     ...
> #ifdef CONFIG_X86_THERMAL_VECTOR
>     if (maxlvt >= 5) {
>         v = apic_read(APIC_LVTTHMR);
>         apic_write(APIC_LVTTHMR, v | APIC_LVT_MASKED);
>     }
> #endif
> #ifdef CONFIG_X86_MCE_INTEL
>     if (maxlvt >= 6) {
>         v = apic_read(APIC_LVTCMCI);
>         if (!(v & APIC_LVT_MASKED))
>             apic_write(APIC_LVTCMCI, v | APIC_LVT_MASKED);
>     }
> #endif
> ...
> }
>

OK, I will send the .config when I get back home. I keep every kernel I build as a .deb archive. The problem is that the Debian kernel build procedure does not record the git commit hash anywhere in the .deb file.

For which kernel would you like to see the config? 4.3?

Vassilis
* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
2015-11-23 14:11 ` vasvir
@ 2015-11-23 14:19 ` Juergen Gross
2015-11-24 22:46 ` Luis R. Rodriguez
0 siblings, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2015-11-23 14:19 UTC
To: vasvir; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez

On 23/11/15 15:11, vasvir@iit.demokritos.gr wrote:
> Ok I will send the .config when I get back home. I have all kernels I
> build in .deb archive. The problem is that the debian kernel build
> procedure does not hold somewhere in the deb file the git commit hash.
>
> Fow which kernel would you care to see the config? 4.3?

It doesn't really matter anymore. I've already posted a patch to fix it and got the reply that the fix is okay, but that no harm can come from the current implementation, as the two config options are always either both set or both unset.

Juergen
* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
2015-11-23 14:19 ` Juergen Gross
@ 2015-11-24 22:46 ` Luis R. Rodriguez
2015-11-25 5:01 ` Juergen Gross
0 siblings, 1 reply; 22+ messages in thread
From: Luis R. Rodriguez @ 2015-11-24 22:46 UTC
To: Juergen Gross; +Cc: vasvir, linux-kernel, Toshi Kani

On Mon, Nov 23, 2015 at 03:19:16PM +0100, Juergen Gross wrote:
> On 23/11/15 15:11, vasvir@iit.demokritos.gr wrote:
> > Ok I will send the .config when I get back home. I have all kernels I
> > build in .deb archive. The problem is that the debian kernel build
> > procedure does not hold somewhere in the deb file the git commit hash.
> >
> > Fow which kernel would you care to see the config? 4.3?
>
> Doesn't really matter anymore. I've posted a patch already to fix it and
> got the reply, that the fix is okay, but no harm can come from the
> current implementation, as the two config options are always either both
> set or reset.

Hrm, Vassilis seems to be able to reproduce this more effectively by heating up his CPU prior to hibernation, though. I have no idea what adding APIC_LVT_MASKED ((1 << 16)) to the Local Vector Table (LVT) Thermal Monitor entry (APIC_LVTTHMR 0x330) does, but clear_local_APIC() seems to be used to "cleanout any BIOS leftovers during boot." If we're suspending while the fan is still on, I wonder whether this could cause an issue with some settings the BIOS may have set prior to hibernation, and a mismatch upon resume.

I can't find what APIC_LVT_MASKED does though; the best doc I found:

https://www-ssl.intel.com/content/dam/www/public/us/en/documents/white-papers/cpu-monitoring-dts-peci-paper.pdf

The inability to set the MTRR for the i915 card might be a totally separate issue at this point, not sure. One could test that, I suppose, by just using the vesa graphics driver (disabling i915) to at least get a basic screen to see things and compile/test things.

Luis
* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support 2015-11-24 22:46 ` Luis R. Rodriguez @ 2015-11-25 5:01 ` Juergen Gross 2015-11-25 19:24 ` Luis R. Rodriguez 0 siblings, 1 reply; 22+ messages in thread From: Juergen Gross @ 2015-11-25 5:01 UTC (permalink / raw) To: Luis R. Rodriguez; +Cc: vasvir, linux-kernel, Toshi Kani On 24/11/15 23:46, Luis R. Rodriguez wrote: > On Mon, Nov 23, 2015 at 03:19:16PM +0100, Juergen Gross wrote: >> On 23/11/15 15:11, vasvir@iit.demokritos.gr wrote: >>> Ok I will send the .config when I get back home. I have all kernels I >>> build in .deb archive. The problem is that the debian kernel build >>> procedure does not hold somewhere in the deb file the git commit hash. >>> >>> Fow which kernel would you care to see the config? 4.3? >> >> Doesn't really matter anymore. I've posted a patch already to fix it and >> got the reply, that the fix is okay, but no harm can come from the >> current implementation, as the two config options are always either both >> set or reset. > > Hrm, Vassilis seems to be able to reproduce this more effectively by heating up > his CPU prior to hibernation though. I have no idea what adding APIC_LVT_MASKED > ((1 << 16)) to the Local Vector Table (LVT) Thermal Monitor (APIC_LVTTHMR 0x330) does but > clear_local_APIC() seems to be used to "cleanout any BIOS leftovers during > boot." If we're suspending but the fan is still on I wonder if this could cause > an issue with some settings the BIOS may have set prior to hibernation, and > a mismatch upon resume. > > I can't find what APIC_LVT_MASKED does though, the best doc I found: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf Local APIC (chapter 10.4). Juergen ^ permalink raw reply [flat|nested] 22+ messages in thread
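As a reading aid for the Local APIC chapter Juergen points to, here is a sketch that decodes the LVT fields relevant to the thermal entry: vector (bits 0-7), delivery mode (bits 8-10), delivery status (bit 12) and mask (bit 16, the same bit as APIC_LVT_MASKED, (1 << 16)). The macro, struct and function names are hypothetical, and the bit positions should be checked against the LVT figure in the SDM before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LVT entry layout as described in the Intel SDM Local APIC chapter.
 * Only fields present in the thermal LVT entry are decoded. */
#define LVT_VECTOR_MASK      0xffu         /* bits 0-7                    */
#define LVT_DELIVERY_SHIFT   8             /* bits 8-10: delivery mode    */
#define LVT_DELIVERY_MASK    0x7u
#define LVT_DELIVERY_STATUS  (1u << 12)    /* 1 = send pending            */
#define LVT_MASKED           (1u << 16)    /* same bit as APIC_LVT_MASKED */

struct lvt_fields {
    uint8_t vector;
    uint8_t delivery_mode;
    bool send_pending;
    bool masked;
};

/* Split a raw 32-bit LVT register value into its fields. */
static struct lvt_fields decode_lvt(uint32_t v)
{
    struct lvt_fields f;

    f.vector = (uint8_t)(v & LVT_VECTOR_MASK);
    f.delivery_mode = (uint8_t)((v >> LVT_DELIVERY_SHIFT) & LVT_DELIVERY_MASK);
    f.send_pending = (v & LVT_DELIVERY_STATUS) != 0;
    f.masked = (v & LVT_MASKED) != 0;
    return f;
}
```

For example, decode_lvt(0x000100fa) reports vector 0xfa with the mask bit set, i.e. the interrupt is configured but suppressed.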
* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support 2015-11-25 5:01 ` Juergen Gross @ 2015-11-25 19:24 ` Luis R. Rodriguez 0 siblings, 0 replies; 22+ messages in thread From: Luis R. Rodriguez @ 2015-11-25 19:24 UTC (permalink / raw) To: Juergen Gross; +Cc: vasvir, linux-kernel, Toshi Kani On Wed, Nov 25, 2015 at 06:01:20AM +0100, Juergen Gross wrote: > On 24/11/15 23:46, Luis R. Rodriguez wrote: > > On Mon, Nov 23, 2015 at 03:19:16PM +0100, Juergen Gross wrote: > >> On 23/11/15 15:11, vasvir@iit.demokritos.gr wrote: > >>> Ok I will send the .config when I get back home. I have all kernels I > >>> build in .deb archive. The problem is that the debian kernel build > >>> procedure does not hold somewhere in the deb file the git commit hash. > >>> > >>> Fow which kernel would you care to see the config? 4.3? > >> > >> Doesn't really matter anymore. I've posted a patch already to fix it and > >> got the reply, that the fix is okay, but no harm can come from the > >> current implementation, as the two config options are always either both > >> set or reset. > > > > Hrm, Vassilis seems to be able to reproduce this more effectively by heating up > > his CPU prior to hibernation though. I have no idea what adding APIC_LVT_MASKED > > ((1 << 16)) to the Local Vector Table (LVT) Thermal Monitor (APIC_LVTTHMR 0x330) does but > > clear_local_APIC() seems to be used to "cleanout any BIOS leftovers during > > boot." If we're suspending but the fan is still on I wonder if this could cause > > an issue with some settings the BIOS may have set prior to hibernation, and > > a mismatch upon resume. > > > > I can't find what APIC_LVT_MASKED does though, the best doc I found: > > http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf > > Local APIC (chapter 10.4). Thanks, yeah I only see the same thing you spotted and fixed [0] but also agree it does not play a role with this issue. 
Although it is completely undocumented, APIC_LVT_MASKED just masks the thermal interrupts while we go down, and we simply write back the original value of the thermal register when we come up. The only other cautionary note I could find about the thermal register seemed to be specific to 32-bit x86. Let's see what the bisect ends up with.

[0] https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=42baa2581c92f8d07e7260506c8d41caf14b0fc3

Luis
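The sequence Luis describes, mask on the way down and write back the original entry on the way up, can be sketched on a plain variable. This is a user-space model with hypothetical names, not the kernel's actual code path; it shows that the mask bit is OR-ed in without touching the vector or delivery mode, and that restoring the saved word undoes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_LVT_MASKED (1u << 16)   /* value of APIC_LVT_MASKED */

/* Hypothetical stand-in for the thermal LVT register and its saved copy. */
struct lvt_model {
    uint32_t reg;
    uint32_t saved;
};

/* Going down: remember the current entry, then set only the mask bit,
 * the same "v | APIC_LVT_MASKED" pattern clear_local_APIC() uses. */
static void lvt_mask_for_suspend(struct lvt_model *m)
{
    assert(m != NULL);
    m->saved = m->reg;
    m->reg |= MODEL_LVT_MASKED;
}

/* Coming up: write the original entry back, which also restores its
 * original mask state. */
static void lvt_restore_on_resume(struct lvt_model *m)
{
    assert(m != NULL);
    m->reg = m->saved;
}
```

Starting from 0x000000fa, the register reads 0x000100fa while suspended and 0x000000fa again after resume.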
* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
2015-11-21 11:49 ` Vassilis Virvilis
2015-11-23 7:32 ` Juergen Gross
@ 2015-11-23 18:56 ` Luis R. Rodriguez
2015-11-23 23:01 ` Vassilis Virvilis
1 sibling, 1 reply; 22+ messages in thread
From: Luis R. Rodriguez @ 2015-11-23 18:56 UTC
To: Vassilis Virvilis; +Cc: Juergen Gross, linux-kernel, Toshi Kani

On Sat, Nov 21, 2015 at 01:49:06PM +0200, Vassilis Virvilis wrote:
> On 11/20/2015 02:23 PM, Juergen Gross wrote:
> >On 20/11/15 11:04, vasvir@iit.demokritos.gr wrote:
> >>>I've just found a potential issue: In case MTRR is disabled by the BIOS
> >>>the PAT register of the boot processor won't be restored after resume.
> >>>
> >>>Can you check whether pr_info("MTRR: Disabled\n") has been executed in
> >>>early boot? If yes, this might be a BIOS option.
> >>>
> >>
> >>I don't have access right now. I will test it later tonight (This is my
> >>home machine).
> >>
> >>Would $dmesg | grep -i mtrr suffice or I need to look for the mtrr
> >>somewere else e.g. /proc /sys etc?
> >
> >I think grepping for MTRR in dmesg should be enough.
>
> kernel 4.3 +nopat also died on the 4th or the 5th hibernate on the familiar (see previously attached image) "Calling lapic..." place.
>
> $dmesg | grep -i mtr for 4.3 kernel with notpat
> [ 0.189113] calling mtrr_if_init+0x0/0x5f @ 1
> [ 0.189116] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [ 0.189222] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
> [ 0.189559] calling mtrr_init_finialize+0x0/0x3a @ 1
> [ 0.189560] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
> [ 8.994140] mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining
> [ 8.994154] Failed to add WC MTRR for [00000000e0000000-00000000efffffff]; performance may suffer.

It's not clear from the log who made this MTRR call for WC that failed. I hope we didn't attempt a WC write on a WB region.
Who owns 00000000e0000000-00000000efffffff? What does your log show right before and after this? To find out, try:

dmesg | grep -5 -i mtrr

Not being able to use WC is not fatal, it's just a performance issue. But if we tried to override to WC a region that we should not have, one that the BIOS might rely on not being WC, that could be a big issue.

> $dmesg | grep -i mtr for 4.3 kernel with default pat enabled
> [ 0.189368] calling mtrr_if_init+0x0/0x5f @ 1
> [ 0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [ 0.189478] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
> [ 0.189814] calling mtrr_init_finialize+0x0/0x3a @ 1
> [ 0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs

The fact that we don't see a conflict doesn't mean an issue or conflict didn't trigger. PAT may have missed something the BIOS did, leading the kernel to assume it could do something it was not actually able to do. The MTRR init code should pick up on this and let the kernel PAT code know there could be a conflict, but if for some reason that was missed, that could be an issue.

> I also checked my BIOS. I found nothing about mtrr. My BIOS manual is ftp://europe.asrock.com/Manual/H97%20Pro4.pdf. Can you see any option about MTRR?
>
> Question: If we assume your theory is correct about mtrr/pat, wouldn't lockup/hang reboot every time the system goes to hibernate/resume? Can this assumption explain why the first hibernation/resume cycles in rapid succession after system boot are working and the long ones fail somewhat more consistently?
>
> Note: With PAT enabled the system boots up significantly faster.
>
> In the weekend I will return to 3.18-rc2 and I will try to verify my bisection is correct. Double guessing your self is a terrible thing...
>
> I will also try with nopat and I will run dmesg | grep -i mtr and post results
>
> Unless you have any other suggestions...
Bisection on the merge commit would help.

Luis
* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support 2015-11-23 18:56 ` Luis R. Rodriguez @ 2015-11-23 23:01 ` Vassilis Virvilis 2015-11-24 22:16 ` Luis R. Rodriguez 0 siblings, 1 reply; 22+ messages in thread From: Vassilis Virvilis @ 2015-11-23 23:01 UTC (permalink / raw) To: Luis R. Rodriguez; +Cc: Juergen Gross, linux-kernel, Toshi Kani [-- Attachment #1: Type: text/plain, Size: 9405 bytes --] On 11/23/2015 08:56 PM, Luis R. Rodriguez wrote: > Its not clear from the log who called this MTRR call for WC that failed, I > hope we didn't attempt a WC wright on a WB region. Who owns > 00000000e0000000-00000000efffffff ? How can I answer that? Is there any utility to run? peek inside /proc? Here is an idea: $dmesg | grep -i -5 e0000000 [ 0.220941] pci_bus 0000:00: root bus resource [mem 0x000e4000-0x000e7fff window] [ 0.220944] pci_bus 0000:00: root bus resource [mem 0xdf200000-0xfeafffff window] [ 0.220950] pci 0000:00:00.0: [8086:0c00] type 00 class 0x060000 [ 0.221012] pci 0000:00:02.0: [8086:0412] type 00 class 0x030000 [ 0.221021] pci 0000:00:02.0: reg 0x10: [mem 0xf7800000-0xf7bfffff 64bit] [ 0.221025] pci 0000:00:02.0: reg 0x18: [mem 0xe0000000-0xefffffff 64bit pref] [ 0.221028] pci 0000:00:02.0: reg 0x20: [io 0xf000-0xf03f] [ 0.221081] pci 0000:00:03.0: [8086:0c0c] type 00 class 0x040300 [ 0.221089] pci 0000:00:03.0: reg 0x10: [mem 0xf7c34000-0xf7c37fff 64bit] [ 0.221163] pci 0000:00:14.0: [8086:8cb1] type 00 class 0x0c0330 [ 0.221184] pci 0000:00:14.0: reg 0x10: [mem 0xf7c20000-0xf7c2ffff 64bit] -- [ 0.453765] calling ioapic_init_ops+0x0/0xf @ 1 [ 0.453767] initcall ioapic_init_ops+0x0/0xf returned 0 after 0 usecs [ 0.453770] calling add_pcspkr+0x0/0x3b @ 1 [ 0.453781] initcall add_pcspkr+0x0/0x3b returned 0 after 8 usecs [ 0.453783] calling sysfb_init+0x0/0x96 @ 1 [ 0.453811] simple-framebuffer simple-framebuffer.0: framebuffer at 0xe0000000, 0x6bb000 bytes, mapped to 0xffffc90002000000 [ 0.453814] simple-framebuffer simple-framebuffer.0: 
format=a8r8g8b8, mode=1680x1050x32, linelength=6720 [ 0.557233] Console: switching to colour frame buffer device 210x65 [ 0.660632] simple-framebuffer simple-framebuffer.0: fb0: simplefb registered! [ 0.661262] initcall sysfb_init+0x0/0x96 returned 0 after 202686 usecs [ 0.661266] calling audit_classes_init+0x0/0xaa @ 1 -- [ 9.744397] input: gspca_zc3xx as /devices/pci0000:00/0000:00:14.0/usb3/3-3/input/input18 [ 9.744481] usbcore: registered new interface driver gspca_zc3xx [ 9.744484] initcall sd_driver_init+0x0/0x1000 [gspca_zc3xx] returned 0 after 319 usecs [ 9.745108] calling i915_init+0x0/0xa2 [i915] @ 403 [ 9.745542] [drm] Memory usable by graphics device = 2048M [ 9.745544] checking generic (e0000000 6bb000) vs hw (e0000000 10000000) [ 9.745544] fb: switching to inteldrmfb from simple [ 9.745831] calling alsa_seq_device_init+0x0/0x1000 [snd_seq_device] @ 384 [ 9.745842] initcall alsa_seq_device_init+0x0/0x1000 [snd_seq_device] returned 0 after 9 usecs [ 9.746179] calling hmac_module_init+0x0/0x1000 [hmac] @ 471 [ 9.746180] initcall hmac_module_init+0x0/0x1000 [hmac] returned 0 after 0 usecs -- [ 9.749840] calling usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] @ 384 [ 9.751163] usbcore: registered new interface driver snd-usb-audio [ 9.751166] initcall usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] returned 0 after 1292 usecs [ 9.943166] Console: switching to colour dummy device 80x25 [ 9.943240] [drm] Replacing VGA console driver [ 9.943520] mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining [ 9.943526] Failed to add WC MTRR for [00000000e0000000-00000000efffffff]; performance may suffer. [ 9.947147] Adding 31249404k swap on /dev/sdb1. Priority:-1 extents:1 across:31249404k FS [ 9.949724] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [ 9.949728] [drm] Driver supports precise vblank timestamp query. 
[ 9.949801] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem [ 9.965787] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null) $lspci | grep 00:02.0 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) Looks like it is the graphics card or the graphics driver. I don't know if this is relevant $ cat /proc/mtrr reg00: base=0x000000000 ( 0MB), size=16384MB, count=1: write-back reg01: base=0x400000000 (16384MB), size= 512MB, count=1: write-back reg02: base=0x0e0000000 ( 3584MB), size= 512MB, count=1: uncachable reg03: base=0x0d0000000 ( 3328MB), size= 256MB, count=1: uncachable reg04: base=0x0cf000000 ( 3312MB), size= 16MB, count=1: uncachable reg05: base=0x41f000000 (16880MB), size= 16MB, count=1: uncachable reg06: base=0x41ee00000 (16878MB), size= 2MB, count=1: uncachable > > What does your log show right before and after this? To find out try: > > dmesg | grep -5 -i mtrr > See full dmesg attached $dmesg | grep -5 -i mtrr [ 0.189333] initcall arch_kdebugfs_init+0x0/0x1f returned 0 after 0 usecs [ 0.189336] calling pt_init+0x0/0x2a4 @ 1 [ 0.189349] initcall pt_init+0x0/0x2a4 returned -19 after 0 usecs [ 0.189352] calling bts_init+0x0/0xa4 @ 1 [ 0.189354] initcall bts_init+0x0/0xa4 returned 0 after 0 usecs [ 0.189357] calling mtrr_if_init+0x0/0x5f @ 1 [ 0.189360] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs [ 0.189362] calling ffh_cstate_init+0x0/0x26 @ 1 [ 0.189363] initcall ffh_cstate_init+0x0/0x26 returned 0 after 0 usecs [ 0.189366] calling activate_jump_labels+0x0/0x2d @ 1 [ 0.189367] initcall activate_jump_labels+0x0/0x2d returned 0 after 0 usecs [ 0.189370] calling kcmp_cookies_init+0x0/0x31 @ 1 -- [ 0.189424] calling dmi_id_init+0x0/0x300 @ 1 [ 0.189448] initcall dmi_id_init+0x0/0x300 returned 0 after 0 usecs [ 0.189450] calling pci_arch_init+0x0/0x63 @ 1 [ 0.189458] PCI: MMCONFIG for domain 0000 [bus 00-3f] at 
[mem 0xf8000000-0xfbffffff] (base 0xf8000000) [ 0.189462] PCI: MMCONFIG at [mem 0xf8000000-0xfbffffff] reserved in E820 [ 0.189467] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override. [ 0.189514] PCI: Using configuration type 1 for base access [ 0.189519] initcall pci_arch_init+0x0/0x63 returned 0 after 0 usecs [ 0.189528] calling init_vdso+0x0/0x44 @ 1 [ 0.189535] initcall init_vdso+0x0/0x44 returned 0 after 0 usecs [ 0.189538] calling sysenter_setup+0x0/0x52 @ 1 -- [ 0.189542] calling topology_init+0x0/0x83 @ 1 [ 0.189795] initcall topology_init+0x0/0x83 returned 0 after 0 usecs [ 0.189798] calling fixup_ht_bug+0x0/0xed @ 1 [ 0.189799] perf_event_intel: PMU erratum BJ122, BV98, HSD29 worked around, HT is on [ 0.189802] initcall fixup_ht_bug+0x0/0xed returned 0 after 0 usecs [ 0.189805] calling mtrr_init_finialize+0x0/0x3a @ 1 [ 0.189807] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs [ 0.189809] calling uid_cache_init+0x0/0x90 @ 1 [ 0.189810] initcall uid_cache_init+0x0/0x90 returned 0 after 0 usecs [ 0.189812] calling param_sysfs_init+0x0/0x1d9 @ 1 [ 0.190106] initcall param_sysfs_init+0x0/0x1d9 returned 0 after 0 usecs [ 0.190108] calling pm_sysrq_init+0x0/0x14 @ 1 -- [ 9.749840] calling usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] @ 384 [ 9.751163] usbcore: registered new interface driver snd-usb-audio [ 9.751166] initcall usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] returned 0 after 1292 usecs [ 9.943166] Console: switching to colour dummy device 80x25 [ 9.943240] [drm] Replacing VGA console driver [ 9.943520] mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining [ 9.943526] Failed to add WC MTRR for [00000000e0000000-00000000efffffff]; performance may suffer. [ 9.947147] Adding 31249404k swap on /dev/sdb1. Priority:-1 extents:1 across:31249404k FS [ 9.949724] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). 
[ 9.949728] [drm] Driver supports precise vblank timestamp query.
[ 9.949801] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[ 9.965787] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)

> Not being able to use WC is not fatal, its just a performance issue, but if we tried
> to override a region which we should not have to WC for which another area the BIOS
> might rely on to not be WC, that could be a big issue.
>
>> $dmesg | grep -i mtr for 4.3 kernel with default pat enabled
>> [ 0.189368] calling mtrr_if_init+0x0/0x5f @ 1
>> [ 0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
>> [ 0.189478] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
>> [ 0.189814] calling mtrr_init_finialize+0x0/0x3a @ 1
>> [ 0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
>
> The fact we don't see a conflict doesn't mean an issue or conflict didn't
> trigger. If PAT didn't see something the BIOS did that make the kernel assume
> it could do something that it was not able to. The MTRR init code should pick
> up on this stuff and let the kernel PAT code know if there could be a conflict,
> but if for some reason that was missed, that could be an issue.
>

OK, I am not sure whether there is something I should do here. I am attaching the dmesg for that boot just in case.

$cat /proc/mtrr gives the same results.

>> Unless you have any other suggestions...
>
> Bisection on the merge commit would help.
>

Will do.

Thanks for the guidance, and the thorough explanations.

Vassilis

[-- Attachment #2: dmesg-4.3-nopat.txt.gz --]
[-- Type: application/x-gzip, Size: 27174 bytes --]

[-- Attachment #3: dmesg-4.3-pat.txt.gz --]
[-- Type: application/x-gzip, Size: 27067 bytes --]
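The PCI resource line for 00:02.0 and the failed MTRR request describe the same region in two notations: PCI prints an inclusive start-end range ([mem 0xe0000000-0xefffffff]), while the mtrr warning prints base,size (e0000000,10000000). A small sketch, with hypothetical helper names, that extracts the PCI range from a log line and converts it to a size:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: pull the inclusive [start-end] range out of a PCI
 * resource line such as
 *   "pci 0000:00:02.0: reg 0x18: [mem 0xe0000000-0xefffffff 64bit pref]"
 * Returns 0 on success, -1 if the line has no "[mem ...]" range.
 */
static int parse_mem_range(const char *line, uint64_t *start, uint64_t *end)
{
    const char *p;

    assert(line != NULL && start != NULL && end != NULL);
    p = strstr(line, "[mem ");
    if (p == NULL)
        return -1;
    /* scanf's %x conversion accepts the optional 0x prefix. */
    if (sscanf(p, "[mem %" SCNx64 "-%" SCNx64, start, end) != 2)
        return -1;
    return 0;
}

/* PCI end addresses are inclusive, so size = end - start + 1. */
static uint64_t range_size(uint64_t start, uint64_t end)
{
    return end - start + 1;
}
```

Applied to the reg 0x18 line above, this yields start 0xe0000000 and size 0x10000000, the same pair the mtrr warning reports, which supports Vassilis's conclusion that the failing WC request is for the integrated graphics BAR.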
* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support 2015-11-23 23:01 ` Vassilis Virvilis @ 2015-11-24 22:16 ` Luis R. Rodriguez 0 siblings, 0 replies; 22+ messages in thread From: Luis R. Rodriguez @ 2015-11-24 22:16 UTC (permalink / raw) To: Vassilis Virvilis; +Cc: Juergen Gross, linux-kernel, Toshi Kani On Tue, Nov 24, 2015 at 01:01:31AM +0200, Vassilis Virvilis wrote: > On 11/23/2015 08:56 PM, Luis R. Rodriguez wrote: > >Its not clear from the log who called this MTRR call for WC that failed, I > >hope we didn't attempt a WC wright on a WB region. Who owns > >00000000e0000000-00000000efffffff ? > > How can I answer that? Is there any utility to run? peek inside /proc? > > [ 0.221012] pci 0000:00:02.0: [8086:0412] type 00 class 0x030000 > [ 0.221021] pci 0000:00:02.0: reg 0x10: [mem 0xf7800000-0xf7bfffff 64bit] > [ 0.221025] pci 0000:00:02.0: reg 0x18: [mem 0xe0000000-0xefffffff 64bit pref] > [ 0.221028] pci 0000:00:02.0: reg 0x20: [io 0xf000-0xf03f] ... > [ 0.453783] calling sysfb_init+0x0/0x96 @ 1 > [ 0.453811] simple-framebuffer simple-framebuffer.0: framebuffer at 0xe0000000, 0x6bb000 bytes, mapped to 0xffffc90002000000 > [ 0.453814] simple-framebuffer simple-framebuffer.0: format=a8r8g8b8, mode=1680x1050x32, linelength=6720 > [ 0.557233] Console: switching to colour frame buffer device 210x65 > [ 0.660632] simple-framebuffer simple-framebuffer.0: fb0: simplefb registered! > [ 0.661262] initcall sysfb_init+0x0/0x96 returned 0 after 202686 usecs ... > [ 9.745108] calling i915_init+0x0/0xa2 [i915] @ 403 > [ 9.745542] [drm] Memory usable by graphics device = 2048M > [ 9.745544] checking generic (e0000000 6bb000) vs hw (e0000000 10000000) > [ 9.745544] fb: switching to inteldrmfb from simple ... 
> [ 9.943166] Console: switching to colour dummy device 80x25
> [ 9.943240] [drm] Replacing VGA console driver
> [ 9.943520] mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining
> [ 9.943526] Failed to add WC MTRR for [00000000e0000000-00000000efffffff]; performance may suffer.
> [ 9.949724] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [ 9.949728] [drm] Driver supports precise vblank timestamp query.
> [ 9.949801] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
...
> $lspci | grep 00:02.0
> 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
>
> Looks like it is the graphics card or the graphics driver.

Good job, yes.

> I don't know if this is relevant
> $ cat /proc/mtrr
> reg00: base=0x000000000 ( 0MB), size=16384MB, count=1: write-back
> reg01: base=0x400000000 (16384MB), size= 512MB, count=1: write-back
> reg02: base=0x0e0000000 ( 3584MB), size= 512MB, count=1: uncachable

Right, so it tried to set this to WC but failed. When PAT is in use, MTRRs are not used for this; PAT is used instead, and your log showed no error.

> reg03: base=0x0d0000000 ( 3328MB), size= 256MB, count=1: uncachable
> reg04: base=0x0cf000000 ( 3312MB), size= 16MB, count=1: uncachable
> reg05: base=0x41f000000 (16880MB), size= 16MB, count=1: uncachable
> reg06: base=0x41ee00000 (16878MB), size= 2MB, count=1: uncachable
>
> >
> >What does your log show right before and after this?
To find out try: > > > >dmesg | grep -5 -i mtrr > > > > See full dmesg attached > > $dmesg | grep -5 -i mtrr > [ 0.189333] initcall arch_kdebugfs_init+0x0/0x1f returned 0 after 0 usecs > [ 0.189336] calling pt_init+0x0/0x2a4 @ 1 > [ 0.189349] initcall pt_init+0x0/0x2a4 returned -19 after 0 usecs > [ 0.189352] calling bts_init+0x0/0xa4 @ 1 > [ 0.189354] initcall bts_init+0x0/0xa4 returned 0 after 0 usecs > [ 0.189357] calling mtrr_if_init+0x0/0x5f @ 1 > [ 0.189360] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs > [ 0.189362] calling ffh_cstate_init+0x0/0x26 @ 1 > [ 0.189363] initcall ffh_cstate_init+0x0/0x26 returned 0 after 0 usecs > [ 0.189366] calling activate_jump_labels+0x0/0x2d @ 1 > [ 0.189367] initcall activate_jump_labels+0x0/0x2d returned 0 after 0 usecs > [ 0.189370] calling kcmp_cookies_init+0x0/0x31 @ 1 > -- > [ 0.189424] calling dmi_id_init+0x0/0x300 @ 1 > [ 0.189448] initcall dmi_id_init+0x0/0x300 returned 0 after 0 usecs > [ 0.189450] calling pci_arch_init+0x0/0x63 @ 1 > [ 0.189458] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xf8000000-0xfbffffff] (base 0xf8000000) > [ 0.189462] PCI: MMCONFIG at [mem 0xf8000000-0xfbffffff] reserved in E820 > [ 0.189467] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override. 
> [ 0.189514] PCI: Using configuration type 1 for base access > [ 0.189519] initcall pci_arch_init+0x0/0x63 returned 0 after 0 usecs > [ 0.189528] calling init_vdso+0x0/0x44 @ 1 > [ 0.189535] initcall init_vdso+0x0/0x44 returned 0 after 0 usecs > [ 0.189538] calling sysenter_setup+0x0/0x52 @ 1 > -- > [ 0.189542] calling topology_init+0x0/0x83 @ 1 > [ 0.189795] initcall topology_init+0x0/0x83 returned 0 after 0 usecs > [ 0.189798] calling fixup_ht_bug+0x0/0xed @ 1 > [ 0.189799] perf_event_intel: PMU erratum BJ122, BV98, HSD29 worked around, HT is on > [ 0.189802] initcall fixup_ht_bug+0x0/0xed returned 0 after 0 usecs > [ 0.189805] calling mtrr_init_finialize+0x0/0x3a @ 1 > [ 0.189807] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs > [ 0.189809] calling uid_cache_init+0x0/0x90 @ 1 > [ 0.189810] initcall uid_cache_init+0x0/0x90 returned 0 after 0 usecs > [ 0.189812] calling param_sysfs_init+0x0/0x1d9 @ 1 > [ 0.190106] initcall param_sysfs_init+0x0/0x1d9 returned 0 after 0 usecs > [ 0.190108] calling pm_sysrq_init+0x0/0x14 @ 1 > -- > [ 9.749840] calling usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] @ 384 > [ 9.751163] usbcore: registered new interface driver snd-usb-audio > [ 9.751166] initcall usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] returned 0 after 1292 usecs > [ 9.943166] Console: switching to colour dummy device 80x25 > [ 9.943240] [drm] Replacing VGA console driver > [ 9.943520] mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining > [ 9.943526] Failed to add WC MTRR for [00000000e0000000-00000000efffffff]; performance may suffer. > [ 9.947147] Adding 31249404k swap on /dev/sdb1. Priority:-1 extents:1 across:31249404k FS > [ 9.949724] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). > [ 9.949728] [drm] Driver supports precise vblank timestamp query. 
> [ 9.949801] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
> [ 9.965787] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)

Thanks. I don't see anything obvious here that should have caused the MTRR call for the graphics driver to fail...

> >Not being able to use WC is not fatal, its just a performance issue, but if we tried
> >to override a region which we should not have to WC for which another area the BIOS
> >might rely on to not be WC, that could be a big issue.
> >
>
> >>$dmesg | grep -i mtr for 4.3 kernel with default pat enabled
> >>[ 0.189368] calling mtrr_if_init+0x0/0x5f @ 1
> >>[ 0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> >>[ 0.189478] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
> >>[ 0.189814] calling mtrr_init_finialize+0x0/0x3a @ 1
> >>[ 0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
> >
> >The fact we don't see a conflict doesn't mean an issue or conflict didn't
> >trigger. If PAT didn't see something the BIOS did that make the kernel assume
> >it could do something that it was not able to. The MTRR init code should pick
> >up on this stuff and let the kernel PAT code know if there could be a conflict,
> >but if for some reason that was missed, that could be an issue.
> >
>
> Ok I am not sure if there is something I should do here. I am attaching the dmesg for that boot just in case.
> $cat /proc/mtrr gives the same results
>
> >>Unless you have any other suggestions...
> >
> >Bisection on the merge commit would help.
> >
>
> Will do.
>
> Thanks for the guidance, and the through explanations.

This helps, but it doesn't give us further insight into why the error occurred in the first place for the mtrr add call. Let's keep debugging with the bisect and see where that takes us.

Luis
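To reason about the failed WC request, one can transcribe the /proc/mtrr output into a table and ask which variable ranges enclose the graphics BAR [0xe0000000, 0xf0000000). The sketch below does that and combines the enclosing types with the MTRR precedence rules from the Intel SDM (UC wins; WT plus WB gives WT; identical types stay; other mixes are undefined). It is not kernel code, the enum and function names are hypothetical, and it ignores partially overlapping ranges and the default memory type. On this data the first enclosing entry is reg00 (write-back), which is at least consistent with the "old: write-back" in the warning, while reg02 (uncachable) also covers the range, so the effective type comes out as UC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum mtrr_type { MT_UC, MT_WC, MT_WT, MT_WP, MT_WB, MT_UNDEFINED };

struct var_mtrr {
    uint64_t base;
    uint64_t size;
    enum mtrr_type type;
};

/* Variable-range MTRRs transcribed from the /proc/mtrr output above. */
static const struct var_mtrr thread_mtrrs[] = {
    { 0x000000000ULL, 16384ULL << 20, MT_WB },  /* reg00 */
    { 0x400000000ULL,   512ULL << 20, MT_WB },  /* reg01 */
    { 0x0e0000000ULL,   512ULL << 20, MT_UC },  /* reg02 */
    { 0x0d0000000ULL,   256ULL << 20, MT_UC },  /* reg03 */
    { 0x0cf000000ULL,    16ULL << 20, MT_UC },  /* reg04 */
    { 0x41f000000ULL,    16ULL << 20, MT_UC },  /* reg05 */
    { 0x41ee00000ULL,     2ULL << 20, MT_UC },  /* reg06 */
};

/* Non-zero if [base, base + size) lies entirely inside entry r. */
static int encloses(const struct var_mtrr *r, uint64_t base, uint64_t size)
{
    return base >= r->base && base + size <= r->base + r->size;
}

/* Index of the first entry enclosing the range, or -1 if none does. */
static int first_enclosing(const struct var_mtrr *tbl, size_t n,
                           uint64_t base, uint64_t size)
{
    for (size_t i = 0; i < n; i++)
        if (encloses(&tbl[i], base, size))
            return (int)i;
    return -1;
}

/*
 * Combine the types of all enclosing entries using the SDM precedence
 * rules. Returns MT_UNDEFINED for undefined mixes and also when nothing
 * encloses the range (the default type would apply; not modelled here).
 */
static enum mtrr_type effective_type(const struct var_mtrr *tbl, size_t n,
                                     uint64_t base, uint64_t size)
{
    enum mtrr_type result = MT_UNDEFINED;
    int hits = 0;

    for (size_t i = 0; i < n; i++) {
        enum mtrr_type t = tbl[i].type;

        if (!encloses(&tbl[i], base, size))
            continue;
        if (hits++ == 0)
            result = t;
        else if (t == MT_UC || result == MT_UC)
            result = MT_UC;
        else if ((t == MT_WT && result == MT_WB) ||
                 (t == MT_WB && result == MT_WT))
            result = MT_WT;
        else if (t != result)
            result = MT_UNDEFINED;
    }
    return result;
}
```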
* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
2015-11-19 5:39 ` Juergen Gross
2015-11-19 7:50 ` vasvir
@ 2015-11-23 18:48 ` Luis R. Rodriguez
2015-11-24 9:36 ` vasvir
1 sibling, 1 reply; 22+ messages in thread
From: Luis R. Rodriguez @ 2015-11-23 18:48 UTC
To: Juergen Gross; +Cc: Vassilis Virvilis, linux-kernel, Toshi Kani, mcgrof, mcgrof

On Thu, Nov 19, 2015 at 06:39:28AM +0100, Juergen Gross wrote:
> On 18/11/15 22:43, Vassilis Virvilis wrote:
> > Hi,
> >
> > I have been hit by a hibernate/resume bug. Other people may have too:
> > The following links are consistent with my observations
> >
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1490494
> > https://bugs.archlinux.org/task/44807
> >
> > Some observations:
> > 1) The first few rapid hibernation / resume cycles do not fail.
> >
> > 2) If the computer is loaded (eclipse + chromium + firefox/iceweasel +
> > thunderbird/icedove + Konsole) helps to reproduce and lock up during resume

Let's try to speed up reproducing this. I have a hunch that this might be related to some BIOS-controlled MTRRs and a mismatch that leads the kernel to think a certain type of MTRR write is OK when in fact it is not. Given the workload description, perhaps this could be related to fan control, with the BIOS controlling the fans and conflicting with some other device's MTRR. More on this suspicion in the other thread where you provided more logs.

On a kernel that you know fails, can you try replacing this workload by bringing your CPU to its knees quickly? Perhaps run 'make -j' on a Linux build for 2, 4, 8 or 16 minutes, then hit CTRL-C and proceed to hibernation, to see whether making the CPU fan kick in accelerates the issue. If 'make -j' is so heavy that you can't even CTRL-C it, try 'make -j 16'.

Note that if this is true, a hot CPU could still trigger the CPU fan controls on a fresh boot if the previous boot was CPU intensive.
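Luis's experiment depends on the CPU actually being hot when hibernation starts. One way to confirm that, which is not suggested in the thread and is offered only as an assumption-laden helper, is to read a thermal zone's temp file under /sys/class/thermal, which Linux reports in millidegrees Celsius. Which zone corresponds to the CPU package varies by machine, the path is only an example, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text of a thermal zone "temp" file. The Linux thermal sysfs
 * interface reports millidegrees Celsius, e.g. "52000\n" means 52 C.
 * Returns 0 on success, -1 if the text does not start with a number.
 */
static int parse_millicelsius(const char *text, long *deg_c)
{
    char *end;
    long milli;

    assert(text != NULL && deg_c != NULL);
    milli = strtol(text, &end, 10);
    if (end == text)
        return -1;
    *deg_c = milli / 1000;   /* truncates toward zero */
    return 0;
}

/*
 * Read one zone, e.g. "/sys/class/thermal/thermal_zone0/temp". Which zone
 * is the CPU package differs between machines (see the zone's "type" file).
 */
static int read_zone_temp(const char *path, long *deg_c)
{
    char buf[32];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_millicelsius(buf, deg_c);
}
```

Logging this value right before each hibernation would make it easier to correlate lockups with CPU temperature rather than with uptime alone.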
If this doesn't do it, let's try exercising an MTRR-capable driver. Graphics is
the obvious target, so perhaps try some 3D stuff or a screen saver prior to
hibernation. Note that even if you boot with nomtrr the BIOS may still use
MTRRs: PAT use on Linux could assume MTRRs are not being used by drivers, but
the BIOS may still do something behind the scenes. This is actually one reason
why we can't simply remove MTRR support from Linux: the BIOS may still do some
wacky stuff with MTRRs (one example I was given was that CPU fan control might
use WC MTRRs), so the kernel must be aware of this even if the kernel itself
never uses MTRRs at all, which is now the case as of v4.3 and onwards.

If that doesn't help speed it up, maybe try the screen saver, some 3D stuff and
CPU-intensive stuff all together.

To speed up testing you can also reduce your build time by cutting down the
amount of crap you have to build:

  make localmodconfig

That should only build the things your running kernel has loaded as modules or
already has built in (=y).

> > 3) Long hibernation times (overnight) helps to reproduce and lock up
> > during resume
> >
> > 4) For the bad commits (where the lockup during resume takes place) -
> > the image loading during resume is significantly faster. It is fast and
> > then it locks.
> >
> > How I hit the problem and what I have done:
> >
> > I am running debian unstable
> >
> > Debian went from 3.16 to 3.19 - hence the problem raised its ugly head.
> > I upgraded diligently up to 4.2.6 - The problem persists
> >
> > I started kernel bisection from 3.16 to 3.19 following
> > https://wiki.debian.org/DebianKernel/GitBisect
> >
> > One month and 25 kernels later see below for the bisect log
>
> Wow! Thanks for doing this work!

Vassilis, indeed, the amount of work you have put into this is extremely
appreciated!

> Juergen
>
> >
> > I hit some untestable kernel that weren't booting. They were hanging at
> > "Loading ramdisk..." before any actual kernel message.
> >
> > Looks like the first bad / untestable commit is from Juergen Gross /
> > Thomas Gleixner Merge branch 'x86-mm-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip [full PAT support]
>
> That is commit a023748d53c10850650fe86b1c4a7d421d576451 ("Merge branch
> 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")

Git is smart enough to tell you you've hit a merge commit and that any of the
commits brought in by that merge could be the issue. This is why your bisect
log shows a slew of commits. The next step is to unfold the merge and bisect
through its commits; this will let us identify the exact commit that may have
caused the issue.

There are a few ways to do this; my preferred way is to "unfold" the merge
commit manually.

To keep things separate (so you don't affect other tests you might have on
your other git tree, and so you don't lose freshly built objects as you
continue to build-test on that tree), I'd do something like this:

  mkdir ~/tmp
  cd ~/tmp
  git clone ~/linux/.git linux-dev-test
  cd linux-dev-test

Notice that if you run git log and search for
a023748d53c10850650fe86b1c4a7d421d576451, the commit right before it in
history is 773fed910d41e443e495a6bfa9ab1c2b7b13e012 ("Merge branches
'x86-platform-for-linus' and 'x86-uv-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip").

To be clear, the list of commits you would typically see is just:

  a023748d53c10850650fe86b1c4a7d421d576451 - Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
  773fed910d41e443e495a6bfa9ab1c2b7b13e012 - Merge branches 'x86-platform-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

We want to go down into the commits in the merge commit a023748d53c and then
zero in on exactly which commit caused the issue.
To do that, in your linux-dev-test directory you can do this:

  git checkout -b test-merge-commit a023748d53c10850650fe86b1c4a7d421d576451

That will create a branch for testing based on the merge commit.
Now do this:

  git rebase -i 773fed910d41e443e495a6bfa9ab1c2b7b13e012

Then don't change any of the commits (leave them all as "pick"), just save and
exit the editor, and git will "unfold" the merge commit for you: it will apply
each commit in that merge commit linearly into your git history.

For instance, the rebase should show 22 commits as follows; just leave the
defaults as below and save and exit (Esc, then :wq if in vi):

  pick 96e70f832856 x86/mm: Avoid overlap the fixmap area on i386
  pick 63e7b6d90c1e x86: mm: Re-use the early_ioremap fixed area
  pick bdee237c0343 x86: mm: Use 2GB memory block size on large-memory x86-64 systems
  pick 281d4078bec3 x86: Make page cache mode a real type
  pick c27ce0af896b x86: Use new cache mode type in include/asm/fb.h
  pick 2d85ebf8e12e x86: Use new cache mode type in drivers/video/fbdev/gbefb.c
  pick 5006e45a6bc2 x86: Use new cache mode type in drivers/video/fbdev/vermilion
  pick 1c64216be164 x86: Use new cache mode type in arch/x86/pci
  pick 2df58b6d3530 x86: Use new cache mode type in arch/x86/mm/init_64.c
  pick d85f33342a0f x86: Use new cache mode type in asm/pgtable.h
  pick 49a3b3cbdf16 x86: Use new cache mode type in mm/iomap_32.c
  pick 2a3746984c98 x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
  pick 102e19e1955d x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
  pick c06814d8419a x86: Use new cache mode type in setting page attributes
  pick b14097bd911c x86: Use new cache mode type in mm/ioremap.c
  pick e00c8cc93c1a x86: Use new cache mode type in memtype related functions
  pick 87ad0b713b10 x86: Clean up pgtable_types.h
  pick f439c429c320 x86: Support PAT bit in pagetable dump for lower levels
  pick f5b2831d6541 x86: Respect PAT bit when copying pte values between large and normal pages
  pick bd809af16e3a x86: Enable PAT to use cache mode translation tables
  pick 47591df50512 xen: Support Xen pv-domains using PAT
  pick 0dbcae884779 x86: mm: Move PAT only functions to mm/pat.c

You should see:

  Successfully rebased and updated refs/heads/test-merge-commit.

Now git log will show the above commits as linear history, and you can bisect
the commits of this merge one at a time:

  git bisect start 099487de0934e3d5e326666914a426af89a0868b 773fed910d41e443e495a6bfa9ab1c2b7b13e012

(099487de0934 is the tip of the rebased branch here; since the rebase rewrites
commit IDs, the tip of your own test-merge-commit branch will have a different
ID, so use that as the bad end.)

Note that this assumes that the commit prior to the merge commit is fine. Is
this true, can you confirm? (git checkout -b test-prior-merge-gtest 773fed910d4,
build it and check that it doesn't break there.)

If we know for sure 773fed910d4 did not break anything, then the above bisect
should let us zero in on the exact individual commit that caused the issue.

> > Full disclaimer: I may have fucked up the bisection. Finding bad commits
> > was semi easy - finding good commits needs a run time for 2-3 days.

Reducing the amount of time it takes to reproduce a bug is an art, but perhaps
we can reduce that time.

> >
> > I would really appreciate some help and directions to nail this down.

The amount of time and patience on your side is appreciated as well.
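The unfold-and-bisect procedure above can be sketched as two small shell functions. `unfold_merge` and `first_bad` are hypothetical helper names, not git commands. Setting `GIT_SEQUENCE_EDITOR=:` makes `git rebase -i` accept its generated todo list unchanged, which is the scripted equivalent of saving the editor without edits. `git bisect run` is used so the sketch can be exercised end to end; in this thread each step is a manual boot-and-hibernate test, so in practice you would answer `git bisect good`, `bad` or `skip` by hand, and `skip` is the natural answer for kernels that hang at "Loading ramdisk...".

```shell
#!/bin/sh
# Sketch of "unfold a merge commit, then bisect its commits".
# unfold_merge and first_bad are hypothetical helper names.

# unfold_merge MERGE BRANCH
# Create BRANCH at MERGE and replay the commits the merge brought in, one by
# one, on top of MERGE's first parent (the rebase drops merge commits).
# Prints the first parent, i.e. the "good" end for the bisection.
unfold_merge() {
    merge=$1
    branch=$2
    base=$(git rev-parse --verify -q "$merge^1") || return 1
    git checkout -q -b "$branch" "$merge" || return 1
    # ":" as the sequence editor keeps every todo line as "pick".
    GIT_SEQUENCE_EDITOR=: git rebase -q -i "$base" >/dev/null || return 1
    # With no conflicts and no changes made in the merge commit itself,
    # the unfolded branch ends on the same tree as the merge.
    git diff --quiet "$merge" HEAD ||
        echo "warning: unfolded tree differs from $merge" >&2
    echo "$base"
}

# first_bad GOOD BAD CMD [ARGS...]
# Bisect GOOD..BAD, running CMD at each step. git bisect run treats exit
# status 0 as good, 125 as skip (e.g. an unbootable kernel) and any other
# status from 1 to 127 as bad. Prints the commit recorded as bad at the end;
# if only skipped commits remain, that is a range boundary, not a verdict.
first_bad() {
    good=$1
    bad=$2
    shift 2
    git bisect start "$bad" "$good" >/dev/null || return 1
    git bisect run "$@" >/dev/null 2>&1
    result=$(git rev-parse --verify -q refs/bisect/bad)
    git bisect reset >/dev/null 2>&1
    echo "$result"
}
```

For the real case this would be `unfold_merge a023748d53c10850650fe86b1c4a7d421d576451 test-merge-commit` followed by a manual `git bisect start test-merge-commit 773fed910d41e443e495a6bfa9ab1c2b7b13e012`; any answer it gives rests on 773fed910d4 really being good.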
> >
> > Regards
> >
> > Vassilis Virvilis
> >
> >
> >
> > bill@localhost:~/Downloads/linux$ git bisect log
> > git bisect start
> > # good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
> > git bisect good 19583ca584d6f574384e17fe7613dfaeadcdc4a6
> > # bad: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19
> > git bisect bad bfa76d49576599a4b9f9b7a71f23d73d6dcff735
> > # good: [754c780953397dd5ee5191b7b3ca67e09088ce7a] Merge branch 'for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
> > git bisect good 754c780953397dd5ee5191b7b3ca67e09088ce7a
> > # bad: [7ef58b32f571bffb7763c6252ad7527562081f34] Merge tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux
> > git bisect bad 7ef58b32f571bffb7763c6252ad7527562081f34
> > # good: [53429290a054b30e4683297409fc4627b2592315] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
> > git bisect good 53429290a054b30e4683297409fc4627b2592315
> > # good: [3a647c1d7ab08145cee4b650f5e797d168846c51] Merge tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> > git bisect good 3a647c1d7ab08145cee4b650f5e797d168846c51
> > # bad: [1366f5d3129f2abde606214de7afc3dd61781fa3] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
> > git bisect bad 1366f5d3129f2abde606214de7afc3dd61781fa3
> > # good: [151cd97630f87451cab412e40750d0e5f7581c98] Merge tag 'defconfig-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> > git bisect good 151cd97630f87451cab412e40750d0e5f7581c98
> > # good: [ecb50f0afd35a51ef487e8a54b976052eb03d729] Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect good ecb50f0afd35a51ef487e8a54b976052eb03d729
> > # bad: [3a5dc1fafb016560315fe45bb4ef8bde259dd1bc] Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect bad 3a5dc1fafb016560315fe45bb4ef8bde259dd1bc
> > # good: [b6444bd0a18eb47343e16749ce80a6ebd521f124] Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect good b6444bd0a18eb47343e16749ce80a6ebd521f124
> > # bad: [a023748d53c10850650fe86b1c4a7d421d576451] Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect bad a023748d53c10850650fe86b1c4a7d421d576451
> > # good: [773fed910d41e443e495a6bfa9ab1c2b7b13e012] Merge branches 'x86-platform-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect good 773fed910d41e443e495a6bfa9ab1c2b7b13e012
> > # good: [49a3b3cbdf1621678a39bd95a3e67c0f858539c7] x86: Use new cache mode type in mm/iomap_32.c
> > git bisect good 49a3b3cbdf1621678a39bd95a3e67c0f858539c7
> > # skip: [87ad0b713b1034b6caf559976c35ce47f6d1d1e9] x86: Clean up pgtable_types.h
> > git bisect skip 87ad0b713b1034b6caf559976c35ce47f6d1d1e9
> > # skip: [c06814d8419a74528500f85faf5fc01f67f8e7e6] x86: Use new cache mode type in setting page attributes
> > git bisect skip c06814d8419a74528500f85faf5fc01f67f8e7e6
> > # skip: [e00c8cc93c1ac01ecd5049929a50fb47b62bb041] x86: Use new cache mode type in memtype related functions
> > git bisect skip e00c8cc93c1ac01ecd5049929a50fb47b62bb041
> > # skip: [bd809af16e3ab1f8d55b3e2928c47c67e2a865d2] x86: Enable PAT to use cache mode translation tables
> > git bisect skip bd809af16e3ab1f8d55b3e2928c47c67e2a865d2
> > # skip: [f439c429c320981943f8b64b2a4049d946cb492b] x86: Support PAT bit in pagetable dump for lower levels
> > git bisect skip f439c429c320981943f8b64b2a4049d946cb492b
> > # skip: [47591df505129c9774af6cca2debf283a6e56ed7] xen: Support Xen pv-domains using PAT
> > git bisect skip 47591df505129c9774af6cca2debf283a6e56ed7
> > # skip: [b14097bd911c2554b0b5271b3a6b2d84044d1843] x86: Use new cache mode type in mm/ioremap.c
> > git bisect skip b14097bd911c2554b0b5271b3a6b2d84044d1843
> > # skip: [102e19e1955d85f31475416b1ee22980c6462cf8] x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
> > git bisect skip 102e19e1955d85f31475416b1ee22980c6462cf8
> > # skip: [f5b2831d654167d77da8afbef4d2584897b12d0c] x86: Respect PAT bit when copying pte values between large and normal pages
> > git bisect skip f5b2831d654167d77da8afbef4d2584897b12d0c
> > # skip: [0dbcae884779fdf7e2239a97ac7488877f0693d9] x86: mm: Move PAT only functions to mm/pat.c
> > git bisect skip 0dbcae884779fdf7e2239a97ac7488877f0693d9
> > # skip: [2a3746984c98b17b565e6a2c2bbaaaef757db1b4] x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
> > git bisect skip 2a3746984c98b17b565e6a2c2bbaaaef757db1b4
> > # only skipped commits left to test
> > # possible first bad commit: [a023748d53c10850650fe86b1c4a7d421d576451] Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > # possible first bad commit: [0dbcae884779fdf7e2239a97ac7488877f0693d9] x86: mm: Move PAT only functions to mm/pat.c
> > # possible first bad commit: [47591df505129c9774af6cca2debf283a6e56ed7] xen: Support Xen pv-domains using PAT
> > # possible first bad commit: [bd809af16e3ab1f8d55b3e2928c47c67e2a865d2] x86: Enable PAT to use cache mode translation tables
> > # possible first bad commit: [f5b2831d654167d77da8afbef4d2584897b12d0c] x86: Respect PAT bit when copying pte values between large and normal pages
> > # possible first bad commit: [f439c429c320981943f8b64b2a4049d946cb492b] x86: Support PAT bit in pagetable dump for lower levels
> > # possible first bad commit: [87ad0b713b1034b6caf559976c35ce47f6d1d1e9] x86: Clean up pgtable_types.h
> > # possible first bad commit: [e00c8cc93c1ac01ecd5049929a50fb47b62bb041] x86: Use new cache mode type in memtype related functions
> > # possible first bad commit: [b14097bd911c2554b0b5271b3a6b2d84044d1843] x86: Use new cache mode type in mm/ioremap.c
> > # possible first bad commit: [c06814d8419a74528500f85faf5fc01f67f8e7e6] x86: Use new cache mode type in setting page attributes
> > # possible first bad commit: [102e19e1955d85f31475416b1ee22980c6462cf8] x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
> > # possible first bad commit: [2a3746984c98b17b565e6a2c2bbaaaef757db1b4] x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
>

-- 
Luis Rodriguez, SUSE LINUX GmbH
Maxfeldstrasse 5; D-90409 Nuernberg
* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
2015-11-23 18:48 ` Luis R. Rodriguez
@ 2015-11-24 9:36 ` vasvir
2015-11-24 22:03 ` Luis R. Rodriguez
0 siblings, 1 reply; 22+ messages in thread
From: vasvir @ 2015-11-24 9:36 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Juergen Gross, Vassilis Virvilis, linux-kernel, Toshi Kani, mcgrof, mcgrof

> Let's try to speed up reproducing this.
>
> I have a hunch perhaps this might be related to some BIOS controlled
> MTRRs and a mismatch which then enables the kernel to think that a type
> of MTRR write might be OK, but in fact its not. Due to the work load
> description of this perhaps this could be related to fan control and BIOS
> control on them and against some other device MTRR. More on this suspicion
> on another thread where you provide more logs.
>
> On a kernel that you know fails can you try replacing this work load by
> making you CPU crawl to its knees quickly, perhaps 'make -j' on Linux
> building for 2, 4, 8, 16, minutes and then hit CTRL C to continue to
> hibernation to see if making the CPU fan trigger would accelerate the
> issue. If 'make -j' is too nuts to the point you can't even CTRL C it,
> try 'make -j 16' . Note that if this is true then that means a hot CPU
> could still trigger CPU fan controls on on a fresh boot if the previous
> boot was CPU intensive.

OK, that nailed it: with kernel 4.3, a known "bad" kernel, I was able to
reproduce it in the second hibernate/resume cycle.

Here is what I did, in my own words, so you can spot inconsistencies.

I started a kernel compile with make -j 32. My computer stayed very
responsive, which is an impressive feat by the way. In a second tab in my
Konsole (I am running KDE) I ran $watch sensors. I watched the core
temperatures go from 38 to ~70°C and the CPU fan from ~1630 to ~1900 RPM.
The first time, I hit Ctrl+C to stop the compilation and hibernated from KDE.
I always hibernate from the KDE start menu.
Previously I had done some tests where I hibernated from the VT console
(although sddm may have been running on VT7) and I managed to reproduce it
there too, so (in my mind) it was not graphics-mode specific. Since then I
always hibernate from KDE.

The first time it worked. The second time I thought: why hit Ctrl+C? Let's
try to hibernate with the compilation running. And it failed. Now I don't
know whether it failed because it was the second cycle, because the
compilation load was there, or because of the temperature-controlled fan
register you mentioned.

Then I repeated the test with a known good kernel, 3.18 (which should be
773fed910d41e443e495a6bfa9ab1c2b7b13e012 according to my git bisect logs; I
have a problem there, see below), and it survived the same test (hibernating
twice with the temperature at ~70°C).

> If this doesn't do it lets try forcing an MTRR capable driver, say
> graphics is the obvious target, try perhaps some 3D stuff or a screen
> saver prior to hibernation. Note that even if you boot nomtrr the BIOS
> may still use MTRRs, and PAT use on Linux could assume MTRR is not being
> used on drivers but the BIOS may still do something behind the scenes.
> This is actually one reason why we can't exactly remove MTRR support from
> Linux, since the BIOS may still do some wacky stuff with MTRRs, one
> example of such I was given was CPU can control might use WC MTRRs, so
> the kernel must be aware of this, even if no MTRRs are ever used on the
> Linux kernel at all -- this is the case now as of v4.3 and onwards.
>
> If that doesn't help speed it up , maybe try both screen saver + some 3D
> stuff + cpu intensive stuff.

I have 3D effects enabled in my KDE. Since your tip succeeded in reproducing
the problem early I didn't bother, but if I should test 3D, which program /
benchmark should I run? glxgears?
>
> To help you speed up testing you can try reducing your build time by
> reducing the amount of crap you have to build:
>
> make localmodconfig
>
> That should only build things your kernel has loaded as modules or is
> already enabled (=y).
>

Thanks for the tip. I don't want to change that right now. I don't mind
waiting a little bit, because I get a deb with the kernel and can retest a
known configuration. The other tip you gave, if it really works as it seems
to, would give such a boost to the debugging cycle that I would actually
become the bottleneck.

>
> That is commit a023748d53c10850650fe86b1c4a7d421d576451
> ("Merge branch 'x86-mm-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
>
> Git is smart enough to tell you you've hit a merge commit and that all the
> possible commits on that merge could be the issue. This is why you bisect
> log shows a slew of commits. The next step is to bisect through the merge
> and then bisect through that, this will then let us identify the exact
> commit that may have caused the issue.
>
> There are a few ways to do this, my preferred way is to "unfold" a merge
> commit manually.
>
> To help keep thing separately (without affecting other tests you might
> have on your other git tree and to avoid having to force you to lose
> fresh object as you continue to build test on the other tree), I'd do
> something like this:

We will go with your preferred way, no question about that.
>
> mkdir ~/tmp
> git clone ~/linux/.git linux-dev-test

OK, I have them in parallel: ~/path/linux and ~/path/linux-dev-test.

>
> cd linux-dev-test
>
> Notice how if you do git log and search for
> a023748d53c10850650fe86b1c4a7d421d576451
> you'll see that the commit listed before this is
> 773fed910d41e443e495a6bfa9ab1c2b7b13e012
> ("Merge branches 'x86-platform-for-linus' and 'x86-uv-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
>
> To be clear the list of commits you typically would see is just:
>
> a023748d53c10850650fe86b1c4a7d421d576451 - Merge branch 'x86-mm-for-linus'
> of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> 773fed910d41e443e495a6bfa9ab1c2b7b13e012 - Merge branches
> 'x86-platform-for-linus' and 'x86-uv-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>
> We want to go down into the commits in the merge commit a023748d53c and
> then zero out exactly which commit caused the issue. To do that on your
> linux-dev-test directory you can do this:

Thank you for the explanations. I think I had understood that bit: git
bisect visualize (gitk) helped me grasp it, while git log gave me a hard time
with all these "hidden commits". Confirmation is good.

>
> git checkout -b test-merge-commit a023748d53c10850650fe86b1c4a7d421d576451
>
> That will create branch for testing based on the merge commit.
> Now do this:
>
> git rebase -i 773fed910d41e443e495a6bfa9ab1c2b7b13e012
>
> Then don't pick any commit, just save and exit the editor, and then
> git will actually "unfold" the merge commit for you -- it magically
> will apply each commit in that merge commit linearly into your git
> history.
>
> For instance the rebase should show 22 commits as follows, just
> leave the defaults set as in below and just hit (ESC + :wq if
> in vi):
>
> pick 96e70f832856 x86/mm: Avoid overlap the fixmap area on i386
> pick 63e7b6d90c1e x86: mm: Re-use the early_ioremap fixed area
> pick bdee237c0343 x86: mm: Use 2GB memory block size on large-memory x86-64 systems
> pick 281d4078bec3 x86: Make page cache mode a real type
> pick c27ce0af896b x86: Use new cache mode type in include/asm/fb.h
> pick 2d85ebf8e12e x86: Use new cache mode type in drivers/video/fbdev/gbefb.c
> pick 5006e45a6bc2 x86: Use new cache mode type in drivers/video/fbdev/vermilion
> pick 1c64216be164 x86: Use new cache mode type in arch/x86/pci
> pick 2df58b6d3530 x86: Use new cache mode type in arch/x86/mm/init_64.c
> pick d85f33342a0f x86: Use new cache mode type in asm/pgtable.h
> pick 49a3b3cbdf16 x86: Use new cache mode type in mm/iomap_32.c
> pick 2a3746984c98 x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
> pick 102e19e1955d x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
> pick c06814d8419a x86: Use new cache mode type in setting page attributes
> pick b14097bd911c x86: Use new cache mode type in mm/ioremap.c
> pick e00c8cc93c1a x86: Use new cache mode type in memtype related functions
> pick 87ad0b713b10 x86: Clean up pgtable_types.h
> pick f439c429c320 x86: Support PAT bit in pagetable dump for lower levels
> pick f5b2831d6541 x86: Respect PAT bit when copying pte values between large and normal pages
> pick bd809af16e3a x86: Enable PAT to use cache mode translation tables
> pick 47591df50512 xen: Support Xen pv-domains using PAT
> pick 0dbcae884779 x86: mm: Move PAT only functions to mm/pat.c
>

OK, I will do it later tonight.
But from my (git bisect) logs, what I was expecting was:

# only skipped commits left to test
# possible first bad commit: [a023748d53c10850650fe86b1c4a7d421d576451] Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
# possible first bad commit: [0dbcae884779fdf7e2239a97ac7488877f0693d9] x86: mm: Move PAT only functions to mm/pat.c
# possible first bad commit: [47591df505129c9774af6cca2debf283a6e56ed7] xen: Support Xen pv-domains using PAT
# possible first bad commit: [bd809af16e3ab1f8d55b3e2928c47c67e2a865d2] x86: Enable PAT to use cache mode translation tables
# possible first bad commit: [f5b2831d654167d77da8afbef4d2584897b12d0c] x86: Respect PAT bit when copying pte values between large and normal pages
# possible first bad commit: [f439c429c320981943f8b64b2a4049d946cb492b] x86: Support PAT bit in pagetable dump for lower levels
# possible first bad commit: [87ad0b713b1034b6caf559976c35ce47f6d1d1e9] x86: Clean up pgtable_types.h
# possible first bad commit: [e00c8cc93c1ac01ecd5049929a50fb47b62bb041] x86: Use new cache mode type in memtype related functions
# possible first bad commit: [b14097bd911c2554b0b5271b3a6b2d84044d1843] x86: Use new cache mode type in mm/ioremap.c
# possible first bad commit: [c06814d8419a74528500f85faf5fc01f67f8e7e6] x86: Use new cache mode type in setting page attributes
# possible first bad commit: [102e19e1955d85f31475416b1ee22980c6462cf8] x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
# possible first bad commit: [2a3746984c98b17b565e6a2c2bbaaaef757db1b4] x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()

Commit a023748d53c10850650fe86b1c4a7d421d576451 contains all the other commits
listed below it. Newer commits are listed higher.

Note that the commits listed above are untestable because the resulting
kernels are not bootable. They hang at the second line of boot output, at
"Loading Ramdisk..." or something similar.
My last good commit was 49a3b3cbdf1621678a39bd95a3e67c0f858539c7, which means
that with git bisect I had already started zeroing in on
a023748d53c10850650fe86b1c4a7d421d576451, since 49a3b3... was part of a0237...

So, based on my git bisect so far, my understanding is:

  last good merge commit: 773fed910d41e443e495a6bfa9ab1c2b7b13e012
  last bad merge commit (next after 773fed...): a023748d53c10850650fe86b1c4a7d421d576451
  last good commit (inside a023748d53c10850650fe86b1c4a7d421d576451): 49a3b3cbdf1621678a39bd95a3e67c0f858539c7
  all the others from 49a3b3cbdf1... to a023748d53c1... are untestable/unbootable kernels.

Please correct me if I am wrong; it will help me build the correct mental
model.

> You should see:
>
> Successfully rebased and updated refs/heads/test-merge-commit.
>
> Now if you do git log you will see the above commits in linear
> atomic history. You can now bisect this merge commit atomically, so do:
>
> git bisect 099487de0934e3d5e326666914a426af89a0868b
> 773fed910d41e443e495a6bfa9ab1c2b7b13e012
>
> Note that this assumes that the commit prior to the merge commit is fine.
> Is this true, can you confirm? (git checkout -b test-prior-merge-gtest
> 773fed910d4, build and see if it doesn't break there)
>
> If we know for sure 773fed910d4 did not break anything then the above
> bisect should let us zero in on the exact atomic commit ID that caused
> the issue.
>

Now the problem is that I tried twice to verify that
773fed910d41e443e495a6bfa9ab1c2b7b13e012 is indeed a good commit, and both
times I ended up with an unbootable kernel (it hangs at "Loading Ramdisk...").
This is very disappointing and means that all my bisections so far are
invalid. Very disappointing indeed, but it's only a setback. I will figure it
out and make sure I have a valid setup for reproducible tests before I bother
you again.
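The mental model above can be checked mechanically. The sketch below uses two hypothetical helper functions built on `git merge-base --is-ancestor` and `git rev-list`: `classify` reports whether a commit was already reachable from the merge's first parent (mainline) or only arrived through its second parent (the merged branch), and `remaining` lists the merged-branch commits that are still suspects once a commit inside the merge is known to be good. The merge commit itself remains a candidate as well, as the bisect log shows.

```shell
#!/bin/sh
# Hypothetical helpers for reasoning about a bisect range around a merge.

# classify COMMIT MERGE: print "mainline" if COMMIT is reachable from MERGE's
# first parent, "merged" if it is only reachable through the second parent,
# and "unrelated" otherwise.
classify() {
    if git merge-base --is-ancestor "$1" "$2^1"; then
        echo mainline
    elif git merge-base --is-ancestor "$1" "$2^2"; then
        echo merged
    else
        echo unrelated
    fi
}

# remaining MERGE LAST_GOOD: list, oldest first, the merged-branch commits
# that are not ancestors of LAST_GOOD and therefore remain suspects.
remaining() {
    git rev-list --reverse "$2..$1^2"
}
```

Applied to this thread, `classify 49a3b3cbdf16 a023748d53c1` would be expected to print `merged`, and `remaining a023748d53c1 49a3b3cbdf16` to list the eleven non-merge commits the bisect log reports as possible first bad commits, assuming the x86-mm branch is linear as the rebase todo list suggests.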
Just for the record, I did:

  $ git checkout 773fed910d41e443e495a6bfa9ab1c2b7b13e012
  $ fakeroot make -j 4 CC=gcc-4.8 deb-pkg

I will do as you suggest and unfold the commits, but if my bisection was right
(and there are serious hints to the contrary) I stopped on
unbootable/untestable kernels.

Thanks for the exhaustive mails with the explanations and the tips. They are
much appreciated.

Vassilis
* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
2015-11-24 9:36 ` vasvir
@ 2015-11-24 22:03 ` Luis R. Rodriguez
0 siblings, 0 replies; 22+ messages in thread
From: Luis R. Rodriguez @ 2015-11-24 22:03 UTC (permalink / raw)
To: vasvir; +Cc: Juergen Gross, linux-kernel, Toshi Kani, mcgrof

On Tue, Nov 24, 2015 at 11:36:54AM +0200, vasvir@iit.demokritos.gr wrote:
> > Let's try to speed up reproducing this.
> >
> > I have a hunch perhaps this might be related to some BIOS controlled
> > MTRRs and a mismatch which then enables the kernel to think that a type
> > of MTRR write might be OK, but in fact its not. Due to the work load
> > description of this perhaps this could be related to fan control and BIOS
> > control on them and against some other device MTRR. More on this suspicion
> > on another thread where you provide more logs.
> >
> > On a kernel that you know fails can you try replacing this work load by
> > making you CPU crawl to its knees quickly, perhaps 'make -j' on Linux
> > building for 2, 4, 8, 16, minutes and then hit CTRL C to continue to
> > hibernation to see if making the CPU fan trigger would accelerate the
> > issue. If 'make -j' is too nuts to the point you can't even CTRL C it,
> > try 'make -j 16' . Note that if this is true then that means a hot CPU
> > could still trigger CPU fan controls on on a fresh boot if the previous
> > boot was CPU intensive.
>
> OK that nailed it - with kernel 4.3 a known "bad" kernel I was able to
> reproduce it in the second hibernate/resume cycle.

Great, glad we could cut the time to reproduce down to what now seems to be a
few minutes.

> Here is what I did in my own words so you can spot inconsistencies.
>
> I started a kernel compile with make -j 32. My computer was very
> responsive which is an impressive feat by the way.
> In a second tab in my Konsole (I am running KDE) I run $watch sensors.
> I watched the temperature of the cores to go from 38 to ~70 and the cpu fan
> from ~1630 to ~1900. Then the first time I hit Ctrl+C - stopped the
> compilation and hibernated from the KDE. I always hibernate from the KDE
> start menu. Previously I had made some tests where I was hibernating from
> the VT console (although sddm may was running in VT7) and I have managed
> to reproduce it - so (in my mind) it was not graphics mode specific. From
> that point I am always hibernating from KDE.

Come to think of it, the mtrr_add() and/or ioremap_wc() calls would be
triggered on driver initialization, that is at probe / boot time. So if the
issue you are running into is a clash between the BIOS's own notion of which
MTRR type is set for a region and a driver's later desired MTRR type (or the
equivalent PAT type), then the issue could be triggered just by boot /
hibernation / resume, without much interaction, at least on the graphics front.

> The first time it worked. For the second time I thought - why to hit
> Ctrl+C let's try to hibernate with the compilation running - and it
> failed.

OK. How long did you leave the machine idle before resuming? Can you try, on a
fresh boot, to bring the temperature up to ~70 and hibernate while it's still
compiling, to see if that triggers it? If we can reduce it to a single
hibernate, that should cut the time to troubleshoot; it is also just puzzling
that you'd need to hibernate twice to reproduce this issue.

> Now I don't know if it failed because it was the second cycle or
> because the load of the compilation was there or because of the
> temperature controlled fan register you mentioned.

If it's fan related, one test could be to hibernate on a fresh boot once fan
control is on, let the machine sit to cool, and then resume, versus resuming
right away. That is: determine whether fan control needs to be idle upon
resume or not, and also how many times fan control has to go on / off before
you can reproduce.
> Then I repeated the test with a known good kernel 3.18 (which should be
> 773fed910d41e443e495a6bfa9ab1c2b7b13e012 according to my git bisect logs -
> I have a problem there - see below) and it survived the same test
> (hibernate two times with temperature being ~70).
>
> > If this doesn't do it lets try forcing an MTRR capable driver, say
> > graphics is the obvious target, try perhaps some 3D stuff or a screen
> > saver prior to hibernation. Note that even if you boot nomtrr the BIOS
> > may still use MTRRs, and PAT use on Linux could assume MTRR is not being
> > used on drivers but the BIOS may still do something behind the scenes.
> > This is actually one reason why we can't exactly remove MTRR support from
> > Linux, since the BIOS may still do some wacky stuff with MTRRs, one
> > example of such I was given was CPU can control might use WC MTRRs, so
> > the kernel must be aware of this, even if no MTRRs are ever used on the
> > Linux kernel at all -- this is the case now as of v4.3 and onwards.
> >
> > If that doesn't help speed it up , maybe try both screen saver + some 3D
> > stuff + cpu intensive stuff.
>
> I have 3D effects enabled in my KDE. Since your tip succeed to reproduce
> the problem early I didn't bother but If I should test 3D which program /
> benchmark should I run? glxgears?

As I mentioned above, I can't think of a reason now why this should trigger
the issue if it's MTRR related.

> > To help you speed up testing you can try reducing your build time by
> > reducing the amount of crap you have to build:
> >
> > make localmodconfig
> >
> > That should only build things your kernel has loaded as modules or is
> > already enabled (=y).
> >
>
> Thanks for the tip. I don't want to change that right now. I don't mind
> waiting a little bit because I a get a deb with the kernel and can retest
> a known configuration.

There is little risk in using it: you'll keep everything you had enabled, as
built-in or as modules.
The only issue with this is if in between the commits there was a Kconfig
symbol rename (driver rename), but I really don't expect you to run into
this as an issue.

> The other tip you gave, if it actually works as it looks like it does,
> would give a great boost to the debugging cycle, to the point of actually
> making me the bottleneck.

Sure.

> > That is commit a023748d53c10850650fe86b1c4a7d421d576451
> > ("Merge branch 'x86-mm-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
> >
> > Git is smart enough to tell you you've hit a merge commit and that all
> > the possible commits in that merge could be the issue. This is why your
> > bisect log shows a slew of commits. The next step is to go into the
> > merge and bisect through its commits; this will then let us identify
> > the exact commit that may have caused the issue.
> >
> > There are a few ways to do this; my preferred way is to "unfold" a
> > merge commit manually.
> >
> > To help keep things separate (without affecting other tests you might
> > have on your other git tree, and to avoid forcing you to lose fresh
> > objects as you continue to build and test on the other tree), I'd do
> > something like this:
>
> We will go with your preferred way - no question about that.
>
> > mkdir ~/tmp
> > git clone ~/linux/.git linux-dev-test
>
> OK, I have them in parallel: ~/path/linux and ~/path/linux-dev-test
>
> > cd linux-dev-test
> >
> > Notice how if you do git log and search for
> > a023748d53c10850650fe86b1c4a7d421d576451
> > you'll see that the commit listed before this is
> > 773fed910d41e443e495a6bfa9ab1c2b7b13e012
> > ("Merge branches 'x86-platform-for-linus' and 'x86-uv-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
> >
> > To be clear, the list of commits you typically would see is just:
> >
> > a023748d53c10850650fe86b1c4a7d421d576451 - Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > 773fed910d41e443e495a6bfa9ab1c2b7b13e012 - Merge branches 'x86-platform-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> >
> > We want to go down into the commits in the merge commit a023748d53c and
> > then zero in on exactly which commit caused the issue. To do that in
> > your linux-dev-test directory you can do this:
>
> Thank you for the explanations. I think I had understood that bit. git
> bisect visualize (gitk) helped me to grasp it. git log gave me a hard
> time with all these "hidden commits".

Confirmation is good.

> > git checkout -b test-merge-commit a023748d53c10850650fe86b1c4a7d421d576451
> >
> > That will create a branch for testing based on the merge commit.
> > Now do this:
> >
> > git rebase -i 773fed910d41e443e495a6bfa9ab1c2b7b13e012
> >
> > Then don't change any of the pick lines; just save and exit the editor,
> > and git will actually "unfold" the merge commit for you -- it magically
> > will apply each commit in that merge commit linearly into your git
> > history.
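A minimal non-interactive sketch of the same unfolding, assuming a POSIX shell and a git that honours GIT_SEQUENCE_EDITOR: pointing that variable at `:` (the shell no-op) keeps the default todo list unchanged, which is the same as saving and exiting the editor without edits. Since a plain rebase drops merge commits from the todo list, only the topic-branch commits get replayed on top of the merge's first parent. The helper name `unfold_merge` is hypothetical; the IDs in the usage comment are the ones from this thread.

```shell
#!/bin/sh
# unfold_merge BRANCH MERGE PARENT  (hypothetical helper)
# Create BRANCH at MERGE, then replay the non-merge commits reachable from
# MERGE but not from PARENT on top of PARENT, giving a linear history that
# can be bisected one commit at a time.
unfold_merge() {
    branch=$1
    merge=$2
    parent=$3
    git checkout -q -b "$branch" "$merge" || return 1
    # ':' acts as a no-op "editor": every todo line stays "pick".
    GIT_SEQUENCE_EDITOR=: git rebase -q -i "$parent"
}

# Usage inside linux-dev-test:
#   unfold_merge test-merge-commit \
#       a023748d53c10850650fe86b1c4a7d421d576451 \
#       773fed910d41e443e495a6bfa9ab1c2b7b13e012
#   git log --oneline 773fed910d41e443e495a6bfa9ab1c2b7b13e012..HEAD
```

Afterwards the topic commits sit in a straight line on top of 773fed910d41, with new commit IDs, and `git log` shows no merge commit in that range.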
> >
> > For instance the rebase should show 22 commits as follows; just
> > leave the defaults set as below and just hit (ESC + :wq if
> > in vi):
> >
> > pick 96e70f832856 x86/mm: Avoid overlap the fixmap area on i386
> > pick 63e7b6d90c1e x86: mm: Re-use the early_ioremap fixed area
> > pick bdee237c0343 x86: mm: Use 2GB memory block size on large-memory x86-64 systems
> > pick 281d4078bec3 x86: Make page cache mode a real type
> > pick c27ce0af896b x86: Use new cache mode type in include/asm/fb.h
> > pick 2d85ebf8e12e x86: Use new cache mode type in drivers/video/fbdev/gbefb.c
> > pick 5006e45a6bc2 x86: Use new cache mode type in drivers/video/fbdev/vermilion
> > pick 1c64216be164 x86: Use new cache mode type in arch/x86/pci
> > pick 2df58b6d3530 x86: Use new cache mode type in arch/x86/mm/init_64.c
> > pick d85f33342a0f x86: Use new cache mode type in asm/pgtable.h
> > pick 49a3b3cbdf16 x86: Use new cache mode type in mm/iomap_32.c

Everything below here should be tested, given you say 49a3b3cbdf16 is
good.

> > pick 2a3746984c98 x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
> > pick 102e19e1955d x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
> > pick c06814d8419a x86: Use new cache mode type in setting page attributes
> > pick b14097bd911c x86: Use new cache mode type in mm/ioremap.c
> > pick e00c8cc93c1a x86: Use new cache mode type in memtype related functions
> > pick 87ad0b713b10 x86: Clean up pgtable_types.h
> > pick f439c429c320 x86: Support PAT bit in pagetable dump for lower levels
> > pick f5b2831d6541 x86: Respect PAT bit when copying pte values between large and normal pages
> > pick bd809af16e3a x86: Enable PAT to use cache mode translation tables
> > pick 47591df50512 xen: Support Xen pv-domains using PAT
> > pick 0dbcae884779 x86: mm: Move PAT only functions to mm/pat.c
>
> OK, I will do it later tonight.
> But from my (git bisect) logs what I was expecting was
>
> # only skipped commits left to test
> # possible first bad commit: [a023748d53c10850650fe86b1c4a7d421d576451] Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> # possible first bad commit: [0dbcae884779fdf7e2239a97ac7488877f0693d9] x86: mm: Move PAT only functions to mm/pat.c
> # possible first bad commit: [47591df505129c9774af6cca2debf283a6e56ed7] xen: Support Xen pv-domains using PAT
> # possible first bad commit: [bd809af16e3ab1f8d55b3e2928c47c67e2a865d2] x86: Enable PAT to use cache mode translation tables
> # possible first bad commit: [f5b2831d654167d77da8afbef4d2584897b12d0c] x86: Respect PAT bit when copying pte values between large and normal pages
> # possible first bad commit: [f439c429c320981943f8b64b2a4049d946cb492b] x86: Support PAT bit in pagetable dump for lower levels
> # possible first bad commit: [87ad0b713b1034b6caf559976c35ce47f6d1d1e9] x86: Clean up pgtable_types.h
> # possible first bad commit: [e00c8cc93c1ac01ecd5049929a50fb47b62bb041] x86: Use new cache mode type in memtype related functions
> # possible first bad commit: [b14097bd911c2554b0b5271b3a6b2d84044d1843] x86: Use new cache mode type in mm/ioremap.c
> # possible first bad commit: [c06814d8419a74528500f85faf5fc01f67f8e7e6] x86: Use new cache mode type in setting page attributes
> # possible first bad commit: [102e19e1955d85f31475416b1ee22980c6462cf8] x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
> # possible first bad commit: [2a3746984c98b17b565e6a2c2bbaaaef757db1b4] x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()

You are correct. Since you say 49a3b3cbdf16 was your last good commit,
these are the possible bad candidates.

> commit a023748d53c10850650fe86b1c4a7d421d576451 contains all the other
> commits listed above. The order is newest first.

Sure. All of those went into v3.19-rc1.
> Note that these commits listed above are untestable because the resulting
> kernels are not bootable. They hang in the second line of boot output in
> "Loading Ramdisk..." or something similar.
>
> my last good commit was 49a3b3cbdf1621678a39bd95a3e67c0f858539c7

That went into v3.19-rc1 as well.

> that means with git bisect I had already started zeroing in on
> a023748d53c10850650fe86b1c4a7d421d576451 since 49a3b3... was part of
> a0237...

This is correct. I missed that, thanks. That should reduce your bisect to
the above commits. We know that 0dbcae884779fdf7e2239a97ac7488877f0693d9
("x86: mm: Move PAT only functions to mm/pat.c") was the last commit that
is part of the merge a023748d53c10850650fe86b1c4a7d421d576451, and we know
a023748d53c10850650fe86b1c4a7d421d576451 was a bad commit, so you can now
bisect on that tree between "x86: Use new cache mode type in
mm/iomap_32.c" and "x86: mm: Move PAT only functions to mm/pat.c".

Note that rebasing will change your commit IDs, so the above two named
commits will appear differently on your tree; when relaying information
back to us, just use the commit names if working on the rebased tree.

Since you know the commit IDs now, though, you could also just go back to
your original tree and bisect between the two commits, which are now part
of the same branch:

git bisect start a023748d53c10850650fe86b1c4a7d421d576451 49a3b3cbdf1621678a39bd95a3e67c0f858539c7

But for some reason git does a huge bisection here; I get 11 steps with
1814 revisions to test...
We know there are really only 12 revisions involved, though (11 left to
test); for instance, here are my commit IDs once I rebase the branch
created at the merge commit a023748d53c10850650fe86b1c4a7d421d576451 (as I
said, notice how the commit IDs are now different):

5e9c2da70692 x86: Use new cache mode type in mm/iomap_32.c
e09f7c9da6b7 x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
7077aded72a2 x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
a019620d98ec x86: Use new cache mode type in setting page attributes
848159761245 x86: Use new cache mode type in mm/ioremap.c
06d664382eea x86: Use new cache mode type in memtype related functions
39ba0907d179 x86: Clean up pgtable_types.h
155c520125fe x86: Support PAT bit in pagetable dump for lower levels
f51279d0867f x86: Respect PAT bit when copying pte values between large and normal pages
ddbb181ad4ff x86: Enable PAT to use cache mode translation tables
7c67687de764 xen: Support Xen pv-domains using PAT
aa8f46878ab1 x86: mm: Move PAT only functions to mm/pat.c

So we really only need you to test about 3 to 4 commits (log2 of 11 is
about 3.5). With my commit IDs I'd just do:

git bisect start aa8f46878ab1 5e9c2da70692

Rebasing onto the commit prior to the merge commit cuts the bisection down
from 1814 revisions to 5 revisions, and from roughly 11 steps to roughly 3
steps.

> So based on my git bisect so far my understanding is
> last good merge commit: 773fed910d41e443e495a6bfa9ab1c2b7b13e012

773fed910d41e443e495a6bfa9ab1c2b7b13e012 is the commit prior to the merge
commit, but we have better information now -- the issue is within the
merge commit, so we need to dig in there to look at it.
> last bad merge commit (next after 773fed...):
> a023748d53c10850650fe86b1c4a7d421d576451

That's the merge commit, but a merge commit is just fluff (metadata to
preserve annotations of who queued up code and then tossed it to Linus
later). We know the actual last commit that made code changes was
0dbcae884779fdf7e2239a97ac7488877f0693d9 ("x86: mm: Move PAT only
functions to mm/pat.c"), so we can use that.

> last good commit (inside a023748d53c10850650fe86b1c4a7d421d576451):
> 49a3b3cbdf1621678a39bd95a3e67c0f858539c7

That's much better; that zeroes in inside the merge commit.

> all the others from 49a3b3cbdf1... to a023748d53c1... are
> untestable/unbootable kernels.

To be clear, 49a3b3cbdf1 is bootable, as it's your last good commit. If you
are saying that after that, and up to the last commit of the merge commit
(0dbcae884779), things are not bootable, that's a big obstacle to
bisecting this further. If you can boot the merge commit a023748d53c1 but
it "hangs" on hibernate, I would expect you should at least be able to
boot into the last commit of the merge, 0dbcae884779f. Can you confirm?
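A rough sketch of why the choice of bad endpoint matters, assuming git is on PATH. `git bisect` searches the commits reachable from the bad commit but not from the good one, which is what `git rev-list --count GOOD..BAD` counts. With the merge a023748d53c1 as the bad end, everything reachable from its first parent 773fed910d41 but not from 49a3b3cbdf16 also falls into that range, which likely explains the 1814 revisions; with 0dbcae884779 as the bad end, only the topic-branch commits remain. All helper names are hypothetical. `worst_case_steps` is the plain ceil(log2 n) bound on test builds, and `estimated_steps` is my reading of the heuristic behind git's "(roughly N steps)" message in its bisect.c, so treat it as an approximation.

```shell
#!/bin/sh
# bisect_candidates BAD GOOD: commits reachable from BAD but not from GOOD
# (BAD included), roughly the set git bisect has to search.
bisect_candidates() {
    git rev-list --count "$2..$1"
}

# worst_case_steps N: smallest k with 2^k >= N, i.e. the most test builds
# a binary search over N candidate commits can need.
worst_case_steps() {
    n=$1 k=0 span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        k=$((k + 1))
    done
    echo "$k"
}

# estimated_steps N: approximation of git's "roughly N steps" figure,
# assumed to follow estimate_bisect_steps() in git's bisect.c.
estimated_steps() {
    all=$1
    if [ "$all" -lt 3 ]; then
        echo 0
        return
    fi
    # e = largest power of two <= all, n = log2(e)
    n=0 e=1
    while [ $((e * 2)) -le "$all" ]; do
        e=$((e * 2))
        n=$((n + 1))
    done
    x=$((all - e))
    if [ "$e" -lt $((3 * x)) ]; then echo "$n"; else echo $((n - 1)); fi
}

# Usage on the original tree (IDs from this thread):
#   bisect_candidates a023748d53c1 49a3b3cbdf16   # merge as bad end: large
#   bisect_candidates 0dbcae884779 49a3b3cbdf16   # last topic commit: small
#   git bisect start 0dbcae884779 49a3b3cbdf16
```

For the 11 candidates in the rebased range, `worst_case_steps 11` prints 4 while `estimated_steps 11` prints 3, which is the gap between git's "roughly 3 steps" and the worst case.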
To be clear, here is the list of commits we are reviewing:

a023748d53c1 - merge commit
0dbcae884779f - last commit of the merge commit
49a3b3cbdf162 - last good commit in the merge

Putting names on these:

a023748d53c1 - Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
0dbcae884779f - x86: mm: Move PAT only functions to mm/pat.c
49a3b3cbdf162 - x86: Use new cache mode type in mm/iomap_32.c

The actual list, then, that we need to narrow down to find out what caused
the issue:

git log --oneline 49a3b3cbdf162^1..0dbcae884779f
0dbcae884779 x86: mm: Move PAT only functions to mm/pat.c
47591df50512 xen: Support Xen pv-domains using PAT
bd809af16e3a x86: Enable PAT to use cache mode translation tables
f5b2831d6541 x86: Respect PAT bit when copying pte values between large and normal pages
f439c429c320 x86: Support PAT bit in pagetable dump for lower levels
87ad0b713b10 x86: Clean up pgtable_types.h
e00c8cc93c1a x86: Use new cache mode type in memtype related functions
b14097bd911c x86: Use new cache mode type in mm/ioremap.c
c06814d8419a x86: Use new cache mode type in setting page attributes
102e19e1955d x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
2a3746984c98 x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
49a3b3cbdf16 x86: Use new cache mode type in mm/iomap_32.c

Juergen is the author of all of these except 0dbcae884779, which is just a
code shift, so it should not affect run time for you.

> Please correct me if I am wrong - it will help me build the correct mental
> model.

Hope this helps. If some of the earlier commits failed to boot, one of the
later commits obviously fixed that, as the merge commit seems bootable --
so perhaps one of these commits is important to fix the boot issue you
noted.
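The `^1` in the `git log --oneline 49a3b3cbdf162^1..0dbcae884779f` range is what makes the listing include 49a3b3cbdf16 itself: `A..B` leaves A out, while `A^1..B` starts from A's first parent, so A is included. A tiny sketch of the difference, assuming FIRST is an ancestor of LAST and has a parent; the helper names are hypothetical.

```shell
#!/bin/sh
# count_range_inclusive FIRST LAST: commits from FIRST through LAST,
# FIRST included (the range starts at FIRST's first parent).
count_range_inclusive() {
    git rev-list --count "$1^1..$2"
}

# count_range_exclusive FIRST LAST: the same range with FIRST left out.
count_range_exclusive() {
    git rev-list --count "$1..$2"
}

# Usage on the original tree (IDs from this thread):
#   count_range_inclusive 49a3b3cbdf162 0dbcae884779f
#   count_range_exclusive 49a3b3cbdf162 0dbcae884779f
```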
Since Juergen is the author of all of the relevant patches and he's been
active on this thread, I am confident we should be able to get you a
bootable kernel so you can help complete the bisection.

  Luis

> > You should see:
> >
> > Successfully rebased and updated refs/heads/test-merge-commit.
> >
> > Now if you do git log you will see the above commits in linear
> > atomic history. You can now bisect this merge commit atomically, so do:
> >
> > git bisect start 099487de0934e3d5e326666914a426af89a0868b 773fed910d41e443e495a6bfa9ab1c2b7b13e012
> >
> > Note that this assumes that the commit prior to the merge commit is fine.
> > Is this true, can you confirm? (git checkout -b test-prior-merge-gtest
> > 773fed910d4, build and see if it doesn't break there)
> >
> > If we know for sure 773fed910d4 did not break anything then the above
> > bisect should let us zero in on the exact atomic commit ID that caused
> > the issue.
>
> Now the problem is that I tried twice to verify that
> 773fed910d41e443e495a6bfa9ab1c2b7b13e012 is indeed a good commit and I
> ended up with an unbootable kernel (hangs in "Loading Ramdisk..."). This
> is very disappointing and means that all my bisections so far are invalid.
> Very disappointing indeed, but it's only a setback. I will figure it out
> and will make sure I have a valid setup for reproducible tests before I
> bother you again.
>
> Just for the record I did
> $ git checkout 773fed910d41e443e495a6bfa9ab1c2b7b13e012
> $ fakeroot make -j 4 CC=gcc-4.8 deb-pkg
>
> I will do as you suggest with the unfold of commits - but if my bisection
> was right (serious hints to the opposite exist) I stopped on
> unbootable/untestable kernels.
>
> Thanks for the exhaustive mails with the explanations and the tips. They
> are much appreciated.
>
> Vassilis

--
Luis Rodriguez, SUSE LINUX GmbH
Maxfeldstrasse 5; D-90409 Nuernberg
Thread overview: 22+ messages
2015-11-18 21:43 Hibernate resume bug around 3,18-rc2 - Full PAT support Vassilis Virvilis
2015-11-19 5:39 ` Juergen Gross
2015-11-19 7:50 ` vasvir
2015-11-19 9:10 ` Juergen Gross
2015-11-19 20:35 ` Vassilis Virvilis
2015-11-20 5:25 ` Vassilis Virvilis
2015-11-20 8:47 ` Juergen Gross
2015-11-20 10:04 ` vasvir
2015-11-20 12:23 ` Juergen Gross
2015-11-21 11:49 ` Vassilis Virvilis
2015-11-23 7:32 ` Juergen Gross
2015-11-23 14:11 ` vasvir
2015-11-23 14:19 ` Juergen Gross
2015-11-24 22:46 ` Luis R. Rodriguez
2015-11-25 5:01 ` Juergen Gross
2015-11-25 19:24 ` Luis R. Rodriguez
2015-11-23 18:56 ` Luis R. Rodriguez
2015-11-23 23:01 ` Vassilis Virvilis
2015-11-24 22:16 ` Luis R. Rodriguez
2015-11-23 18:48 ` Luis R. Rodriguez
2015-11-24 9:36 ` vasvir
2015-11-24 22:03 ` Luis R. Rodriguez