* git pull on linux-next makes my system crawl to its knees and beg for mercy
@ 2009-12-18 17:26 Luis R. Rodriguez
  2009-12-18 17:38 ` Bartlomiej Zolnierkiewicz
  2009-12-18 21:56 ` Stephen Rothwell
  0 siblings, 2 replies; 11+ messages in thread
From: Luis R. Rodriguez @ 2009-12-18 17:26 UTC
To: linux-kernel; +Cc: Stephen Rothwell, Bob Copeland

I can't describe it any better. It really is pissing me the fuck off,
it's as if I have invisible elves using my nuts as punching bags.
Something is seriously fucked with 2.6.32, my box, or linux-next, or
perhaps there is another possibility I am not considering that someone
might be able to enlighten me about. I'd also love to hear from others
to see whether I'm the only one, because if this issue is reproducible,
it would be bad.

First let me describe the issue in detail.

I tend to always be on a 2.6.32 kernel + John's queued up patches for
wireless for the next kernel release (I use wireless-testing). My
system is a Thinkpad T61, userspace is Ubuntu 9.10 based (ships with
git 1.6.3.3) and I kept an ext3 filesystem to be able to go back in
time to 2.6.27 at will without issues. I git clone'd linux-next a few
weeks ago. After a few days I then tried to git pull and my system
became completely unusable. It took *ages* to open up a terminal and
start running commands. Even ssh'ing into my box became a hassle, to
the point that my *entire* morning was spent patiently waiting for the
git pull to finish. I gave up. I don't recall having anything in my
kernel logs. Bewildered with this issue I set out to prove to myself
this issue was not a 2.6.32 issue and booted other kernels, including
Ubuntu's distro kernel on 2.6.31 and then later my own freshly built
2.6.27.41 kernel. The issue was reproducible on all three kernels!

This led me to believe this was a system / hard drive issue and I
braced myself for a system fix. I still needed to prove this was
indeed a system issue.
I've been using my system without touching linux-next for a while now
and it works flawlessly, even while testing ath5k / ath9k for some
random projects I have. I git pull wireless-testing just fine, and
pm-suspend just fine every day without any hiccup.

I then started to suspect I had somehow ended up with a fucked
linux-next tree. I do recall doing a pm-suspend during a pull of
wireless-testing before, and I never had issues after resume or with
the tree at all. I don't recall doing a pm-suspend during a pull of
linux-next, but it is possible. After my last time giving up on the
'git pull' of linux-next I tried 'git reset --hard origin' and then
another 'git pull', but saw my system quickly becoming unusable again.
I ctrl-c'd out of that quickly, tried 'git fsck' and did find some
complaints. I started to want to blame my hard drive so I rm -rf'd
linux-next and tried a fresh clone. It pulled fine, my system was slow
but nothing *that* unusual. A couple of days ago I did a 'git pull'
again and ... my system started crying again, begging me to stop, so I
did.

My 'git describe' now tells me I'm at next-20091211 and 'git fsck'
tells me:

dangling tree 3500a4301d572e57c700d18d6730f4ac3e33b923
dangling tree e50022fd1e44c3ca63d57e5b263a8263fa5e291b
dangling tree c105e67e2b609e02eefe2b676e53f79b3e375a32
dangling tree 850b60a21ebf9721d16eeb7d68d6e6250893b558
dangling tree dd0bc64a4fe9eac9de3edb0db68d7a83d0477655
dangling tree ac135ee3b2031dcbed733af87c1b82833c1bd035
dangling tree 635a4c8728714746bc1a80692bd7b998af36c7fd
dangling tree 08708a50cd385efc23cfda7dd88cacf951db2237
dangling tree d27198a37d7b393ed9f5dc99c2f56e1c715c4572
dangling tree cc753417b1a3c1b61c0f1b37e7560be1f5404b93
dangling tree 237dcc5120bfe3aebcd2e19ab3640fbecb855ff2
dangling tree 32815626af9eb48dbe04fa790b154a9424871041
dangling tree 299294383dc096b0363ae3f7a49fe937a5e6027c
dangling tree 18acf23ea04f77d96c5eb092bbed0d598eb580ef
dangling tree 08c9e248624d407404e04481adf27d385c3a7e57
dangling tree beccf2344a596fed67cbe0f874210a98bd2b7c40
dangling tree d8df306d4dcf551d47f7d914bd7754c000e541f8
dangling tree 93e83c3ea3a1fef405546af4d99ecd1032ab9b09
dangling tree 65f9e4a9c3af938337fb7ec49eb354ebb19553f5
dangling tree 0a0c81ebe4e60b5941adf92494c45a9cc4ebbd85
dangling tree 5e1b8d853b32c4ee11cf4ef142be4d9a3096f679
dangling tree e11b176056fb40d5ef9cae77af37f53d7bf9342d
dangling tree 8a1e91d681554cafc74586c7ac3eb77299fbb091
dangling tree 10201bc2b2ceac311424bdbc3949a726926a6a3d
dangling tree fa21357016b62184d347f16f22fce54a0fc3aef5
dangling tree e33177806b80d35b0547a76e5fe26b59c55b5aa1
dangling tree cf3a11b7f1b83c86870881cb40f7c1af5b1daf9a
dangling tree e43d57bb730a492b73f6e6a8e5fd218d14e4b741
dangling tree 9f62ff527891032a5f0511f8f38cfe686b15ee5f
dangling tree 7476a37bdcedc8ecc568d73225a55b59899e70df
dangling tree 6085f588e0656a48e76f5e87dfb5ca03e7649bc4
dangling tree ea93955aa22995d17cc90f300343a535c8bdbf0c
dangling tree 1d95edfaf4d9065bf86cae97629fa28dd76e9fd3
dangling tree f29a99c8fee39c9934cca06a00e7ca5b48437ca0
dangling tree ff9bcf59a9392f7856a13c7541107165d8eb5659
dangling tree 3d9e29ac71b065829550996559094525e6f4ea4e
dangling tree cea7adf5352ce365f580066d1e2123e63b48f261
dangling tree 2fa8f7cd3a02839cf41ae7b01267047ffdbcfbe5
dangling tree 42ddc9e3585548880386d48dd7393d4111347ccd
dangling tree 83e30d38fb1e8ca59440f830f9bc203c615eff49
dangling tree 71e83d06ef0bb5bb105cf64bbb80bc6580bc06eb
dangling tree 7bedcf303aafcdedd6d0b3119bd8040d1fab3983
dangling tree 83f1a1eae41b9d6c7d2b6d549c35485a4e20847a
dangling tree eff8c7a202e29a8793b581b4c1b8d372a5289356
dangling tree 84f9bb0e7549269370c88a0878655fa4f5c09b27
dangling tree 4ffa81c9721df067d741fbd40227163d77ff7513

I'm starting to doubt this is a hard drive issue. I will clone
linux-next, exactly as it is on my system, onto another T61 I have (a
little bigger and with Nvidia graphics) by git clone'ing over ssh from
my linux-next/.git/. Then I'll scp my linux-next/.git/config over to
it, try a git pull and see if that system also goes ape shit.

I am wondering if others have experienced issues like this as well.

Here's my kernel config for wireless-testing:

http://bombadil.infradead.org/~mcgrof/configs/2009/12/wireless-testing.config

And my config for 2.6.27:

http://bombadil.infradead.org/~mcgrof/configs/2009/12/2.6.27.41.config

Even if a git tree gets terribly messed up, the issues I'm seeing seem
too painful for an average user to experience. There has got to be
something major going on under the hood, and I'm not sure why I don't
see this sort of thing when following wireless-testing.

  Luis
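
A note on the 'git fsck' output above: "dangling" entries are objects that exist in the repository but are not referenced by anything. They are commonly left behind by interrupted or reset operations, and on their own they do not indicate corruption. The following is only a sketch for condensing such a list into counts per kind; the function name summarize_fsck is made up for illustration.

```shell
#!/bin/sh
# Summarise `git fsck` output by state and object type, e.g. "dangling tree".
# Reads the fsck output on stdin, so it also works on saved output.
# summarize_fsck is a hypothetical helper name, not part of git.
summarize_fsck() {
    awk 'NF >= 3 { count[$1 " " $2]++ }
         END { for (k in count) print count[k], k }' | sort -k2
}

# Usage inside a repository (fsck itself is slow on a tree this size):
#   git fsck 2>/dev/null | summarize_fsck
```

Anything other than "dangling" lines in the summary (for example "missing" or "corrupt" entries) would be the part worth a closer look.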

* Re: git pull on linux-next makes my system crawl to its knees and beg for mercy
  2009-12-18 17:26 git pull on linux-next makes my system crawl to its knees and beg for mercy Luis R. Rodriguez
@ 2009-12-18 17:38 ` Bartlomiej Zolnierkiewicz
  2009-12-18 19:19   ` Luis R. Rodriguez
  2009-12-18 21:56 ` Stephen Rothwell
  1 sibling, 1 reply; 11+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-12-18 17:38 UTC
To: Luis R. Rodriguez; +Cc: linux-kernel, Stephen Rothwell, Bob Copeland

On Friday 18 December 2009 06:26:29 pm Luis R. Rodriguez wrote:

> on my kernel logs. Bewildered with this issue I set out to prove to
> myself this issue was not a 2.6.32 issue and booted other kernels,
> including Ubuntu's distro kernel on 2.6.31 and then later my own built
> fresh 2.6.27.41 kernel. The issue was reproducible on all three
> kernels!
>
> This lead me to believe this was a system / hard drive issue and
> embraced myself for a system fix. I yet needed to prove this was

Just some hints for ruling out the system / hard drive problem.

smartctl -a /dev/sdx is your friend for checking your disk (keep an eye
on anything suspicious, like the re-allocated sector count going up).

It could also be an fs-related issue that shows up only under specific
conditions (i.e. almost full partition -- some file systems start to
crawl when the amount of available free space gets low).

HTH
--
Bartlomiej Zolnierkiewicz
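
The two checks suggested above can be sketched roughly as follows. This assumes smartmontools is installed and that the disk is /dev/sda (an assumed device name; substitute your own). The helper name smart_raw_value is hypothetical; it only picks the RAW_VALUE column out of the attribute table that 'smartctl -A' prints.

```shell
#!/bin/sh
# Print the RAW_VALUE column for one attribute from `smartctl -A` style output.
# Reads the attribute table on stdin; $1 is the attribute name to look up.
# smart_raw_value is a hypothetical helper name, not part of smartmontools.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Usage (smartctl normally needs root; /dev/sda is an assumption):
#   smartctl -A /dev/sda | smart_raw_value Reallocated_Sector_Ct
#   df -h /     # free space on the filesystem that holds the tree
```

Running the first command before and after a heavy I/O workload and comparing the numbers is one way to see whether the reallocated sector count is going up.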

* Re: git pull on linux-next makes my system crawl to its knees and beg for mercy
  2009-12-18 17:38 ` Bartlomiej Zolnierkiewicz
@ 2009-12-18 19:19   ` Luis R. Rodriguez
  2009-12-18 19:55     ` Luis R. Rodriguez
  0 siblings, 1 reply; 11+ messages in thread
From: Luis R. Rodriguez @ 2009-12-18 19:19 UTC
To: Bartlomiej Zolnierkiewicz; +Cc: linux-kernel, Stephen Rothwell, Bob Copeland

On Fri, Dec 18, 2009 at 9:38 AM, Bartlomiej Zolnierkiewicz
<bzolnier@gmail.com> wrote:
> On Friday 18 December 2009 06:26:29 pm Luis R. Rodriguez wrote:
>
>> on my kernel logs. Bewildered with this issue I set out to prove to
>> myself this issue was not a 2.6.32 issue and booted other kernels,
>> including Ubuntu's distro kernel on 2.6.31 and then later my own built
>> fresh 2.6.27.41 kernel. The issue was reproducible on all three
>> kernels!
>>
>> This lead me to believe this was a system / hard drive issue and
>> embraced myself for a system fix. I yet needed to prove this was
>
> Just some hints for ruling out the system / hard drive problem.
>
> smartctl -a /dev/sdx is your friend for checking your disk (keep an eye
> on anything suspicious like re-allocated sector count going up etc.)

Sweet thanks, here's my current output, I'll try later after I get
some day work done to pull linux-next and make it moan. Let me know if
you see anything fishy.

smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     HITACHI HTS722010K9SA00
Serial Number:    080109DP0210DPG8DUEP
Firmware Version: DC2ZC75A
User Capacity:    100,030,242,816 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3f
Local Time is:    Fri Dec 18 11:16:12 2009 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 645) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  39) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   062    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   116   116   040    Pre-fail  Offline      -       3380
  3 Spin_Up_Time            0x0007   253   253   033    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0012   098   098   000    Old_age   Always       -       3314
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   040    Pre-fail  Offline      -       29
  9 Power_On_Hours          0x0012   081   081   000    Old_age   Always       -       8401
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1571
191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age   Always       -       65536
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       3932351
193 Load_Cycle_Count        0x0012   045   045   000    Old_age   Always       -       559592
194 Temperature_Celsius     0x0002   134   134   000    Old_age   Always       -       41 (Lifetime Min/Max 13/48)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
223 Load_Retry_Count        0x000a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 8 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 8 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 9f 45 a5 e0  Error: IDNF at LBA = 0x00a5459f = 10831263

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  24 ff 01 9f 45 a5 e0 00      00:04:06.200  READ SECTOR(S) EXT
  25 ff 01 9f 45 a5 e0 00      00:04:06.100  READ DMA EXT
  34 ff 01 00 00 00 e0 00      00:04:04.100  WRITE SECTORS(S) EXT
  25 ff 01 00 00 00 e0 00      00:04:04.100  READ DMA EXT
  25 ff 01 c0 17 fa e0 00      00:04:04.100  READ DMA EXT

Error 7 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 9f 45 a5 e0  Error: IDNF 1 sectors at LBA = 0x00a5459f = 10831263

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 ff 01 9f 45 a5 e0 00      00:04:06.100  READ DMA EXT
  34 ff 01 00 00 00 e0 00      00:04:04.100  WRITE SECTORS(S) EXT
  25 ff 01 00 00 00 e0 00      00:04:04.100  READ DMA EXT
  25 ff 01 c0 17 fa e0 00      00:04:04.100  READ DMA EXT
  25 ff 01 3f 00 00 e0 00      00:04:04.100  READ DMA EXT

Error 6 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 9f 45 a5 e0  Error: IDNF at LBA = 0x00a5459f = 10831263

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  24 ff 01 9f 45 a5 e0 00      00:04:04.000  READ SECTOR(S) EXT
  25 ff 01 9f 45 a5 e0 00      00:04:04.000  READ DMA EXT
  34 ff 01 00 00 00 e0 00      00:04:02.000  WRITE SECTORS(S) EXT
  35 ff 01 cf 17 fa e0 00      00:04:02.000  WRITE DMA EXT
  35 ff 01 ce 17 fa e0 00      00:04:02.000  WRITE DMA EXT

Error 5 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 9f 45 a5 e0  Error: IDNF 1 sectors at LBA = 0x00a5459f = 10831263

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 ff 01 9f 45 a5 e0 00      00:04:04.000  READ DMA EXT
  34 ff 01 00 00 00 e0 00      00:04:02.000  WRITE SECTORS(S) EXT
  35 ff 01 cf 17 fa e0 00      00:04:02.000  WRITE DMA EXT
  35 ff 01 ce 17 fa e0 00      00:04:02.000  WRITE DMA EXT
  35 ff 01 cd 17 fa e0 00      00:04:02.000  WRITE DMA EXT

Error 4 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 9f 45 a5 e0  Error: IDNF at LBA = 0x00a5459f = 10831263

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  24 ff 01 9f 45 a5 e0 00      00:04:01.900  READ SECTOR(S) EXT
  25 ff 01 9f 45 a5 e0 00      00:04:01.800  READ DMA EXT
  34 ff 01 00 00 00 e0 00      00:03:59.900  WRITE SECTORS(S) EXT
  35 ff 01 4e 00 00 e0 00      00:03:59.900  WRITE DMA EXT
  35 ff 01 4d 00 00 e0 00      00:03:59.900  WRITE DMA EXT

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Also available at:

http://bombadil.infradead.org/~mcgrof/logs/2009/12/smart-ctl-sda2.txt

> It could be also fs related issue that shows up only under specific
> conditions

OK -- I see, I used a fresh new ext3, did not make the jump to ext4.

> (i.e. almost full partition -- some file-systems starts to
> crawl when the amount of available free space gets low).

Got it, thanks. The partition has a lot of room.

mcgrof@tux ~ $ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              91G   43G   44G  50% /

I also only have one partition.

  Luis

* Re: git pull on linux-next makes my system crawl to its knees and beg for mercy
  2009-12-18 19:19   ` Luis R. Rodriguez
@ 2009-12-18 19:55     ` Luis R. Rodriguez
  2009-12-18 20:51       ` Luis R. Rodriguez
  0 siblings, 1 reply; 11+ messages in thread
From: Luis R. Rodriguez @ 2009-12-18 19:55 UTC
To: Bartlomiej Zolnierkiewicz; +Cc: linux-kernel, Stephen Rothwell, Bob Copeland

On Fri, Dec 18, 2009 at 11:19 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
> On Fri, Dec 18, 2009 at 9:38 AM, Bartlomiej Zolnierkiewicz
> <bzolnier@gmail.com> wrote:
>> On Friday 18 December 2009 06:26:29 pm Luis R. Rodriguez wrote:
>>
>>> on my kernel logs. Bewildered with this issue I set out to prove to
>>> myself this issue was not a 2.6.32 issue and booted other kernels,
>>> including Ubuntu's distro kernel on 2.6.31 and then later my own built
>>> fresh 2.6.27.41 kernel. The issue was reproducible on all three
>>> kernels!
>>>
>>> This lead me to believe this was a system / hard drive issue and
>>> embraced myself for a system fix. I yet needed to prove this was
>>
>> Just some hints for ruling out the system / hard drive problem.
>>
>> smartctl -a /dev/sdx is your friend for checking your disk (keep an eye
>> on anything suspicious like re-allocated sector count going up etc.)
>
> Sweet thanks, here's my current output, I'll try later after I get
> some day work done to pull linux-next and make it moan. Let me know if
> you see anything fishy.

<-- snip full log -->

> Also available at:
>
> http://bombadil.infradead.org/~mcgrof/logs/2009/12/smart-ctl-sda2.txt
>
>> It could be also fs related issue that shows up only under specific
>> conditions
>
> OK -- I see, I used a fresh new ext3, did not make the jump to ext4.
>
>> (i.e. almost full partition -- some file-systems starts to
>> crawl when the amount of available free space gets low).
>
> Got it, thanks, so partition has a lot of room.
>
> mcgrof@tux ~ $ df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda2              91G   43G   44G  50% /
>
> Also ony have one partition.

GSmartControl is very cool. I just ran the short self-test and it
passed without issues. I'll now run the extended self-test. I'll note
that right after the self-test I had to check out the 2.6.32.y branch
on hpa's tree and noticed a similar type of slowdown as I did when
pulling linux-next. The only difference with linux-next is that it
takes ages to complete, which just makes waiting unbearable. This all
makes me suspect it's something else. But let's see what the
GSmartControl results yield.

  Luis

* Re: git pull on linux-next makes my system crawl to its knees and beg for mercy
  2009-12-18 19:55     ` Luis R. Rodriguez
@ 2009-12-18 20:51       ` Luis R. Rodriguez
  2009-12-18 21:13         ` Luis R. Rodriguez
  0 siblings, 1 reply; 11+ messages in thread
From: Luis R. Rodriguez @ 2009-12-18 20:51 UTC
To: Bartlomiej Zolnierkiewicz; +Cc: linux-kernel, Stephen Rothwell, Bob Copeland

On Fri, Dec 18, 2009 at 11:55 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
> On Fri, Dec 18, 2009 at 11:19 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
>> On Fri, Dec 18, 2009 at 9:38 AM, Bartlomiej Zolnierkiewicz
>> <bzolnier@gmail.com> wrote:
>>> On Friday 18 December 2009 06:26:29 pm Luis R. Rodriguez wrote:
>>>
>>>> on my kernel logs. Bewildered with this issue I set out to prove to
>>>> myself this issue was not a 2.6.32 issue and booted other kernels,
>>>> including Ubuntu's distro kernel on 2.6.31 and then later my own built
>>>> fresh 2.6.27.41 kernel. The issue was reproducible on all three
>>>> kernels!
>>>>
>>>> This lead me to believe this was a system / hard drive issue and
>>>> embraced myself for a system fix. I yet needed to prove this was
>>>
>>> Just some hints for ruling out the system / hard drive problem.
>>>
>>> smartctl -a /dev/sdx is your friend for checking your disk (keep an eye
>>> on anything suspicious like re-allocated sector count going up etc.)
>>
>> Sweet thanks, here's my current output, I'll try later after I get
>> some day work done to pull linux-next and make it moan. Let me know if
>> you see anything fishy.
>
> <-- snip full log -->
>
>> Also available at:
>>
>> http://bombadil.infradead.org/~mcgrof/logs/2009/12/smart-ctl-sda2.txt
>>
>>> It could be also fs related issue that shows up only under specific
>>> conditions
>>
>> OK -- I see, I used a fresh new ext3, did not make the jump to ext4.
>>
>>> (i.e. almost full partition -- some file-systems starts to
>>> crawl when the amount of available free space gets low).
>>
>> Got it, thanks, so partition has a lot of room.
>>
>> mcgrof@tux ~ $ df -h
>> Filesystem            Size  Used Avail Use% Mounted on
>> /dev/sda2              91G   43G   44G  50% /
>>
>> Also ony have one partition.
>
> GSmartControl is very cool, just ran the short self test and it passed
> without issues. I'll now run the extended self tests. I'll not that
> right after the self test I had to checkout the 2.6.32.y branch on
> hpa's tree and noticed similar type of slow down as I did with pulling
> linux-next. Only thing with linux-next is it takes ages complete which
> just makes waiting unbearable. This all makes me suspect its something
> else. But lets seee what these results on the GSmartControl yield.

I tested the exact same git pull on the other T61 laptop I have and
was able to see the same crippling effects, but not as bad as with my
main T61. The difference between them is that the one where I see the
worst issue has an Intel(R) Core(TM)2 Duo CPU T8100 @ 1.80GHz while the
other one has the same CPU but at 2.10GHz. The only difference I see
between linux-next and, say, wireless-testing is that linux-next will
have a lot more new objects and the pull will end with a git merge
that will fail and require you to 'git reset --hard origin'. The
latter part shouldn't be taken into the equation though, as I see the
issue creeping up early on during the pull, while git is counting
objects and even later while compressing.

I'm starting to glare at CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
with suspicious looks. On both boxes the CPU kept itself @ 800 MHz
during most of the git pull. I did see CPU idle frequently hitting 0
and the CPU wait time at around 20 or 30.

My GSmartControl extended test is almost done.

I'll test 2.6.33-rc1 once John gets it into his tree.

  Luis
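
On the cpufreq suspicion above: CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE only selects the default governor, and the governor actually in effect can be changed at runtime through sysfs, so it may be worth checking what is really active. The following is a sketch, assuming the usual cpufreq sysfs layout under /sys/devices/system/cpu; the function name show_cpufreq is made up for illustration.

```shell
#!/bin/sh
# Print "cpuN <governor> <current frequency in kHz>" for every CPU that
# exposes cpufreq in sysfs. $1 optionally overrides the sysfs CPU directory.
# show_cpufreq is a hypothetical helper name.
show_cpufreq() {
    root=${1:-/sys/devices/system/cpu}
    for dir in "$root"/cpu[0-9]*; do
        # Skip entries without cpufreq support (or when nothing matched).
        [ -r "$dir/cpufreq/scaling_governor" ] || continue
        printf '%s %s %s\n' "${dir##*/}" \
            "$(cat "$dir/cpufreq/scaling_governor")" \
            "$(cat "$dir/cpufreq/scaling_cur_freq")"
    done
}

# Usage, e.g. while the git pull is running:
#   show_cpufreq
```

If this reports a governor other than "performance", the 800 MHz reading would point at a governor set by userspace after boot rather than at the kernel config default.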

* Re: git pull on linux-next makes my system crawl to its knees and beg for mercy
  2009-12-18 20:51       ` Luis R. Rodriguez
@ 2009-12-18 21:13         ` Luis R. Rodriguez
  0 siblings, 0 replies; 11+ messages in thread
From: Luis R. Rodriguez @ 2009-12-18 21:13 UTC
To: Bartlomiej Zolnierkiewicz; +Cc: linux-kernel, Stephen Rothwell, Bob Copeland

On Fri, Dec 18, 2009 at 12:51 PM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
> On Fri, Dec 18, 2009 at 11:55 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
>> On Fri, Dec 18, 2009 at 11:19 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
>>> On Fri, Dec 18, 2009 at 9:38 AM, Bartlomiej Zolnierkiewicz
>>> <bzolnier@gmail.com> wrote:
>>>> On Friday 18 December 2009 06:26:29 pm Luis R. Rodriguez wrote:
>>>>
>>>>> on my kernel logs. Bewildered with this issue I set out to prove to
>>>>> myself this issue was not a 2.6.32 issue and booted other kernels,
>>>>> including Ubuntu's distro kernel on 2.6.31 and then later my own built
>>>>> fresh 2.6.27.41 kernel. The issue was reproducible on all three
>>>>> kernels!
>>>>>
>>>>> This lead me to believe this was a system / hard drive issue and
>>>>> embraced myself for a system fix. I yet needed to prove this was
>>>>
>>>> Just some hints for ruling out the system / hard drive problem.
>>>>
>>>> smartctl -a /dev/sdx is your friend for checking your disk (keep an eye
>>>> on anything suspicious like re-allocated sector count going up etc.)
>>>
>>> Sweet thanks, here's my current output, I'll try later after I get
>>> some day work done to pull linux-next and make it moan. Let me know if
>>> you see anything fishy.
>>
>> <-- snip full log -->
>>
>>> Also available at:
>>>
>>> http://bombadil.infradead.org/~mcgrof/logs/2009/12/smart-ctl-sda2.txt
>>>
>>>> It could be also fs related issue that shows up only under specific
>>>> conditions
>>>
>>> OK -- I see, I used a fresh new ext3, did not make the jump to ext4.
>>>
>>>> (i.e. almost full partition -- some file-systems starts to
>>>> crawl when the amount of available free space gets low).
>>>
>>> Got it, thanks, so partition has a lot of room.
>>>
>>> mcgrof@tux ~ $ df -h
>>> Filesystem            Size  Used Avail Use% Mounted on
>>> /dev/sda2              91G   43G   44G  50% /
>>>
>>> Also ony have one partition.
>>
>> GSmartControl is very cool, just ran the short self test and it passed
>> without issues. I'll now run the extended self tests. I'll not that
>> right after the self test I had to checkout the 2.6.32.y branch on
>> hpa's tree and noticed similar type of slow down as I did with pulling
>> linux-next. Only thing with linux-next is it takes ages complete which
>> just makes waiting unbearable. This all makes me suspect its something
>> else. But lets seee what these results on the GSmartControl yield.
>
> I tested the same exact git pull on the other T61 laptop I have and
> was able to see the same crippling effects but not as bad as with my
> main T61. Different between them is the one where I see the worst
> issue has a Intel(R) Core(TM)2 Duo CPU T8100 @ 1.80GHz while the
> other one has the same CPU but at 2.10GHz. The only thing I see
> different between linux-next and say wireless-testing is linux-next
> will have a lot more newer objects and the pull will end with a git
> merge that will fail and require you to 'git reset --hard origin'. The
> later part shouldn't be taken into the equation there though as I see
> the issue creeping up early on during the pull, while git is counting
> objects and even later compressing.
>
> I'm starting to glare at CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
> with suspicious looks. On both boxes the CPU kept itself @ 800 MHz
> during most of the git pull, I did see the CPU idle hitting 0
> frequently and the CPU wait time ~ 20 or 30.
>
> My GSmartControl extensive test is almost done.

The test completed without any errors.

> I'll test 2.6.33-rc1 once John gets it into his tree.

Now to wait for this guy.

  Luis

* Re: git pull on linux-next makes my system crawl to its knees and beg for mercy
  2009-12-18 17:26 git pull on linux-next makes my system crawl to its knees and beg for mercy Luis R. Rodriguez
  2009-12-18 17:38 ` Bartlomiej Zolnierkiewicz
@ 2009-12-18 21:56 ` Stephen Rothwell
  2009-12-18 22:03   ` Luis R. Rodriguez
  1 sibling, 1 reply; 11+ messages in thread
From: Stephen Rothwell @ 2009-12-18 21:56 UTC
To: Luis R. Rodriguez; +Cc: linux-kernel, Bob Copeland

Hi Luis,

On Fri, 18 Dec 2009 09:26:29 -0800 "Luis R. Rodriguez" <mcgrof@gmail.com> wrote:
>
> I tend to always be on a 2.6.32 kernel + John's queued up patches for
> wireless for the next kernel release (I use wireless-testing). My
> system is a Thinkpad T61, userspace is Ubuntu 9.10 based (ships with
> git 1.6.3.3) and I kept an ext3 filesystem to be able to go back in
> time to 2.6.27 at will without issues. I git clone'd linux-next a few
> weeks ago. After a few days I then tried to git pull and my system
> became completely unusable, It took *ages* to open up a terminal and

The start of the daily linux-next boilerplate says:

> If you are tracking the linux-next tree using git, you should not use
> "git pull" to do so as that will try to merge the new linux-next release
> with the old one. You should use "git fetch" as mentioned in the FAQ on
> the wiki (see below).

(Unfortunately, the wiki seems to be unavailable at the moment)

I am guessing that the merge that git is attempting is killing your
laptop (though besides the number of common commits I am not sure why).
Please try using "git fetch" instead.
--
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
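
The advice above can be sketched as follows. This is one common way to follow a tree that is rebuilt for every release, not necessarily the procedure the linux-next FAQ describes. The remote name "origin" and branch name "master" are assumptions, and sync_rebuilt_tree is a hypothetical helper name.

```shell
#!/bin/sh
# Update a clone of a tree whose history is rebuilt on every release.
# $1: path to the clone
# $2: remote name (assumed default: origin)
# $3: branch on that remote (assumed default: master)
# WARNING: `git reset --hard` throws away local commits and uncommitted
# changes on the checked-out branch.
sync_rebuilt_tree() {
    (
        cd "$1" || exit 1
        # Download the new history without attempting any merge ...
        git fetch "${2:-origin}" &&
        # ... then move the local branch to wherever the remote now points.
        git reset --hard "${2:-origin}/${3:-master}"
    )
}

# Usage:
#   sync_rebuilt_tree ~/linux-next
```

Because 'git fetch' never merges, the expensive step of merging two unrelated releases, which 'git pull' attempts, does not happen at all.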

* Re: git pull on linux-next makes my system crawl to its knees and beg for mercy
  2009-12-18 21:56 ` Stephen Rothwell
@ 2009-12-18 22:03   ` Luis R. Rodriguez
  2009-12-23 15:27     ` Denys Vlasenko
  0 siblings, 1 reply; 11+ messages in thread
From: Luis R. Rodriguez @ 2009-12-18 22:03 UTC
To: Stephen Rothwell; +Cc: linux-kernel, Bob Copeland

On Fri, Dec 18, 2009 at 1:56 PM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> Hi Luis,
>
> On Fri, 18 Dec 2009 09:26:29 -0800 "Luis R. Rodriguez" <mcgrof@gmail.com> wrote:
>>
>> I tend to always be on a 2.6.32 kernel + John's queued up patches for
>> wireless for the next kernel release (I use wireless-testing). My
>> system is a Thinkpad T61, userspace is Ubuntu 9.10 based (ships with
>> git 1.6.3.3) and I kept an ext3 filesystem to be able to go back in
>> time to 2.6.27 at will without issues. I git clone'd linux-next a few
>> weeks ago. After a few days I then tried to git pull and my system
>> became completely unusable, It took *ages* to open up a terminal and
>
> The start of the daily linux-next boilerplate says:
>
>> If you are tracking the linux-next tree using git, you should not use
>> "git pull" to do so as that will try to merge the new linux-next release
>> with the old one. You should use "git fetch" as mentioned in the FAQ on
>> the wiki (see below).
>
> (Unfortunately, the wiki seems to be unavailable at the moment)
>
> I am guessing that the merge that git is attempting is killing your
> laptop (though besides the number of common commits I am not sure why).
> Please try using "get fetch" instead.

Indeed, I learned my lesson now. Thanks for the details.

Now granted, even if 'git merge' is killing my laptop due to the
conflicts of the insane merge I was trying to do, it *still* should
not make my box completely unresponsive for so long. And given that
I'm using mostly distribution-specific kernel config options and have
ruled out my hard drive, it seems to be a serious general kernel
issue, going back as far as 2.6.27. Whatever load git is generating,
I'm sure other userspace software can end up generating it too, and
that would make any user go completely bananas. I was about to rip my
hair out.

  Luis
* Re: git pull on linux-next makes my system crawl to its knees and beg for mercy
From: Denys Vlasenko @ 2009-12-23 15:27 UTC
To: Luis R. Rodriguez; +Cc: Stephen Rothwell, linux-kernel, Bob Copeland

On Fri, Dec 18, 2009 at 11:03 PM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
> On Fri, Dec 18, 2009 at 1:56 PM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>> Hi Luis,
>>
>> On Fri, 18 Dec 2009 09:26:29 -0800 "Luis R. Rodriguez" <mcgrof@gmail.com> wrote:
>>>
>>> I tend to always be on a 2.6.32 kernel + John's queued up patches for
>>> wireless for the next kernel release (I use wireless-testing). My
>>> system is a Thinkpad T61, userspace is Ubuntu 9.10 based (ships with
>>> git 1.6.3.3) and I kept an ext3 filesystem to be able to go back in
>>> time to 2.6.27 at will without issues. I git clone'd linux-next a few
>>> weeks ago. After a few days I then tried to git pull and my system
>>> became completely unusable, It took *ages* to open up a terminal and
>>
>> The start of the daily linux-next boilerplate says:
>>
>>> If you are tracking the linux-next tree using git, you should not use
>>> "git pull" to do so as that will try to merge the new linux-next release
>>> with the old one. You should use "git fetch" as mentioned in the FAQ on
>>> the wiki (see below).
>>
>> (Unfortunately, the wiki seems to be unavailable at the moment.)
>>
>> I am guessing that the merge that git is attempting is killing your
>> laptop (though, besides the number of common commits, I am not sure why).
>> Please try using "git fetch" instead.
>
> Indeed, I learned my lesson now. Thanks for the details.
>
> Now granted, even if 'git merge' is killing my laptop due to the
> conflicts of the insane merge I was trying to do, it *still* should not
> make my box completely unresponsive for so long. And given that I'm
> using mostly distribution-specific kernel config options and have
> ruled out my hard drive, it seems to be a general, serious kernel issue
> even down to 2.6.27. Whatever git is doing, I'm sure other userspace
> software can also end up generating the same load, and it would make
> any user go completely bananas. I was about to rip my hair out.

Git gurus would know it by heart, but I am not one. So if I were you,
I would just do a generic diagnostic run. What is git doing
that slows the machine down that much? Is it spawning a lot
of running processes? Is it allocating/using so much memory
that your box goes into a severe swap storm?

I guess it is the latter. If it is, then it's not a kernel problem -
the kernel can't magically make your system adequately handle a workload
which needs 3 GB for its working set when the box only has 2 GB of RAM.
It _will_ be very slow.

--
vda
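Editorial sketch of the "swap storm" check suggested above: on Linux, /proc/vmstat exposes cumulative pswpin and pswpout counters (pages swapped in and out since boot), so two snapshots taken a few seconds apart give a swap rate. The function names and the sampling approach are illustrative assumptions, not something Denys prescribes.

```python
def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, interval_s):
    """Return pages swapped in/out per second between two snapshots."""
    return {
        key: (after[key] - before[key]) / interval_s
        for key in ("pswpin", "pswpout")
    }


def read_vmstat(path="/proc/vmstat"):
    """Take one snapshot of the kernel counters (Linux only)."""
    with open(path) as handle:
        return parse_vmstat(handle.read())
```

Taking one read_vmstat() snapshot, sleeping for a known interval, taking a second one and passing both to swap_rates() shows whether pages are being swapped continuously while the merge runs. Sustained high rates in both directions would point to memory pressure rather than a disk or scheduler problem.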
* Re: git pull on linux-next makes my system crawl to its knees and beg for mercy
From: Luis R. Rodriguez @ 2009-12-23 16:20 UTC
To: Denys Vlasenko; +Cc: Stephen Rothwell, linux-kernel, Bob Copeland

On Wed, Dec 23, 2009 at 7:27 AM, Denys Vlasenko <vda.linux@googlemail.com> wrote:
> On Fri, Dec 18, 2009 at 11:03 PM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
>> On Fri, Dec 18, 2009 at 1:56 PM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>>> Hi Luis,
>>>
>>> On Fri, 18 Dec 2009 09:26:29 -0800 "Luis R. Rodriguez" <mcgrof@gmail.com> wrote:
>>>>
>>>> I tend to always be on a 2.6.32 kernel + John's queued up patches for
>>>> wireless for the next kernel release (I use wireless-testing). My
>>>> system is a Thinkpad T61, userspace is Ubuntu 9.10 based (ships with
>>>> git 1.6.3.3) and I kept an ext3 filesystem to be able to go back in
>>>> time to 2.6.27 at will without issues. I git clone'd linux-next a few
>>>> weeks ago. After a few days I then tried to git pull and my system
>>>> became completely unusable, It took *ages* to open up a terminal and
>>>
>>> The start of the daily linux-next boilerplate says:
>>>
>>>> If you are tracking the linux-next tree using git, you should not use
>>>> "git pull" to do so as that will try to merge the new linux-next release
>>>> with the old one. You should use "git fetch" as mentioned in the FAQ on
>>>> the wiki (see below).
>>>
>>> (Unfortunately, the wiki seems to be unavailable at the moment.)
>>>
>>> I am guessing that the merge that git is attempting is killing your
>>> laptop (though, besides the number of common commits, I am not sure why).
>>> Please try using "git fetch" instead.
>>
>> Indeed, I learned my lesson now. Thanks for the details.
>>
>> Now granted, even if 'git merge' is killing my laptop due to the
>> conflicts of the insane merge I was trying to do, it *still* should not
>> make my box completely unresponsive for so long. And given that I'm
>> using mostly distribution-specific kernel config options and have
>> ruled out my hard drive, it seems to be a general, serious kernel issue
>> even down to 2.6.27. Whatever git is doing, I'm sure other userspace
>> software can also end up generating the same load, and it would make
>> any user go completely bananas. I was about to rip my hair out.
>
> Git gurus would know it by heart, but I am not one. So if I were you,
> I would just do a generic diagnostic run.

Right, it's the first thing I did, but it's gotten to the point where
even doing that is not possible unless you're willing to wait 5-10
minutes for some output. I'm not kidding.

> What is git doing
> that slows the machine down that much? Is it spawning a lot
> of running processes?

Doesn't seem like it; the only visible git process is git-merge. I
forgot to grep for all git processes, though, but I think that was the
only one.

> Is it allocating/using so much memory
> that your box goes into a severe swap storm?

Could be: 979M virtual, 298M resident size (non-swapped), 58665 shared.

Unfortunately, when this happens I cannot log into my box and run good
diagnostics; that's how much of a pain in the bolas this is. One
morning when I had enough patience I did leave vmstat and iostat running
and didn't see much out of the ordinary, except that CPU wait time was
pretty high. I did manage to get at least htop running once and took a
screenshot (and this took me about 10 minutes to generate):

http://bombadil.infradead.org/~mcgrof/images/2009/12/git-merge.jpg

So if anything it could be the latter, that of a swap storm. What I
should have running is sar; that way I can trek back in time when I
want to. But even when compiling the kernel, my machine becomes unusable
for a few seconds when the linking for vmlinux.o starts, and in that
case my swap usage is about 45 - 125 M. A silly example: pandora
reliably poos out on firefox while vmlinux.o is linking, requiring a
pkill on firefox to get it back.

> I guess it is the latter.

Only it seems to happen with some other things too, like compiling the
kernel. I'll see if I can upgrade the memory on this thing.

> If it is, then it's not a kernel problem -
> the kernel can't magically make your system adequately handle a workload
> which needs 3 GB for its working set when the box only has 2 GB of RAM.
> It _will_ be very slow.

Sure, I'll try to keep an eye out for swap overuse; I suppose it could
be that. I started to be suspicious about the CPU freq governor, but
I'll note that on both systems, even if I set the freq static to the
highest, I still had issues. I'll also note that on both 2.6.27 and
2.6.32 I used:

  CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y

I started testing

  CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE=y

but don't really notice an improvement.

  Luis
* Re: git pull on linux-next makes my system crawl to its knees and beg for mercy
From: Denys Vlasenko @ 2009-12-23 17:11 UTC
To: Luis R. Rodriguez; +Cc: Stephen Rothwell, linux-kernel, Bob Copeland

On Wed, Dec 23, 2009 at 5:20 PM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
>> Is it allocating/using so much memory
>> that your box goes into a severe swap storm?
>
> Could be: 979M virtual, 298M resident size (non-swapped), 58665 shared.
>
> Unfortunately, when this happens I cannot log into my box and run good
> diagnostics; that's how much of a pain in the bolas this is.

Nicing it may make it easier to do diagnostic work.

> One
> morning when I had enough patience I did leave vmstat and iostat running
> and didn't see much out of the ordinary, except that CPU wait time was
> pretty high. I did manage to get at least htop running once and took a
> screenshot (and this took me about 10 minutes to generate):
>
> http://bombadil.infradead.org/~mcgrof/images/2009/12/git-merge.jpg

Looks like swap space is 2/3 used. This is an indication
of memory starvation. It may be a residual condition -
you have a lot of potentially bloated programs running.

What do you see if you reproduce this situation soon after boot,
with a minimum of other running programs? For one, definitely
do not start web browser(s) and such. Ideally, do not run X at all.

If you still see a lot of swap used, then this is it -
git requires more memory for this task.

The possibility that the kernel has a bug where it needlessly swaps
out is remote.

--
vda
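Editorial sketch of the "swap space is 2/3 used" reading above: rather than reading it off a screenshot, the same figure can be computed from the SwapTotal and SwapFree fields of /proc/meminfo (both reported in kB). The function name is hypothetical and the sample numbers in the usage note are made up for illustration; they are not Luis's actual values.

```python
def swap_usage(meminfo_text):
    """Return (used_kb, total_kb, used_fraction) from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        # Only the two swap fields are needed; each looks like
        # "SwapTotal:  3000000 kB".
        if sep and name.strip() in ("SwapTotal", "SwapFree"):
            fields[name.strip()] = int(rest.split()[0])
    total = fields["SwapTotal"]
    used = total - fields["SwapFree"]
    # A box without swap reports SwapTotal as 0; avoid dividing by zero.
    fraction = used / total if total else 0.0
    return used, total, fraction
```

For example, a box reporting SwapTotal of 3000000 kB and SwapFree of 1000000 kB has 2000000 kB in use, a fraction of two thirds. Running the check right after boot, before starting X or a browser, separates what git itself needs from what other programs had already pushed into swap.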
end of thread, other threads: [~2009-12-23 17:11 UTC]

Thread overview: 11+ messages
2009-12-18 17:26 git pull on linux-next makes my system crawl to its knees and beg for mercy Luis R. Rodriguez
2009-12-18 17:38 ` Bartlomiej Zolnierkiewicz
2009-12-18 19:19 ` Luis R. Rodriguez
2009-12-18 19:55 ` Luis R. Rodriguez
2009-12-18 20:51 ` Luis R. Rodriguez
2009-12-18 21:13 ` Luis R. Rodriguez
2009-12-18 21:56 ` Stephen Rothwell
2009-12-18 22:03 ` Luis R. Rodriguez
2009-12-23 15:27 ` Denys Vlasenko
2009-12-23 16:20 ` Luis R. Rodriguez
2009-12-23 17:11 ` Denys Vlasenko