* Re: 2.6.25-rc5-mm1 [not found] ` <64bb37e0803121233v30d12a58i77a1e23fd02ea6f2@mail.gmail.com> @ 2008-03-12 19:44 ` Andrew Morton 2008-03-12 20:01 ` 2.6.25-rc5-mm1 Torsten Kaiser 0 siblings, 1 reply; 24+ messages in thread From: Andrew Morton @ 2008-03-12 19:44 UTC (permalink / raw) To: Torsten Kaiser; +Cc: linux-kernel, netdev On Wed, 12 Mar 2008 20:33:02 +0100 "Torsten Kaiser" <just.for.lkml@googlemail.com> wrote: > On Tue, Mar 11, 2008 at 9:39 PM, Andrew Morton > <akpm@linux-foundation.org> wrote: > > > Quoting Andrew Morton (akpm@linux-foundation.org): > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc5/2.6.25-rc5-mm1/ > > I suspect that fewer people are testing linux-next and -mm nowadays. We > > should encourage them to do so, although given the general > > trainwreckishness of current mainline, this isn't really where our effort > > should be expended. > > 2.6.25-rc3-mm1 worked nicely for me, but 2.6.25-rc5-mm1 does not boot. > > dmesg: > [ 0.000000] Linux version 2.6.25-rc5-mm1 (root@treogen) (gcc > version 4.2.3 (Gentoo 4.2.3 p1.0)) #1 SMP Wed Mar 12 19:51:41 CET 2008 > [ 0.000000] Command line: earlyprintk=serial,ttyS0,115200 > console=ttyS0,115200 console=tty1 crypt_root=/dev/md1 sata_nv.swncq=1 So you aren't using netconsole. I had a series of hangs yesterday which went away when netconsole was disabled. I think netconsole is still busted. > ... > > be reserved > [ 1.943656] system 00:0d: iomem range 0xfec00000-0xffffffff could > not be reserved > [ 1.961251] PCI: Bridge: 0000:00:06.0 > [ 1.964921] IO window: disabled. > [ 1.968331] MEM window: 0xeff00000-0xefffffff > [ 1.980903] PREFETCH window: 0x00000000eef00000-0x00000000eeffffff > [ 1.987254] PCI: Bridge: 0000:00:0b.0 > [ 2.000871] IO window: e000-efff > [ 2.004283] MEM window: 0xefe00000-0xefefffff > [ 2.022569] PREFETCH window: disabled. > [ 2.026646] PCI: Bridge: 0000:00:0c.0 > [ 2.030312] IO window: disabled. 
> [ 2.040486] MEM window: 0xefd00000-0xefdfffff > [ 2.045020] PREFETCH window: disabled. > [ 2.050486] PCI: Bridge: 0000:00:0d.0 > [ 2.062526] IO window: disabled. > [ 2.065935] MEM window: 0xefc00000-0xefcfffff > [ 2.082499] PREFETCH window: disabled. > [ 2.086432] PCI: Bridge: 0000:00:0f.0 > [ 2.092499] IO window: d000-dfff > [ 2.102467] MEM window: 0xefb00000-0xefbfffff > [ 2.107000] PREFETCH window: 0x00000000e0000000-0x00000000e7ffffff > [ 2.122526] NET: Registered protocol family 2 > -> System hang, no reaction to the SysRq keys. > > >From 2.6.25-rc3-mm1: > [ 1.505990] PCI: Bridge: 0000:00:06.0 > [ 1.509654] IO window: disabled. > [ 1.513064] MEM window: 0xeff00000-0xefffffff > [ 1.515228] PREFETCH window: 0x00000000eef00000-0x00000000eeffffff > [ 1.521579] PCI: Bridge: 0000:00:0b.0 > [ 1.525226] IO window: e000-efff > [ 1.528638] MEM window: 0xefe00000-0xefefffff > [ 1.535226] PREFETCH window: disabled. > [ 1.539295] PCI: Bridge: 0000:00:0c.0 > [ 1.542960] IO window: disabled. > [ 1.545226] MEM window: 0xefd00000-0xefdfffff > [ 1.549763] PREFETCH window: disabled. > [ 1.555227] PCI: Bridge: 0000:00:0d.0 > [ 1.558897] IO window: disabled. > [ 1.562304] MEM window: 0xefc00000-0xefcfffff > [ 1.565226] PREFETCH window: disabled. > [ 1.569159] PCI: Bridge: 0000:00:0f.0 > [ 1.575226] IO window: d000-dfff > [ 1.578632] MEM window: 0xefb00000-0xefbfffff > [ 1.583162] PREFETCH window: 0x00000000e0000000-0x00000000e7ffffff > [ 1.585317] NET: Registered protocol family 2 > [ 1.695224] IP route cache hash table entries: 131072 (order: 8, > 1048576 bytes) > [ 1.695224] TCP established hash table entries: 524288 (order: 11, > 8388608 bytes) > [ 1.705988] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) > [ 1.713229] TCP: Hash tables configured (established 524288 bind 65536) OK, so it looks like it died during networking initialisation. Could you please add initcall_debug to the boot command line so we can see which function it is getting stuck in? 
* Re: 2.6.25-rc5-mm1 2008-03-12 19:44 ` 2.6.25-rc5-mm1 Andrew Morton @ 2008-03-12 20:01 ` Torsten Kaiser 2008-03-13 22:05 ` 2.6.25-rc5-mm1 Torsten Kaiser 0 siblings, 1 reply; 24+ messages in thread From: Torsten Kaiser @ 2008-03-12 20:01 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, netdev On Wed, Mar 12, 2008 at 8:44 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > On Wed, 12 Mar 2008 20:33:02 +0100 > "Torsten Kaiser" <just.for.lkml@googlemail.com> wrote: > > dmesg: > > [ 0.000000] Linux version 2.6.25-rc5-mm1 (root@treogen) (gcc > > version 4.2.3 (Gentoo 4.2.3 p1.0)) #1 SMP Wed Mar 12 19:51:41 CET 2008 > > [ 0.000000] Command line: earlyprintk=serial,ttyS0,115200 > > console=ttyS0,115200 console=tty1 crypt_root=/dev/md1 sata_nv.swncq=1 > > So you aren't using netconsole. I had a series of hangs yesterday which > went away when netconsole was disabled. I think netconsole is still > busted. No, just a plain serial console. > > be reserved > > [ 1.943656] system 00:0d: iomem range 0xfec00000-0xffffffff could > > not be reserved > > [ 1.961251] PCI: Bridge: 0000:00:06.0 > > [ 1.964921] IO window: disabled. > > [ 1.968331] MEM window: 0xeff00000-0xefffffff > > [ 1.980903] PREFETCH window: 0x00000000eef00000-0x00000000eeffffff > > [ 1.987254] PCI: Bridge: 0000:00:0b.0 > > [ 2.000871] IO window: e000-efff > > [ 2.004283] MEM window: 0xefe00000-0xefefffff > > [ 2.022569] PREFETCH window: disabled. > > [ 2.026646] PCI: Bridge: 0000:00:0c.0 > > [ 2.030312] IO window: disabled. > > [ 2.040486] MEM window: 0xefd00000-0xefdfffff > > [ 2.045020] PREFETCH window: disabled. > > [ 2.050486] PCI: Bridge: 0000:00:0d.0 > > [ 2.062526] IO window: disabled. > > [ 2.065935] MEM window: 0xefc00000-0xefcfffff > > [ 2.082499] PREFETCH window: disabled. 
> > [ 2.086432] PCI: Bridge: 0000:00:0f.0 > > [ 2.092499] IO window: d000-dfff > > [ 2.102467] MEM window: 0xefb00000-0xefbfffff > > [ 2.107000] PREFETCH window: 0x00000000e0000000-0x00000000e7ffffff > > [ 2.122526] NET: Registered protocol family 2 > > -> System hang, no reaction to the SysRq keys. [snip] > OK, so it looks like it died during networking initialisation. > > Could you please add initcall_debug to the boot command line so we can see > which function it is getting stuck in? Yes, here is the result: [ 2.573979] PCI-DMA: Disabling AGP. [ 2.577639] PCI-DMA: aperture base @ 8000000 size 65536 KB [ 2.589504] PCI-DMA: using GART IOMMU. [ 2.593258] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture [ 2.600132] initcall pci_iommu_init+0x0/0x20() returned 0 after 19 msecs [ 2.622146] calling hpet_late_init+0x0/0x140() [ 2.626689] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31 [ 2.633022] hpet0: 3 32-bit timers, 25000000 Hz [ 2.638562] initcall hpet_late_init+0x0/0x140() returned 0 after 9 msecs [ 2.654545] calling clocksource_done_booting+0x0/0x20() [ 2.659855] initcall clocksource_done_booting+0x0/0x20()<6>Time: hpet clocksource has been installed. [ 2.662185] returned 0 after 0 msecs [ 2.688448] calling init_pipe_fs+0x0/0x60() [ 2.695423] initcall init_pipe_fs+0x0/0x60() returned 0 after 0 msecs [ 2.705784] calling init_mnt_writers+0x0/0x70() [ 2.711681] initcall init_mnt_writers+0x0/0x70() returned 0 after 0 msecs [ 2.721678] calling eventpoll_init+0x0/0x90() [ 2.731644] initcall eventpoll_init+0x0/0x90() returned 0 after 0 msecs [ 2.738295] calling anon_inode_init+0x0/0x130() [ 2.751614] initcall anon_inode_init+0x0/0x130() returned 0 after 0 msecs [ 2.771585] calling pcie_aspm_init+0x0/0x30() [ 2.779297] initcall pcie_aspm_init+0x0/0x30() returned 0 after 2 msecs [ 2.793911] calling acpi_event_init+0x0/0x52() -> it looked like the system this time already hung here. But just pressing the 'Alt' key let the system continue until the network hang. 
(I tried this a second time, again it paused here until I pressed a key) [ 94.857929] initcall acpi_event_init+0x0/0x52() returned 0 after 29276 msecs [ 94.865002] calling pnp_system_init+0x0/0x20() [ 94.877935] system 00:06: ioport range 0x4d0-0x4d1 has been reserved [ 94.884286] system 00:06: ioport range 0x7b0-0x7df has been reserved [ 94.897886] system 00:06: ioport range 0x800-0x80f has been reserved [ 94.907886] system 00:06: ioport range 0xbb0-0xbdf has been reserved [ 94.917855] system 00:06: ioport range 0x2000-0x207f has been reserved [ 94.937827] system 00:06: ioport range 0x2080-0x20ff has been reserved [ 94.947827] system 00:06: ioport range 0x2400-0x247f has been reserved [ 94.957793] system 00:06: ioport range 0x2480-0x24ff has been reserved [ 94.977766] system 00:06: ioport range 0x2800-0x287f has been reserved [ 94.987766] system 00:06: ioport range 0x2880-0x28ff has been reserved [ 94.997734] system 00:06: ioport range 0x2c00-0x2c7f has been reserved [ 95.017708] system 00:06: ioport range 0x2c80-0x2cff has been reserved [ 95.024234] system 00:06: iomem range 0x0-0x0 could not be reserved [ 95.037678] system 00:06: iomem range 0xfee01000-0xfeefffff could not be reserved [ 95.060158] system 00:06: iomem range 0xefa80000-0xefabffff has been reserved [ 95.070158] system 00:06: iomem range 0xffb00000-0xffbfffff could not be reserved [ 95.077633] system 00:06: iomem range 0xfff00000-0xffffffff could not be reserved [ 95.097590] system 00:08: iomem range 0xfec00000-0xfec00fff could not be reserved [ 95.120064] system 00:08: iomem range 0xfee00000-0xfee00fff could not be reserved [ 95.130070] system 00:0c: ioport range 0x290-0x297 has been reserved [ 95.137523] system 00:0d: iomem range 0x0-0x9ffff could not be reserved [ 95.157495] system 00:0d: iomem range 0xc0000-0xcffff has been reserved [ 95.167495] system 00:0d: iomem range 0xe0000-0xfffff could not be reserved [ 95.177463] system 00:0d: iomem range 0x100000-0xdfffffff could not be reserved [ 
95.197437] system 00:0d: iomem range 0xfec00000-0xffffffff could not be reserved [ 95.219983] initcall pnp_system_init+0x0/0x20() returned 0 after 162 msecs [ 95.226887] calling chr_dev_init+0x0/0xd0() [ 95.237813] initcall chr_dev_init+0x0/0xd0() returned 0 after 0 msecs [ 95.247378] calling firmware_class_init+0x0/0x90() [ 95.257320] initcall firmware_class_init+0x0/0x90() returned 0 after 0 msecs [ 95.276031] calling loopback_init+0x0/0x20() [ 95.288590] initcall loopback_init+0x0/0x20() returned 0 after 0 msecs [ 95.296837] calling cpufreq_gov_performance_init+0x0/0x20() [ 95.309084] initcall cpufreq_gov_performance_init+0x0/0x20() returned 0 after 0 msecs [ 95.317734] calling cpufreq_gov_dbs_init+0x0/0x50() [ 95.338090] initcall cpufreq_gov_dbs_init+0x0/0x50() returned 0 after 0 msecs [ 95.345254] calling init_acpi_pm_clocksource+0x0/0xc0() [ 95.355293] initcall init_acpi_pm_clocksource+0x0/0xc0() returned 0 after 0 msecs [ 95.374115] calling pcibios_assign_resources+0x0/0x90() [ 95.379618] PCI: Bridge: 0000:00:06.0 [ 95.394087] IO window: disabled. [ 95.397502] MEM window: 0xeff00000-0xefffffff [ 95.402032] PREFETCH window: 0x00000000eef00000-0x00000000eeffffff [ 95.415189] PCI: Bridge: 0000:00:0b.0 [ 95.418860] IO window: e000-efff [ 95.438075] MEM window: 0xefe00000-0xefefffff [ 95.442610] PREFETCH window: disabled. [ 95.455526] PCI: Bridge: 0000:00:0c.0 [ 95.459197] IO window: disabled. [ 95.462604] MEM window: 0xefd00000-0xefdfffff [ 95.475513] PREFETCH window: disabled. [ 95.479443] PCI: Bridge: 0000:00:0d.0 [ 95.485512] IO window: disabled. [ 95.495480] MEM window: 0xefc00000-0xefcfffff [ 95.500010] PREFETCH window: disabled. 
[ 95.515455] PCI: Bridge: 0000:00:0f.0
[ 95.519124] IO window: d000-dfff
[ 95.522533] MEM window: 0xefb00000-0xefbfffff
[ 95.535423] PREFETCH window: 0x00000000e0000000-0x00000000e7ffffff
[ 95.545456] initcall pcibios_assign_resources+0x0/0x90() returned 0 after 79 msecs
[ 95.557830] calling fill_mp_bus_to_cpumask+0x0/0x100()
[ 95.575408] initcall fill_mp_bus_to_cpumask+0x0/0x100() returned 0 after 0 msecs
[ 95.593777] calling inet_init+0x0/0x380()
[ 95.597890] NET: Registered protocol family 2
-> same hang, no reaction to SysRq.

What looks suspicious: The call to pcie_aspm_init is just before the
temporary hang. When I used make oldconfig to upgrade the .config from
2.6.25-rc3-mm1 to -rc5-mm1 I activated the new option CONFIG_PCIEASPM.

I will try with CONFIG_PCIEASPM_DEBUG added and completely without this option.

Torsten
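As a rough aid for reading initcall_debug output like the log in this message: each initcall appears as a "calling NAME+..." line and, once it finishes, as an "initcall NAME+... returned R after N msecs" line. A script can therefore point at an initcall that never returned and at unusually slow ones. The Python below is only a minimal sketch under that assumption. The function name analyse_initcalls, the 1000 ms threshold and the regular expressions are illustrative choices derived from the lines quoted in this thread; this is not a kernel-provided tool.

```python
import re

# Patterns follow the log lines quoted in this thread, for example:
#   [ 2.793911] calling acpi_event_init+0x0/0x52()
#   [ 94.857929] initcall acpi_event_init+0x0/0x52() returned 0 after 29276 msecs
CALLING = re.compile(r"\bcalling\s+([\w.]+)\+0x")
DONE = re.compile(r"\binitcall\s+([\w.]+)\+0x")
AFTER = re.compile(r"returned\s+-?\d+\s+after\s+(\d+)\s+msecs")


def analyse_initcalls(log_text, slow_msecs=1000):
    """Return (pending, slow) for an initcall_debug log.

    pending: names that were "calling" but never reported as finished.
    slow:    (name, msecs) pairs at or above slow_msecs (hypothetical threshold).
    """
    pending = []
    slow = []
    for line in log_text.splitlines():
        done = DONE.search(line)
        if done:
            name = done.group(1)
            if name in pending:
                pending.remove(name)
            # The duration can land on the next line when another printk
            # interleaves; in that case the timing is simply skipped.
            after = AFTER.search(line)
            if after and int(after.group(1)) >= slow_msecs:
                slow.append((name, int(after.group(1))))
            continue
        call = CALLING.search(line)
        if call:
            pending.append(call.group(1))
    return pending, slow


# A few lines copied from the log in this thread.
SAMPLE = """\
[ 2.771585] calling pcie_aspm_init+0x0/0x30()
[ 2.779297] initcall pcie_aspm_init+0x0/0x30() returned 0 after 2 msecs
[ 2.793911] calling acpi_event_init+0x0/0x52()
[ 94.857929] initcall acpi_event_init+0x0/0x52() returned 0 after 29276 msecs
[ 95.593777] calling inet_init+0x0/0x380()
[ 95.597890] NET: Registered protocol family 2
"""

pending, slow = analyse_initcalls(SAMPLE)
print("never returned:", pending)
print("slow:", slow)
```

On the sample lines this reports inet_init as the call that never returned and acpi_event_init as the slow one, which matches what the log shows by eye.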
* Re: 2.6.25-rc5-mm1 2008-03-12 20:01 ` 2.6.25-rc5-mm1 Torsten Kaiser @ 2008-03-13 22:05 ` Torsten Kaiser 2008-03-13 22:35 ` 2.6.25-rc5-mm1 Andrew Morton 0 siblings, 1 reply; 24+ messages in thread From: Torsten Kaiser @ 2008-03-13 22:05 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, netdev, Badari Pulavarty On Wed, Mar 12, 2008 at 9:01 PM, Torsten Kaiser <just.for.lkml@googlemail.com> wrote: > On Wed, Mar 12, 2008 at 8:44 PM, Andrew Morton > <akpm@linux-foundation.org> wrote: > > OK, so it looks like it died during networking initialisation. > > > > Could you please add initcall_debug to the boot command line so we can see > > which function it is getting stuck in? > > Yes, here is the result: > [ 2.573979] PCI-DMA: Disabling AGP. > [ 2.577639] PCI-DMA: aperture base @ 8000000 size 65536 KB > [ 2.589504] PCI-DMA: using GART IOMMU. > [ 2.593258] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture > [ 2.600132] initcall pci_iommu_init+0x0/0x20() returned 0 after 19 msecs > [ 2.622146] calling hpet_late_init+0x0/0x140() > [ 2.626689] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31 > [ 2.633022] hpet0: 3 32-bit timers, 25000000 Hz > [ 2.638562] initcall hpet_late_init+0x0/0x140() returned 0 after 9 msecs > [ 2.654545] calling clocksource_done_booting+0x0/0x20() > [ 2.659855] initcall clocksource_done_booting+0x0/0x20()<6>Time: > > hpet clocksource has been installed. 
> [ 2.662185] returned 0 after 0 msecs > [ 2.688448] calling init_pipe_fs+0x0/0x60() > [ 2.695423] initcall init_pipe_fs+0x0/0x60() returned 0 after 0 msecs > [ 2.705784] calling init_mnt_writers+0x0/0x70() > [ 2.711681] initcall init_mnt_writers+0x0/0x70() returned 0 after 0 msecs > [ 2.721678] calling eventpoll_init+0x0/0x90() > [ 2.731644] initcall eventpoll_init+0x0/0x90() returned 0 after 0 msecs > [ 2.738295] calling anon_inode_init+0x0/0x130() > [ 2.751614] initcall anon_inode_init+0x0/0x130() returned 0 after 0 msecs > [ 2.771585] calling pcie_aspm_init+0x0/0x30() > [ 2.779297] initcall pcie_aspm_init+0x0/0x30() returned 0 after 2 msecs > [ 2.793911] calling acpi_event_init+0x0/0x52() > > -> it looked like the system this time already hung here. But just > pressing the 'Alt' key let the system continue until the network hang. > (I tried this a second time, again it paused here until I pressed a key) > > [ 94.857929] initcall acpi_event_init+0x0/0x52() returned 0 after 29276 msecs > [ 94.865002] calling pnp_system_init+0x0/0x20() > [ 94.877935] system 00:06: ioport range 0x4d0-0x4d1 has been reserved > [ 94.884286] system 00:06: ioport range 0x7b0-0x7df has been reserved > [ 94.897886] system 00:06: ioport range 0x800-0x80f has been reserved > [ 94.907886] system 00:06: ioport range 0xbb0-0xbdf has been reserved > [ 94.917855] system 00:06: ioport range 0x2000-0x207f has been reserved > [ 94.937827] system 00:06: ioport range 0x2080-0x20ff has been reserved > [ 94.947827] system 00:06: ioport range 0x2400-0x247f has been reserved > [ 94.957793] system 00:06: ioport range 0x2480-0x24ff has been reserved > [ 94.977766] system 00:06: ioport range 0x2800-0x287f has been reserved > [ 94.987766] system 00:06: ioport range 0x2880-0x28ff has been reserved > [ 94.997734] system 00:06: ioport range 0x2c00-0x2c7f has been reserved > [ 95.017708] system 00:06: ioport range 0x2c80-0x2cff has been reserved > [ 95.024234] system 00:06: iomem range 0x0-0x0 could not be 
reserved > [ 95.037678] system 00:06: iomem range 0xfee01000-0xfeefffff could > not be reserved > [ 95.060158] system 00:06: iomem range 0xefa80000-0xefabffff has been reserved > [ 95.070158] system 00:06: iomem range 0xffb00000-0xffbfffff could > not be reserved > [ 95.077633] system 00:06: iomem range 0xfff00000-0xffffffff could > not be reserved > [ 95.097590] system 00:08: iomem range 0xfec00000-0xfec00fff could > not be reserved > [ 95.120064] system 00:08: iomem range 0xfee00000-0xfee00fff could > not be reserved > [ 95.130070] system 00:0c: ioport range 0x290-0x297 has been reserved > [ 95.137523] system 00:0d: iomem range 0x0-0x9ffff could not be reserved > [ 95.157495] system 00:0d: iomem range 0xc0000-0xcffff has been reserved > [ 95.167495] system 00:0d: iomem range 0xe0000-0xfffff could not be reserved > [ 95.177463] system 00:0d: iomem range 0x100000-0xdfffffff could not > be reserved > [ 95.197437] system 00:0d: iomem range 0xfec00000-0xffffffff could > not be reserved > [ 95.219983] initcall pnp_system_init+0x0/0x20() returned 0 after 162 msecs > [ 95.226887] calling chr_dev_init+0x0/0xd0() > [ 95.237813] initcall chr_dev_init+0x0/0xd0() returned 0 after 0 msecs > [ 95.247378] calling firmware_class_init+0x0/0x90() > [ 95.257320] initcall firmware_class_init+0x0/0x90() returned 0 after 0 msecs > [ 95.276031] calling loopback_init+0x0/0x20() > [ 95.288590] initcall loopback_init+0x0/0x20() returned 0 after 0 msecs > [ 95.296837] calling cpufreq_gov_performance_init+0x0/0x20() > [ 95.309084] initcall cpufreq_gov_performance_init+0x0/0x20() > returned 0 after 0 msecs > [ 95.317734] calling cpufreq_gov_dbs_init+0x0/0x50() > [ 95.338090] initcall cpufreq_gov_dbs_init+0x0/0x50() returned 0 after 0 msecs > [ 95.345254] calling init_acpi_pm_clocksource+0x0/0xc0() > [ 95.355293] initcall init_acpi_pm_clocksource+0x0/0xc0() returned 0 > after 0 msecs > [ 95.374115] calling pcibios_assign_resources+0x0/0x90() > [ 95.379618] PCI: Bridge: 0000:00:06.0 > [ 
95.394087] IO window: disabled. > [ 95.397502] MEM window: 0xeff00000-0xefffffff > [ 95.402032] PREFETCH window: 0x00000000eef00000-0x00000000eeffffff > [ 95.415189] PCI: Bridge: 0000:00:0b.0 > [ 95.418860] IO window: e000-efff > [ 95.438075] MEM window: 0xefe00000-0xefefffff > [ 95.442610] PREFETCH window: disabled. > [ 95.455526] PCI: Bridge: 0000:00:0c.0 > [ 95.459197] IO window: disabled. > [ 95.462604] MEM window: 0xefd00000-0xefdfffff > [ 95.475513] PREFETCH window: disabled. > [ 95.479443] PCI: Bridge: 0000:00:0d.0 > [ 95.485512] IO window: disabled. > [ 95.495480] MEM window: 0xefc00000-0xefcfffff > [ 95.500010] PREFETCH window: disabled. > [ 95.515455] PCI: Bridge: 0000:00:0f.0 > [ 95.519124] IO window: d000-dfff > [ 95.522533] MEM window: 0xefb00000-0xefbfffff > [ 95.535423] PREFETCH window: 0x00000000e0000000-0x00000000e7ffffff > [ 95.545456] initcall pcibios_assign_resources+0x0/0x90() returned 0 > after 79 msecs > [ 95.557830] calling fill_mp_bus_to_cpumask+0x0/0x100() > [ 95.575408] initcall fill_mp_bus_to_cpumask+0x0/0x100() returned 0 > after 0 msecs > [ 95.593777] calling inet_init+0x0/0x380() > [ 95.597890] NET: Registered protocol family 2 > > -> same hang, no reaction to SysRq. > > What looks suspicious: The call to pcie_aspm_init is just before the > temporary hang. When I used make oldconfig to upgrade the .config from > 2.6.25-rc3-mm1 to -rc5-mm1 I activated the new option CONFIG_PCIEASPM. > > I will try with CONFIG_PCIEASPM_DEBUG added and completely without this option. CONFIG_PCIEASPM does not change anything. Also testing the range of ipc patches you suggested to Badari did not fix it. 
I did a bisect. These patches currently remain, but I do not have the
time for more bisect steps until tomorrow:

git-scsi-misc
git-sh
execute-tasklets-in-the-same-order-they-were-queued
git-sched
sched: work around hrtick related lockup
sched: make sure jiffies is up to date before calling __update_rq_clock()
sched: fix rq->clock overflows detection with CONFIG_NO_HZ
sched: make cpu_clock() globally synchronous
sched: remove isolcpus
ftrace: make the task state char-string visible to all
sched: add latency tracer callbacks to the scheduler
latencytop: optimize LT_BACKTRACEDEPTH loops a bit
sched: cleanup old and rarely used 'debug' features.
[SCSI] zfcp: convert zfcp to use target reset and device reset handler
[SCSI] qla4xxx: Add target reset functionality
[SCSI] scsi_error: add target reset handler
[SCSI] ps3rom: Simplify fill_from_dev_buffer()
[SCSI] scsi_debug: use shost_priv macro
[SCSI] scsi_debug: remove unnecessary checking
[SCSI] scsi_debug: remove scsi_debug.h
[SCSI] scsi_debug: stop including drivers/scsi/scsi.h
[SCSI] Remove random noop unchecked_isa_dma users
[SCSI] aacraid: READ_CAPACITY_16 shouldn't trust allocation length in cdb
[SCSI] st: show options currently set in sysfs
[SCSI] st: add option to use SILI in variable block reads
[SCSI] gdth: remove command accessors
[SCSI] aic94xx: Use sas_request_addr() to provide SAS WWN if the adapter lacks one
[SCSI] libsas: Provide a transport-level facility to request SAS addrs
[SCSI] ips: sg chaining support to the path to non I/O commands
[SCSI] gdth: convert to PCI hotplug API
[SCSI] gdth: PCI probe cleanups, prep for PCI hotplug API conversion
rtc: rtc-sh: Add support for periodic IRQs.
sh: SuperH KEYSC keypad data for Solution Engine 7722
sh: SuperH KEYSC keypad data for MigoR
sh: SuperH KEYSC platform driver

Torsten
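The bisect mentioned in this message narrows an ordered patch series by repeatedly testing a kernel built with only part of the series applied. A minimal Python sketch of that search follows. It assumes the base kernel boots, the full series does not, and every kernel containing the bad patch fails. The names first_bad_patch and boots_with are hypothetical; boots_with stands in for the manual "build, boot, watch the serial console" step and is not a real tool.

```python
def first_bad_patch(patches, boots_with):
    """Return the first patch whose addition makes the kernel fail to boot.

    patches    -- ordered list of patch names applied on top of a good base
    boots_with -- hypothetical callback: given the number of leading patches
                  applied, returns True if the resulting kernel boots
    """
    lo, hi = 0, len(patches)  # lo patches applied: boots; hi applied: hangs
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots_with(mid):
            lo = mid   # the first mid patches are fine; culprit is later
        else:
            hi = mid   # the culprit is among the first mid patches
    return patches[hi - 1]


# Illustration with placeholder names; "patch-d" is an arbitrary stand-in,
# not a finding from this thread.
series = ["patch-a", "patch-b", "patch-c", "patch-d", "patch-e"]
culprit = first_bad_patch(series, lambda n: "patch-d" not in series[:n])
print(culprit)
```

Each step halves the remaining range, so the number of test boots grows only with the logarithm of the series length.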
* Re: 2.6.25-rc5-mm1 2008-03-13 22:05 ` 2.6.25-rc5-mm1 Torsten Kaiser @ 2008-03-13 22:35 ` Andrew Morton 2008-03-13 23:10 ` 2.6.25-rc5-mm1 Badari Pulavarty 0 siblings, 1 reply; 24+ messages in thread From: Andrew Morton @ 2008-03-13 22:35 UTC (permalink / raw) To: Torsten Kaiser; +Cc: linux-kernel, netdev, pbadari On Thu, 13 Mar 2008 23:05:11 +0100 "Torsten Kaiser" <just.for.lkml@googlemail.com> wrote: > On Wed, Mar 12, 2008 at 9:01 PM, Torsten Kaiser > <just.for.lkml@googlemail.com> wrote: > > On Wed, Mar 12, 2008 at 8:44 PM, Andrew Morton > > <akpm@linux-foundation.org> wrote: > > > OK, so it looks like it died during networking initialisation. > > > > > > Could you please add initcall_debug to the boot command line so we can see > > > which function it is getting stuck in? > > > > Yes, here is the result: > > [ 2.573979] PCI-DMA: Disabling AGP. > > [ 2.577639] PCI-DMA: aperture base @ 8000000 size 65536 KB > > [ 2.589504] PCI-DMA: using GART IOMMU. > > [ 2.593258] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture > > [ 2.600132] initcall pci_iommu_init+0x0/0x20() returned 0 after 19 msecs > > [ 2.622146] calling hpet_late_init+0x0/0x140() > > [ 2.626689] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31 > > [ 2.633022] hpet0: 3 32-bit timers, 25000000 Hz > > [ 2.638562] initcall hpet_late_init+0x0/0x140() returned 0 after 9 msecs > > [ 2.654545] calling clocksource_done_booting+0x0/0x20() > > [ 2.659855] initcall clocksource_done_booting+0x0/0x20()<6>Time: > > > > hpet clocksource has been installed. 
> > [ 2.662185] returned 0 after 0 msecs > > [ 2.688448] calling init_pipe_fs+0x0/0x60() > > [ 2.695423] initcall init_pipe_fs+0x0/0x60() returned 0 after 0 msecs > > [ 2.705784] calling init_mnt_writers+0x0/0x70() > > [ 2.711681] initcall init_mnt_writers+0x0/0x70() returned 0 after 0 msecs > > [ 2.721678] calling eventpoll_init+0x0/0x90() > > [ 2.731644] initcall eventpoll_init+0x0/0x90() returned 0 after 0 msecs > > [ 2.738295] calling anon_inode_init+0x0/0x130() > > [ 2.751614] initcall anon_inode_init+0x0/0x130() returned 0 after 0 msecs > > [ 2.771585] calling pcie_aspm_init+0x0/0x30() > > [ 2.779297] initcall pcie_aspm_init+0x0/0x30() returned 0 after 2 msecs > > [ 2.793911] calling acpi_event_init+0x0/0x52() > > > > -> it looked like the system this time already hung here. But just > > pressing the 'Alt' key let the system continue until the network hang. > > (I tried this a second time, again it paused here until I pressed a key) > > > > [ 94.857929] initcall acpi_event_init+0x0/0x52() returned 0 after 29276 msecs > > [ 94.865002] calling pnp_system_init+0x0/0x20() > > [ 94.877935] system 00:06: ioport range 0x4d0-0x4d1 has been reserved > > [ 94.884286] system 00:06: ioport range 0x7b0-0x7df has been reserved > > [ 94.897886] system 00:06: ioport range 0x800-0x80f has been reserved > > [ 94.907886] system 00:06: ioport range 0xbb0-0xbdf has been reserved > > [ 94.917855] system 00:06: ioport range 0x2000-0x207f has been reserved > > [ 94.937827] system 00:06: ioport range 0x2080-0x20ff has been reserved > > [ 94.947827] system 00:06: ioport range 0x2400-0x247f has been reserved > > [ 94.957793] system 00:06: ioport range 0x2480-0x24ff has been reserved > > [ 94.977766] system 00:06: ioport range 0x2800-0x287f has been reserved > > [ 94.987766] system 00:06: ioport range 0x2880-0x28ff has been reserved > > [ 94.997734] system 00:06: ioport range 0x2c00-0x2c7f has been reserved > > [ 95.017708] system 00:06: ioport range 0x2c80-0x2cff has been reserved > > 
[ 95.024234] system 00:06: iomem range 0x0-0x0 could not be reserved > > [ 95.037678] system 00:06: iomem range 0xfee01000-0xfeefffff could > > not be reserved > > [ 95.060158] system 00:06: iomem range 0xefa80000-0xefabffff has been reserved > > [ 95.070158] system 00:06: iomem range 0xffb00000-0xffbfffff could > > not be reserved > > [ 95.077633] system 00:06: iomem range 0xfff00000-0xffffffff could > > not be reserved > > [ 95.097590] system 00:08: iomem range 0xfec00000-0xfec00fff could > > not be reserved > > [ 95.120064] system 00:08: iomem range 0xfee00000-0xfee00fff could > > not be reserved > > [ 95.130070] system 00:0c: ioport range 0x290-0x297 has been reserved > > [ 95.137523] system 00:0d: iomem range 0x0-0x9ffff could not be reserved > > [ 95.157495] system 00:0d: iomem range 0xc0000-0xcffff has been reserved > > [ 95.167495] system 00:0d: iomem range 0xe0000-0xfffff could not be reserved > > [ 95.177463] system 00:0d: iomem range 0x100000-0xdfffffff could not > > be reserved > > [ 95.197437] system 00:0d: iomem range 0xfec00000-0xffffffff could > > not be reserved > > [ 95.219983] initcall pnp_system_init+0x0/0x20() returned 0 after 162 msecs > > [ 95.226887] calling chr_dev_init+0x0/0xd0() > > [ 95.237813] initcall chr_dev_init+0x0/0xd0() returned 0 after 0 msecs > > [ 95.247378] calling firmware_class_init+0x0/0x90() > > [ 95.257320] initcall firmware_class_init+0x0/0x90() returned 0 after 0 msecs > > [ 95.276031] calling loopback_init+0x0/0x20() > > [ 95.288590] initcall loopback_init+0x0/0x20() returned 0 after 0 msecs > > [ 95.296837] calling cpufreq_gov_performance_init+0x0/0x20() > > [ 95.309084] initcall cpufreq_gov_performance_init+0x0/0x20() > > returned 0 after 0 msecs > > [ 95.317734] calling cpufreq_gov_dbs_init+0x0/0x50() > > [ 95.338090] initcall cpufreq_gov_dbs_init+0x0/0x50() returned 0 after 0 msecs > > [ 95.345254] calling init_acpi_pm_clocksource+0x0/0xc0() > > [ 95.355293] initcall init_acpi_pm_clocksource+0x0/0xc0() returned 0 > 
> after 0 msecs > > [ 95.374115] calling pcibios_assign_resources+0x0/0x90() > > [ 95.379618] PCI: Bridge: 0000:00:06.0 > > [ 95.394087] IO window: disabled. > > [ 95.397502] MEM window: 0xeff00000-0xefffffff > > [ 95.402032] PREFETCH window: 0x00000000eef00000-0x00000000eeffffff > > [ 95.415189] PCI: Bridge: 0000:00:0b.0 > > [ 95.418860] IO window: e000-efff > > [ 95.438075] MEM window: 0xefe00000-0xefefffff > > [ 95.442610] PREFETCH window: disabled. > > [ 95.455526] PCI: Bridge: 0000:00:0c.0 > > [ 95.459197] IO window: disabled. > > [ 95.462604] MEM window: 0xefd00000-0xefdfffff > > [ 95.475513] PREFETCH window: disabled. > > [ 95.479443] PCI: Bridge: 0000:00:0d.0 > > [ 95.485512] IO window: disabled. > > [ 95.495480] MEM window: 0xefc00000-0xefcfffff > > [ 95.500010] PREFETCH window: disabled. > > [ 95.515455] PCI: Bridge: 0000:00:0f.0 > > [ 95.519124] IO window: d000-dfff > > [ 95.522533] MEM window: 0xefb00000-0xefbfffff > > [ 95.535423] PREFETCH window: 0x00000000e0000000-0x00000000e7ffffff > > [ 95.545456] initcall pcibios_assign_resources+0x0/0x90() returned 0 > > after 79 msecs > > [ 95.557830] calling fill_mp_bus_to_cpumask+0x0/0x100() > > [ 95.575408] initcall fill_mp_bus_to_cpumask+0x0/0x100() returned 0 > > after 0 msecs > > [ 95.593777] calling inet_init+0x0/0x380() > > [ 95.597890] NET: Registered protocol family 2 > > > > -> same hang, no reaction to SysRq. > > > > What looks suspicious: The call to pcie_aspm_init is just before the > > temporary hang. When I used make oldconfig to upgrade the .config from > > 2.6.25-rc3-mm1 to -rc5-mm1 I activated the new option CONFIG_PCIEASPM. > > > > I will try with CONFIG_PCIEASPM_DEBUG added and completely without this option. > > CONFIG_PCIEASPM does not change anything. > Also testing the range of ipc patches you suggested to Badari did not fix it. > > I did a bisect, these patches are currently remaining, but I dod not > have the time for more bisect steps until tomorrow: OK, thanks for persisting. 
> git-scsi-misc
> git-sh
> execute-tasklets-in-the-same-order-they-were-queued
> git-sched
> sched: work around hrtick related lockup
> sched: make sure jiffies is up to date before calling __update_rq_clock()
> sched: fix rq->clock overflows detection with CONFIG_NO_HZ
> sched: make cpu_clock() globally synchronous
> sched: remove isolcpus
> ftrace: make the task state char-string visible to all
> sched: add latency tracer callbacks to the scheduler

the sched patches, perhaps..

> latencytop: optimize LT_BACKTRACEDEPTH loops a bit
> sched: cleanup old and rarely used 'debug' features.
> [SCSI] zfcp: convert zfcp to use target reset and device reset handler
> [SCSI] qla4xxx: Add target reset functionality
> [SCSI] scsi_error: add target reset handler
> [SCSI] ps3rom: Simplify fill_from_dev_buffer()
> [SCSI] scsi_debug: use shost_priv macro
> [SCSI] scsi_debug: remove unnecessary checking
> [SCSI] scsi_debug: remove scsi_debug.h
> [SCSI] scsi_debug: stop including drivers/scsi/scsi.h
> [SCSI] Remove random noop unchecked_isa_dma users
> [SCSI] aacraid: READ_CAPACITY_16 shouldn't trust allocation length in cdb
> [SCSI] st: show options currently set in sysfs
> [SCSI] st: add option to use SILI in variable block reads
> [SCSI] gdth: remove command accessors
> [SCSI] aic94xx: Use sas_request_addr() to provide SAS WWN if the
> adapter lacks one
> [SCSI] libsas: Provide a transport-level facility to request SAS addrs
> [SCSI] ips: sg chaining support to the path to non I/O commands
> [SCSI] gdth: convert to PCI hotplug API
> [SCSI] gdth: PCI probe cleanups, prep for PCI hotplug API conversion
> rtc: rtc-sh: Add support for periodic IRQs.
> sh: SuperH KEYSC keypad data for Solution Engine 7722
> sh: SuperH KEYSC keypad data for MigoR
> sh: SuperH KEYSC platform driver
>
* Re: 2.6.25-rc5-mm1 2008-03-13 22:35 ` 2.6.25-rc5-mm1 Andrew Morton @ 2008-03-13 23:10 ` Badari Pulavarty 2008-03-21 12:12 ` 2.6.25-rc5-mm1 Ingo Molnar 0 siblings, 1 reply; 24+ messages in thread From: Badari Pulavarty @ 2008-03-13 23:10 UTC (permalink / raw) To: Andrew Morton; +Cc: Torsten Kaiser, lkml, netdev On Thu, 2008-03-13 at 15:35 -0700, Andrew Morton wrote: > On Thu, 13 Mar 2008 23:05:11 +0100 > "Torsten Kaiser" <just.for.lkml@googlemail.com> wrote: > > > On Wed, Mar 12, 2008 at 9:01 PM, Torsten Kaiser > > <just.for.lkml@googlemail.com> wrote: > > > On Wed, Mar 12, 2008 at 8:44 PM, Andrew Morton > > > <akpm@linux-foundation.org> wrote: > > > > OK, so it looks like it died during networking initialisation. > > > > > > > > Could you please add initcall_debug to the boot command line so we can see > > > > which function it is getting stuck in? > > > > > > Yes, here is the result: > > > [ 2.573979] PCI-DMA: Disabling AGP. > > > [ 2.577639] PCI-DMA: aperture base @ 8000000 size 65536 KB > > > [ 2.589504] PCI-DMA: using GART IOMMU. > > > [ 2.593258] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture > > > [ 2.600132] initcall pci_iommu_init+0x0/0x20() returned 0 after 19 msecs > > > [ 2.622146] calling hpet_late_init+0x0/0x140() > > > [ 2.626689] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31 > > > [ 2.633022] hpet0: 3 32-bit timers, 25000000 Hz > > > [ 2.638562] initcall hpet_late_init+0x0/0x140() returned 0 after 9 msecs > > > [ 2.654545] calling clocksource_done_booting+0x0/0x20() > > > [ 2.659855] initcall clocksource_done_booting+0x0/0x20()<6>Time: > > > > > > hpet clocksource has been installed. 
> > > [ 2.662185] returned 0 after 0 msecs > > > [ 2.688448] calling init_pipe_fs+0x0/0x60() > > > [ 2.695423] initcall init_pipe_fs+0x0/0x60() returned 0 after 0 msecs > > > [ 2.705784] calling init_mnt_writers+0x0/0x70() > > > [ 2.711681] initcall init_mnt_writers+0x0/0x70() returned 0 after 0 msecs > > > [ 2.721678] calling eventpoll_init+0x0/0x90() > > > [ 2.731644] initcall eventpoll_init+0x0/0x90() returned 0 after 0 msecs > > > [ 2.738295] calling anon_inode_init+0x0/0x130() > > > [ 2.751614] initcall anon_inode_init+0x0/0x130() returned 0 after 0 msecs > > > [ 2.771585] calling pcie_aspm_init+0x0/0x30() > > > [ 2.779297] initcall pcie_aspm_init+0x0/0x30() returned 0 after 2 msecs > > > [ 2.793911] calling acpi_event_init+0x0/0x52() > > > > > > -> it looked like the system this time already hung here. But just > > > pressing the 'Alt' key let the system continue until the network hang. > > > (I tried this a second time, again it paused here until I pressed a key) > > > > > > [ 94.857929] initcall acpi_event_init+0x0/0x52() returned 0 after 29276 msecs > > > [ 94.865002] calling pnp_system_init+0x0/0x20() > > > [ 94.877935] system 00:06: ioport range 0x4d0-0x4d1 has been reserved > > > [ 94.884286] system 00:06: ioport range 0x7b0-0x7df has been reserved > > > [ 94.897886] system 00:06: ioport range 0x800-0x80f has been reserved > > > [ 94.907886] system 00:06: ioport range 0xbb0-0xbdf has been reserved > > > [ 94.917855] system 00:06: ioport range 0x2000-0x207f has been reserved > > > [ 94.937827] system 00:06: ioport range 0x2080-0x20ff has been reserved > > > [ 94.947827] system 00:06: ioport range 0x2400-0x247f has been reserved > > > [ 94.957793] system 00:06: ioport range 0x2480-0x24ff has been reserved > > > [ 94.977766] system 00:06: ioport range 0x2800-0x287f has been reserved > > > [ 94.987766] system 00:06: ioport range 0x2880-0x28ff has been reserved > > > [ 94.997734] system 00:06: ioport range 0x2c00-0x2c7f has been reserved > > > [ 95.017708] 
system 00:06: ioport range 0x2c80-0x2cff has been reserved > > > [ 95.024234] system 00:06: iomem range 0x0-0x0 could not be reserved > > > [ 95.037678] system 00:06: iomem range 0xfee01000-0xfeefffff could > > > not be reserved > > > [ 95.060158] system 00:06: iomem range 0xefa80000-0xefabffff has been reserved > > > [ 95.070158] system 00:06: iomem range 0xffb00000-0xffbfffff could > > > not be reserved > > > [ 95.077633] system 00:06: iomem range 0xfff00000-0xffffffff could > > > not be reserved > > > [ 95.097590] system 00:08: iomem range 0xfec00000-0xfec00fff could > > > not be reserved > > > [ 95.120064] system 00:08: iomem range 0xfee00000-0xfee00fff could > > > not be reserved > > > [ 95.130070] system 00:0c: ioport range 0x290-0x297 has been reserved > > > [ 95.137523] system 00:0d: iomem range 0x0-0x9ffff could not be reserved > > > [ 95.157495] system 00:0d: iomem range 0xc0000-0xcffff has been reserved > > > [ 95.167495] system 00:0d: iomem range 0xe0000-0xfffff could not be reserved > > > [ 95.177463] system 00:0d: iomem range 0x100000-0xdfffffff could not > > > be reserved > > > [ 95.197437] system 00:0d: iomem range 0xfec00000-0xffffffff could > > > not be reserved > > > [ 95.219983] initcall pnp_system_init+0x0/0x20() returned 0 after 162 msecs > > > [ 95.226887] calling chr_dev_init+0x0/0xd0() > > > [ 95.237813] initcall chr_dev_init+0x0/0xd0() returned 0 after 0 msecs > > > [ 95.247378] calling firmware_class_init+0x0/0x90() > > > [ 95.257320] initcall firmware_class_init+0x0/0x90() returned 0 after 0 msecs > > > [ 95.276031] calling loopback_init+0x0/0x20() > > > [ 95.288590] initcall loopback_init+0x0/0x20() returned 0 after 0 msecs > > > [ 95.296837] calling cpufreq_gov_performance_init+0x0/0x20() > > > [ 95.309084] initcall cpufreq_gov_performance_init+0x0/0x20() > > > returned 0 after 0 msecs > > > [ 95.317734] calling cpufreq_gov_dbs_init+0x0/0x50() > > > [ 95.338090] initcall cpufreq_gov_dbs_init+0x0/0x50() returned 0 after 0 msecs > > > [ 
95.345254] calling init_acpi_pm_clocksource+0x0/0xc0() > > > [ 95.355293] initcall init_acpi_pm_clocksource+0x0/0xc0() returned 0 > > > after 0 msecs > > > [ 95.374115] calling pcibios_assign_resources+0x0/0x90() > > > [ 95.379618] PCI: Bridge: 0000:00:06.0 > > > [ 95.394087] IO window: disabled. > > > [ 95.397502] MEM window: 0xeff00000-0xefffffff > > > [ 95.402032] PREFETCH window: 0x00000000eef00000-0x00000000eeffffff > > > [ 95.415189] PCI: Bridge: 0000:00:0b.0 > > > [ 95.418860] IO window: e000-efff > > > [ 95.438075] MEM window: 0xefe00000-0xefefffff > > > [ 95.442610] PREFETCH window: disabled. > > > [ 95.455526] PCI: Bridge: 0000:00:0c.0 > > > [ 95.459197] IO window: disabled. > > > [ 95.462604] MEM window: 0xefd00000-0xefdfffff > > > [ 95.475513] PREFETCH window: disabled. > > > [ 95.479443] PCI: Bridge: 0000:00:0d.0 > > > [ 95.485512] IO window: disabled. > > > [ 95.495480] MEM window: 0xefc00000-0xefcfffff > > > [ 95.500010] PREFETCH window: disabled. > > > [ 95.515455] PCI: Bridge: 0000:00:0f.0 > > > [ 95.519124] IO window: d000-dfff > > > [ 95.522533] MEM window: 0xefb00000-0xefbfffff > > > [ 95.535423] PREFETCH window: 0x00000000e0000000-0x00000000e7ffffff > > > [ 95.545456] initcall pcibios_assign_resources+0x0/0x90() returned 0 > > > after 79 msecs > > > [ 95.557830] calling fill_mp_bus_to_cpumask+0x0/0x100() > > > [ 95.575408] initcall fill_mp_bus_to_cpumask+0x0/0x100() returned 0 > > > after 0 msecs > > > [ 95.593777] calling inet_init+0x0/0x380() > > > [ 95.597890] NET: Registered protocol family 2 > > > > > > -> same hang, no reaction to SysRq. > > > > > > What looks suspicious: The call to pcie_aspm_init is just before the > > > temporary hang. When I used make oldconfig to upgrade the .config from > > > 2.6.25-rc3-mm1 to -rc5-mm1 I activated the new option CONFIG_PCIEASPM. > > > > > > I will try with CONFIG_PCIEASPM_DEBUG added and completely without this option. > > > > CONFIG_PCIEASPM does not change anything. 
> > Also testing the range of ipc patches you suggested to Badari did not fix it.
> >
> > I did a bisect, these patches are currently remaining, but I did not
> > have the time for more bisect steps until tomorrow:
>
> OK, thanks for persisting.
>
> > git-scsi-misc
> > git-sh
> > execute-tasklets-in-the-same-order-they-were-queued
> > git-sched
> > sched: work around hrtick related lockup
> > sched: make sure jiffies is up to date before calling __update_rq_clock()
> > sched: fix rq->clock overflows detection with CONFIG_NO_HZ
> > sched: make cpu_clock() globally synchronous
> > sched: remove isolcpus
> > ftrace: make the task state char-string visible to all
> > sched: add latency tracer callbacks to the scheduler

Yes. I found the following patch to be the culprit.

sched: make sure jiffies is up to date before calling __update_rq_clock()

Torsten, looking at your output, it looks like it hung at the same
place. Backing out this patch should help. Try it out.
I am sure you also have CONFIG_DETECT_SOFTLOCKUP=y in your config?

commit 60befbc1c0b6d141c9c26e61ddd303aedd1e7396
Author: Guillaume Chazarain <guichaz@yahoo.fr>
Date:   Mon Mar 10 08:16:41 2008 +0100

    sched: make sure jiffies is up to date before calling __update_rq_clock()

    Now that __update_rq_clock() uses jiffies to detect clock overflows, make
    sure jiffies are up to date before touch_softlockup_watchdog().

    Removed a touch_softlockup_watchdog() call becoming redundant with the
    added tick_nohz_update_jiffies().
    Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/kernel/sched.c b/kernel/sched.c
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -66,6 +66,7 @@
 #include <linux/unistd.h>
 #include <linux/pagemap.h>
 #include <linux/hrtimer.h>
+#include <linux/tick.h>
 #include <asm/tlb.h>
 #include <asm/irq_regs.h>
@@ -913,7 +914,7 @@ void sched_clock_idle_wakeup_event(u64 delta_ns)
 	rq->prev_clock_raw = now;
 	rq->clock += delta_ns;
 	spin_unlock(&rq->lock);
-	touch_softlockup_watchdog();
+	tick_nohz_update_jiffies();
 }
 EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);

Thanks,
Badari

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: 2.6.25-rc5-mm1
  2008-03-13 23:10 ` 2.6.25-rc5-mm1 Badari Pulavarty
@ 2008-03-21 12:12 ` Ingo Molnar
  0 siblings, 0 replies; 24+ messages in thread
From: Ingo Molnar @ 2008-03-21 12:12 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: Andrew Morton, Torsten Kaiser, lkml, netdev

* Badari Pulavarty <pbadari@gmail.com> wrote:

> Yes. I found the following patch to be the culprit.
>
> sched: make sure jiffies is up to date before calling __update_rq_clock()

thanks Badari, i've backed out this patch.

	Ingo

^ permalink raw reply	[flat|nested] 24+ messages in thread
* [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root [not found] <20080311011434.ad8c8d7d.akpm@linux-foundation.org> [not found] ` <20080311202300.GA8957@vino.hallyn.com> @ 2008-03-13 19:48 ` Tilman Schmidt 2008-03-13 22:21 ` Daniel Lezcano 2008-03-19 17:52 ` Benjamin Thery 1 sibling, 2 replies; 24+ messages in thread From: Tilman Schmidt @ 2008-03-13 19:48 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, netdev, David Miller, pekkas, yoshfuji [-- Attachment #1: Type: text/plain, Size: 3391 bytes --] Am 11.03.2008 09:14 schrieb Andrew Morton: > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc5/2.6.25-rc5-mm1/ I'm noticing a strange effect with this: On my openSUSE 10.3 development machine with SUSEs default MTA Postfix installed, I occasionally send a pre-formatted mail by feeding it directly into "/usr/sbin/sendmail -t". If I try that while running a 2.6.25-rc5-mm1 kernel, I get: ts@xenon:~/kernel> /usr/sbin/sendmail -t < patch-usb-reduce-syslog-clutter-v3 postdrop: warning: can't open /proc/net/if_inet6 (Permission denied) - skipping IPv6 configuration postdrop: fatal: parameter inet_interfaces: no local interface found for ::1 sendmail: warning: command "/usr/sbin/postdrop -r" exited with status 1 sendmail: fatal: ts(1000): unable to execute /usr/sbin/postdrop -r: Success ts@xenon:~/kernel> and unsurprisingly, the mail is not sent. If I do the same as root, everything works as usual, there is no console output from the sendmail command, and the mail goes out as it should. All other networking applications appear to be running normally. On a 2.6.25-rc5 (non-mm) kernel I do not need to run the sendmail command as root. It works just as well if I run it as myself. IPv6 is not in use on that machine. The Ethernet interface has just the link local IPv6 address. 
Possibly relevant information: ts@xenon:~> /sbin/ifconfig -a eth0 Protokoll:Ethernet Hardware Adresse 00:19:D1:03:D8:FF inet Adresse:192.168.59.102 Bcast:192.168.59.255 Maske:255.255.255.0 inet6 Adresse: fe80::219:d1ff:fe03:d8ff/64 Gültigkeitsbereich:Verbindung UP BROADCAST NOTRAILERS RUNNING MULTICAST MTU:1500 Metric:1 RX packets:78 errors:0 dropped:0 overruns:0 frame:0 TX packets:145 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 Sendewarteschlangenlänge:100 RX bytes:9547 (9.3 Kb) TX bytes:17952 (17.5 Kb) Speicher:92c00000-92c20000 lo Protokoll:Lokale Schleife inet Adresse:127.0.0.1 Maske:255.0.0.0 inet6 Adresse: ::1/128 Gültigkeitsbereich:Maschine UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:2 errors:0 dropped:0 overruns:0 frame:0 TX packets:2 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 Sendewarteschlangenlänge:0 RX bytes:100 (100.0 b) TX bytes:100 (100.0 b) ts@xenon:~/kernel> ls -l /proc/net/if_inet6 -r--r--r-- 1 root root 0 13. Mär 19:26 /proc/net/if_inet6 ts@xenon:~> cat /proc/net/if_inet6 fe800000000000000219d1fffe03d8ff 02 40 20 80 eth0 00000000000000000000000000000001 01 80 10 80 lo ts@xenon:~> uname -a Linux xenon 2.6.25-rc5-mm1-testing #1 SMP PREEMPT Tue Mar 11 14:34:49 CET 2008 i686 i686 i386 GNU/Linux As you see, I can cat /proc/net/if_inet6 as regular (non-root) user just fine, even though Postfix complains it cannot access it. The content of /proc/net/if_inet6 is identical if I cat it on kernel 2.6.25-rc5 mainline. CCing a selection of IPv6 networking related maintainer addresses. If you need more information or want me to test something, let me know. HTH T. -- Tilman Schmidt E-Mail: tilman@imap.cc Bonn, Germany Diese Nachricht besteht zu 100% aus wiederverwerteten Bits. Ungeöffnet mindestens haltbar bis: (siehe Rückseite) [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 253 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root 2008-03-13 19:48 ` [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root Tilman Schmidt @ 2008-03-13 22:21 ` Daniel Lezcano 2008-03-14 0:08 ` Tilman Schmidt 2008-03-19 17:52 ` Benjamin Thery 1 sibling, 1 reply; 24+ messages in thread From: Daniel Lezcano @ 2008-03-13 22:21 UTC (permalink / raw) To: Tilman Schmidt Cc: Andrew Morton, linux-kernel, netdev, David Miller, pekkas, yoshfuji Tilman Schmidt wrote: > Am 11.03.2008 09:14 schrieb Andrew Morton: >> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc5/2.6.25-rc5-mm1/ > > I'm noticing a strange effect with this: > > On my openSUSE 10.3 development machine with SUSEs default MTA > Postfix installed, I occasionally send a pre-formatted mail by > feeding it directly into "/usr/sbin/sendmail -t". If I try that > while running a 2.6.25-rc5-mm1 kernel, I get: > > ts@xenon:~/kernel> /usr/sbin/sendmail -t < patch-usb-reduce-syslog-clutter-v3 > postdrop: warning: can't open /proc/net/if_inet6 (Permission denied) - skipping IPv6 configuration > postdrop: fatal: parameter inet_interfaces: no local interface found for ::1 > sendmail: warning: command "/usr/sbin/postdrop -r" exited with status 1 > sendmail: fatal: ts(1000): unable to execute /usr/sbin/postdrop -r: Success > ts@xenon:~/kernel> > > and unsurprisingly, the mail is not sent. If I do the same as root, > everything works as usual, there is no console output from the > sendmail command, and the mail goes out as it should. All other > networking applications appear to be running normally. > > On a 2.6.25-rc5 (non-mm) kernel I do not need to run the sendmail > command as root. It works just as well if I run it as myself. > > IPv6 is not in use on that machine. The Ethernet interface has > just the link local IPv6 address. 
Possibly relevant information: > > ts@xenon:~> /sbin/ifconfig -a > eth0 Protokoll:Ethernet Hardware Adresse 00:19:D1:03:D8:FF > inet Adresse:192.168.59.102 Bcast:192.168.59.255 Maske:255.255.255.0 > inet6 Adresse: fe80::219:d1ff:fe03:d8ff/64 Gültigkeitsbereich:Verbindung > UP BROADCAST NOTRAILERS RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:78 errors:0 dropped:0 overruns:0 frame:0 > TX packets:145 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 Sendewarteschlangenlänge:100 > RX bytes:9547 (9.3 Kb) TX bytes:17952 (17.5 Kb) > Speicher:92c00000-92c20000 > > lo Protokoll:Lokale Schleife > inet Adresse:127.0.0.1 Maske:255.0.0.0 > inet6 Adresse: ::1/128 Gültigkeitsbereich:Maschine > UP LOOPBACK RUNNING MTU:16436 Metric:1 > RX packets:2 errors:0 dropped:0 overruns:0 frame:0 > TX packets:2 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 Sendewarteschlangenlänge:0 > RX bytes:100 (100.0 b) TX bytes:100 (100.0 b) > > ts@xenon:~/kernel> ls -l /proc/net/if_inet6 > -r--r--r-- 1 root root 0 13. Mär 19:26 /proc/net/if_inet6 > ts@xenon:~> cat /proc/net/if_inet6 > fe800000000000000219d1fffe03d8ff 02 40 20 80 eth0 > 00000000000000000000000000000001 01 80 10 80 lo > ts@xenon:~> uname -a > Linux xenon 2.6.25-rc5-mm1-testing #1 SMP PREEMPT Tue Mar 11 14:34:49 CET 2008 i686 i686 i386 GNU/Linux > > As you see, I can cat /proc/net/if_inet6 as regular (non-root) user > just fine, even though Postfix complains it cannot access it. > The content of /proc/net/if_inet6 is identical if I cat it on > kernel 2.6.25-rc5 mainline. > > CCing a selection of IPv6 networking related maintainer addresses. > If you need more information or want me to test something, let me > know. Hi Tilman, Is it possible to have your config file used to compile the kernel ? ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root
  2008-03-13 22:21 ` Daniel Lezcano
@ 2008-03-14  0:08 ` Tilman Schmidt
  2008-03-17 10:44   ` Daniel Lezcano
  0 siblings, 1 reply; 24+ messages in thread
From: Tilman Schmidt @ 2008-03-14 0:08 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Andrew Morton, linux-kernel, netdev, David Miller, pekkas, yoshfuji

[-- Attachment #1: Type: text/plain, Size: 1163 bytes --]

On 13.03.2008 23:21, Daniel Lezcano wrote:
> Tilman Schmidt wrote:

>> ts@xenon:~/kernel> /usr/sbin/sendmail -t < patch-usb-reduce-syslog-clutter-v3
>> postdrop: warning: can't open /proc/net/if_inet6 (Permission denied) - skipping IPv6 configuration
>> postdrop: fatal: parameter inet_interfaces: no local interface found for ::1
>> sendmail: warning: command "/usr/sbin/postdrop -r" exited with status 1
>> sendmail: fatal: ts(1000): unable to execute /usr/sbin/postdrop -r: Success
>> ts@xenon:~/kernel>
>>
>> and unsurprisingly, the mail is not sent. If I do the same as root,
>> everything works as usual, there is no console output from the
>> sendmail command, and the mail goes out as it should. All other
>> networking applications appear to be running normally.

> Is it possible to have your config file used to compile the kernel ?

Sure. You can find it at
http://gollum.phnxsoft.com/~ts/linux/config-2.6.25-rc5-mm1

HTH
T.

--
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Best before, if unopened: (see reverse side)

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 253 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root 2008-03-14 0:08 ` Tilman Schmidt @ 2008-03-17 10:44 ` Daniel Lezcano 2008-03-17 12:50 ` Benjamin Thery 2008-03-17 13:06 ` Tilman Schmidt 0 siblings, 2 replies; 24+ messages in thread From: Daniel Lezcano @ 2008-03-17 10:44 UTC (permalink / raw) To: Tilman Schmidt Cc: Andrew Morton, linux-kernel, netdev, David Miller, pekkas, yoshfuji Tilman Schmidt wrote: > Am 13.03.2008 23:21 schrieb Daniel Lezcano: >> Tilman Schmidt wrote: > >>> ts@xenon:~/kernel> /usr/sbin/sendmail -t < patch-usb-reduce-syslog-clutter-v3 >>> postdrop: warning: can't open /proc/net/if_inet6 (Permission denied) - skipping IPv6 configuration >>> postdrop: fatal: parameter inet_interfaces: no local interface found for ::1 >>> sendmail: warning: command "/usr/sbin/postdrop -r" exited with status 1 >>> sendmail: fatal: ts(1000): unable to execute /usr/sbin/postdrop -r: Success >>> ts@xenon:~/kernel> >>> >>> and unsurprisingly, the mail is not sent. If I do the same as root, >>> everything works as usual, there is no console output from the >>> sendmail command, and the mail goes out as it should. All other >>> networking applications appear to be running normally. > >> Is it possible to have your config file used to compile the kernel ? > > Sure. You can find it at > http://gollum.phnxsoft.com/~ts/linux/config-2.6.25-rc5-mm1 Thanks, I was not able to reproduce it, but I think I didn't configured postfix as I should had. What version do you use ? If I may ask you, can you put your postfix configuration file and a strace -f of your failing command ? on your website, that will help me a lot to investigate. -- Daniel ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root 2008-03-17 10:44 ` Daniel Lezcano @ 2008-03-17 12:50 ` Benjamin Thery 2008-03-17 13:35 ` Tilman Schmidt 2008-03-17 13:06 ` Tilman Schmidt 1 sibling, 1 reply; 24+ messages in thread From: Benjamin Thery @ 2008-03-17 12:50 UTC (permalink / raw) To: Daniel Lezcano Cc: Tilman Schmidt, Andrew Morton, linux-kernel, netdev, David Miller, pekkas, yoshfuji I also tried to reproduce your problem with Postfix (on a Debian distro) but failed to obtain the error message. While googling for the error string, I found this link which report the same kind of error when Postfix is used with grsecurity (in 2006): http://blog.jensthebrain.de/archives/2006/12/11/IPv6-Probleme-mit-Postfix-und-grsecurity I barely understand German so I'm not sure it is related to your problem. Benjamin On Mon, Mar 17, 2008 at 11:44 AM, Daniel Lezcano <dlezcano@fr.ibm.com> wrote: > Tilman Schmidt wrote: > > Am 13.03.2008 23:21 schrieb Daniel Lezcano: > >> Tilman Schmidt wrote: > > > >>> ts@xenon:~/kernel> /usr/sbin/sendmail -t < patch-usb-reduce-syslog-clutter-v3 > >>> postdrop: warning: can't open /proc/net/if_inet6 (Permission denied) - skipping IPv6 configuration > >>> postdrop: fatal: parameter inet_interfaces: no local interface found for ::1 > >>> sendmail: warning: command "/usr/sbin/postdrop -r" exited with status 1 > >>> sendmail: fatal: ts(1000): unable to execute /usr/sbin/postdrop -r: Success > >>> ts@xenon:~/kernel> > >>> > >>> and unsurprisingly, the mail is not sent. If I do the same as root, > >>> everything works as usual, there is no console output from the > >>> sendmail command, and the mail goes out as it should. All other > >>> networking applications appear to be running normally. > > > >> Is it possible to have your config file used to compile the kernel ? > > > > Sure. 
You can find it at > > http://gollum.phnxsoft.com/~ts/linux/config-2.6.25-rc5-mm1 > > Thanks, > > I was not able to reproduce it, but I think I didn't configured postfix > as I should had. What version do you use ? > If I may ask you, can you put your postfix configuration file and a > strace -f of your failing command ? on your website, that will help me a > lot to investigate. > > -- Daniel > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root
  2008-03-17 12:50 ` Benjamin Thery
@ 2008-03-17 13:35 ` Tilman Schmidt
  0 siblings, 0 replies; 24+ messages in thread
From: Tilman Schmidt @ 2008-03-17 13:35 UTC (permalink / raw)
  To: Benjamin Thery
  Cc: Daniel Lezcano, Andrew Morton, linux-kernel, netdev, David Miller, pekkas, yoshfuji

[-- Attachment #1: Type: text/plain, Size: 1426 bytes --]

Benjamin Thery wrote:
> While googling for the error string, I found this link which report
> the same kind of
> error when Postfix is used with grsecurity (in 2006):
>
> http://blog.jensthebrain.de/archives/2006/12/11/IPv6-Probleme-mit-Postfix-und-grsecurity
>
> I barely understand German so I'm not sure it is related to your problem.

The userspace failure described there is indeed the same as mine:
Postfix' sendmail command tries to open "/proc/net/if_inet6", which
fails with EACCES. But I have never installed grsecurity on this
machine, and the problem appeared for me only with kernel
2.6.25-rc5-mm1, not when running kernel 2.6.25-rc5 on the same
machine, so I guess the cause must be something different.

What's also strange is that I can "cat /proc/net/if_inet6" from
the command line as the same non-root user with no problem at all.

strace of "cat /proc/net/if_inet6" has:

open("/proc/net/if_inet6", O_RDONLY|O_LARGEFILE) = 3

strace of "/usr/sbin/sendmail", however:

open("/proc/net/if_inet6", O_RDONLY) = -1 EACCES (Permission denied)

Both run as

ts@xenon:~> id
uid=1000(ts) gid=100(users) groups=0(root),14(uucp),16(dialout),33(video),100(users),112(bacula)

HTH
T.

--
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Best before, if unopened: (see reverse side)

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root
  2008-03-17 10:44 ` Daniel Lezcano
  2008-03-17 12:50   ` Benjamin Thery
@ 2008-03-17 13:06   ` Tilman Schmidt
  2008-03-17 13:17     ` Daniel Lezcano
  1 sibling, 1 reply; 24+ messages in thread
From: Tilman Schmidt @ 2008-03-17 13:06 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Andrew Morton, linux-kernel, netdev, David Miller, pekkas, yoshfuji

[-- Attachment #1: Type: text/plain, Size: 767 bytes --]

Daniel Lezcano wrote:

> I was not able to reproduce it, but I think I didn't configured postfix
> as I should had. What version do you use ?

It's the one that comes with openSUSE 10.3:

ts@xenon:~> rpm -q postfix
postfix-2.4.5-20.2

> If I may ask you, can you put your postfix configuration file and a
> strace -f of your failing command ? on your website, that will help me a
> lot to investigate.

Sure, no problem. You may find them at

http://gollum.phnxsoft.com/~ts/linux/main.cf
http://gollum.phnxsoft.com/~ts/linux/strace.log

HTH
T.

--
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Best before, if unopened: (see reverse side)

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root 2008-03-17 13:06 ` Tilman Schmidt @ 2008-03-17 13:17 ` Daniel Lezcano 0 siblings, 0 replies; 24+ messages in thread From: Daniel Lezcano @ 2008-03-17 13:17 UTC (permalink / raw) To: Tilman Schmidt Cc: Andrew Morton, linux-kernel, netdev, David Miller, pekkas, yoshfuji Tilman Schmidt wrote: > Daniel Lezcano schrieb: > >> I was not able to reproduce it, but I think I didn't configured >> postfix as I should had. What version do you use ? > > It's the one that comes with openSUSE 10.3: > > ts@xenon:~> rpm -q postfix > postfix-2.4.5-20.2 > >> If I may ask you, can you put your postfix configuration file and a >> strace -f of your failing command ? on your website, that will help me >> a lot to investigate. > > Sure, no problem. You may find them at > > http://gollum.phnxsoft.com/~ts/linux/main.cf > http://gollum.phnxsoft.com/~ts/linux/strace.log Thank you very much, I will try to reproduce it with a simple program. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root
  2008-03-13 19:48 ` [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root Tilman Schmidt
  2008-03-13 22:21   ` Daniel Lezcano
@ 2008-03-19 17:52   ` Benjamin Thery
  2008-03-19 21:16     ` Andrew Morton
  1 sibling, 1 reply; 24+ messages in thread
From: Benjamin Thery @ 2008-03-19 17:52 UTC (permalink / raw)
  To: Tilman Schmidt
  Cc: Andrew Morton, linux-kernel, netdev, David Miller, pekkas, yoshfuji, Daniel Lezcano, Pavel Emelyanov

[-- Attachment #1: Type: text/plain, Size: 5687 bytes --]

Tilman,

I've finally managed to reproduce your problem with Postfix on one of
my victims.

Earlier, in the afternoon, I wrote a piece of code that triggered a
similar behaviour, but I wasn't sure it was exactly the problem you
found. So, I've rebuilt Postfix, added some traces and, voila, same
issue as yours.
(The version of Postfix originally installed on my machine seems to
have IPv6 disabled)

I bisected the problem to the commit "[NET]: Make /proc/net a symlink
on /proc/self/net (v3)"

Here is what happens:

- Recently the contents of /proc/net have been moved to /proc/self/net,
  and /proc/net is now a symlink to that directory.
- Before that, everybody could access /proc/net and read /proc/net/if_inet6:
  dr-xr-xr-x 6 root root 0 2008-03-05 15:23 /proc/net

- Now, /proc/self/net has a more restrictive access mode and only the
  owner of the process can enter the directory:
  dr-xr--r-- 5 toto toto 0 Mar 19 17:30 net

  This is not a problem in most cases, but it becomes annoying when a
  process decides to change its UID or GID. It may lose access to its
  own /proc/self/net entries.

- What happens in the Postfix case is that the 'sendmail' process
  executes the '/usr/sbin/postdrop' binary to enqueue the message, but
  unfortunately '/usr/sbin/postdrop' has the setgid bit set:
  -rwxr-sr-x 1 root postdrop 479475 Mar 19 17:14 /usr/sbin/postdrop

  The process egid changes and this seems to be problematic to access
  /proc/self/net/if_inet6. :)

I've attached a tiny test program that can be used to reproduce the
problem without Postfix.

- Either execute it as root and give it an unprivileged uid as argument:
  ./test-proc_net_if_inet6 1001

- Or change its ownership and access mode to:
  -rwxr-sr-x root postdrop
  and execute it as an ordinary (unprivileged) user:
  chown root:postdrop test-proc_net_if_inet6; chmod 2755 test-proc_net_if_inet6
  ./test-proc_net_if_inet6

I've found the cause but not the fix. :)
(Adding Pavel in cc:)

Regards,
Benjamin

On Thu, Mar 13, 2008 at 8:48 PM, Tilman Schmidt <tilman@imap.cc> wrote:
> On 11.03.2008 09:14, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc5/2.6.25-rc5-mm1/
>
> I'm noticing a strange effect with this:
>
> On my openSUSE 10.3 development machine with SUSEs default MTA
> Postfix installed, I occasionally send a pre-formatted mail by
> feeding it directly into "/usr/sbin/sendmail -t". If I try that
> while running a 2.6.25-rc5-mm1 kernel, I get:
>
> ts@xenon:~/kernel> /usr/sbin/sendmail -t < patch-usb-reduce-syslog-clutter-v3
> postdrop: warning: can't open /proc/net/if_inet6 (Permission denied) - skipping IPv6 configuration
> postdrop: fatal: parameter inet_interfaces: no local interface found for ::1
> sendmail: warning: command "/usr/sbin/postdrop -r" exited with status 1
> sendmail: fatal: ts(1000): unable to execute /usr/sbin/postdrop -r: Success
> ts@xenon:~/kernel>
>
> and unsurprisingly, the mail is not sent. If I do the same as root,
> everything works as usual, there is no console output from the
> sendmail command, and the mail goes out as it should. All other
All other > networking applications appear to be running normally. > > On a 2.6.25-rc5 (non-mm) kernel I do not need to run the sendmail > command as root. It works just as well if I run it as myself. > > IPv6 is not in use on that machine. The Ethernet interface has > just the link local IPv6 address. Possibly relevant information: > > ts@xenon:~> /sbin/ifconfig -a > eth0 Protokoll:Ethernet Hardware Adresse 00:19:D1:03:D8:FF > inet Adresse:192.168.59.102 Bcast:192.168.59.255 Maske:255.255.255.0 > inet6 Adresse: fe80::219:d1ff:fe03:d8ff/64 Gültigkeitsbereich:Verbindung > UP BROADCAST NOTRAILERS RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:78 errors:0 dropped:0 overruns:0 frame:0 > TX packets:145 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 Sendewarteschlangenlänge:100 > RX bytes:9547 (9.3 Kb) TX bytes:17952 (17.5 Kb) > Speicher:92c00000-92c20000 > > lo Protokoll:Lokale Schleife > inet Adresse:127.0.0.1 Maske:255.0.0.0 > inet6 Adresse: ::1/128 Gültigkeitsbereich:Maschine > UP LOOPBACK RUNNING MTU:16436 Metric:1 > RX packets:2 errors:0 dropped:0 overruns:0 frame:0 > TX packets:2 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 Sendewarteschlangenlänge:0 > RX bytes:100 (100.0 b) TX bytes:100 (100.0 b) > > ts@xenon:~/kernel> ls -l /proc/net/if_inet6 > -r--r--r-- 1 root root 0 13. Mär 19:26 /proc/net/if_inet6 > ts@xenon:~> cat /proc/net/if_inet6 > fe800000000000000219d1fffe03d8ff 02 40 20 80 eth0 > 00000000000000000000000000000001 01 80 10 80 lo > ts@xenon:~> uname -a > Linux xenon 2.6.25-rc5-mm1-testing #1 SMP PREEMPT Tue Mar 11 14:34:49 CET 2008 i686 i686 i386 GNU/Linux > > As you see, I can cat /proc/net/if_inet6 as regular (non-root) user > just fine, even though Postfix complains it cannot access it. > The content of /proc/net/if_inet6 is identical if I cat it on > kernel 2.6.25-rc5 mainline. > > CCing a selection of IPv6 networking related maintainer addresses. > If you need more information or want me to test something, let me > know. 
>
> HTH
> T.
>
> --
> Tilman Schmidt                    E-Mail: tilman@imap.cc
> Bonn, Germany
> This message consists of 100% recycled bits.
> Unopened, best before: (see reverse side)
>
>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: test-proc_net_if_inet6.c --]
[-- Type: text/x-csrc; name=test-proc_net_if_inet6.c, Size: 497 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	FILE *fp;
	uid_t uid;

	if (argc > 1) {
		uid = atoi(argv[1]);
		if (setuid(uid) < 0) {
			perror("setuid");
			return 1;
		}
	}

	printf("PID=%d UID=%d GID=%d\n", getpid(), geteuid(), getegid());

	if ((fp = fopen("/proc/net/if_inet6", "r")) != 0) {
		printf("PASS: /proc/net/if_inet6 opened\n");
		fclose(fp);
	} else {
		printf("FAIL: Can't open /proc/net/if_inet6\n");
		return 1;
	}
	return 0;
}

* Re: [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root
From: Andrew Morton @ 2008-03-19 21:16 UTC
To: Benjamin Thery
Cc: tilman, linux-kernel, netdev, davem, pekkas, yoshfuji, dlezcano, xemul, Rafael J. Wysocki, Eric W. Biederman

On Wed, 19 Mar 2008 18:52:41 +0100
"Benjamin Thery" <ben.thery@gmail.com> wrote:

> Tilman,
>
> I've finally managed to reproduce your problem with Postfix on one of
> my victims.
>
> Earlier, in the afternoon, I wrote a piece of code that triggered a
> similar behaviour, but I wasn't sure it was exactly the problem you
> found. So, I've rebuilt Postfix, added some traces and, voila, same
> issue as yours.
> (The version of Postfix originally installed on my machine seems to
> have IPv6 disabled)
>
> I bisected the problem to the commit "[NET]: Make /proc/net a symlink
> on /proc/self/net (v3)"
>
> Here is what happens:
>
> - Recently /proc/net has been moved to /proc/self/net, and /proc/net
>   is now a symlink to this directory.
> - Before that everybody could access /proc/net and read /proc/net/if_inet6:
>   dr-xr-xr-x 6 root root 0 2008-03-05 15:23 /proc/net
>
> - Now, /proc/self/net has a more restrictive access mode and only the
>   owner of the process can enter the directory:
>   dr-xr--r-- 5 toto toto 0 Mar 19 17:30 net
>
>   This is not a problem in most cases, but it becomes annoying when a
>   process decides to change its UID or GID. It may lose access to its
>   own /proc/self/net entries.
>
> - What happens in the Postfix case is that the 'sendmail' process executes the
>   '/usr/sbin/postdrop' binary to enqueue the message, but unfortunately
>   '/usr/sbin/postdrop' has the setgid bit set:
>   -rwxr-sr-x 1 root postdrop 479475 Mar 19 17:14 /usr/sbin/postdrop
>
>   The process egid changes, and this seems to prevent access to
>   /proc/self/net/if_inet6. :)
>
> I've attached a tiny test program that can be used to reproduce the problem
> without Postfix.
> - Either execute it as root and give it an unprivileged uid as argument:
>   ./test-proc_net_if_inet6 1001
>
> - Or change its ownership and access mode to -rwxr-sr-x root postdrop
>   and execute it as an ordinary user:
>   chown root:postdrop test-proc_net_if_inet6; chmod 2755 test-proc_net_if_inet6
>   ./test-proc_net_if_inet6
>
> I've found the cause but not the fix. :)
> (Adding Pavel in cc:)
>

Thanks for that - most useful.

Although this is advertised as a 2.6.25-rc5-mm1 problem, I assume the
regression is also in mainline? 2.6.25-rc6?

* Re: [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root
From: Benjamin Thery @ 2008-03-19 22:14 UTC
To: Andrew Morton
Cc: tilman, linux-kernel, netdev, davem, pekkas, yoshfuji, dlezcano, xemul, Rafael J. Wysocki, Eric W. Biederman

On Wed, Mar 19, 2008 at 10:16 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Wed, 19 Mar 2008 18:52:41 +0100
> "Benjamin Thery" <ben.thery@gmail.com> wrote:
>
> > Tilman,
> >
> > I've finally managed to reproduce your problem with Postfix on one of
> > my victims.
> >
> > Earlier, in the afternoon, I wrote a piece of code that triggered a
> > similar behaviour, but I wasn't sure it was exactly the problem you
> > found. So, I've rebuilt Postfix, added some traces and, voila, same
> > issue as yours.
> > (The version of Postfix originally installed on my machine seems to
> > have IPv6 disabled)
> >
> > I bisected the problem to the commit "[NET]: Make /proc/net a symlink
> > on /proc/self/net (v3)"
> >
> > Here is what happens:
> >
> > - Recently /proc/net has been moved to /proc/self/net, and /proc/net
> >   is now a symlink to this directory.
> > - Before that everybody could access /proc/net and read /proc/net/if_inet6:
> >   dr-xr-xr-x 6 root root 0 2008-03-05 15:23 /proc/net
> >
> > - Now, /proc/self/net has a more restrictive access mode and only the
> >   owner of the process can enter the directory:
> >   dr-xr--r-- 5 toto toto 0 Mar 19 17:30 net
> >
> >   This is not a problem in most cases, but it becomes annoying when a
> >   process decides to change its UID or GID. It may lose access to its
> >   own /proc/self/net entries.
> >
> > - What happens in the Postfix case is that the 'sendmail' process executes the
> >   '/usr/sbin/postdrop' binary to enqueue the message, but unfortunately
> >   '/usr/sbin/postdrop' has the setgid bit set:
> >   -rwxr-sr-x 1 root postdrop 479475 Mar 19 17:14 /usr/sbin/postdrop
> >
> >   The process egid changes, and this seems to prevent access to
> >   /proc/self/net/if_inet6. :)
> >
> > I've attached a tiny test program that can be used to reproduce the problem
> > without Postfix.
> > - Either execute it as root and give it an unprivileged uid as argument:
> >   ./test-proc_net_if_inet6 1001
> >
> > - Or change its ownership and access mode to -rwxr-sr-x root postdrop
> >   and execute it as an ordinary user:
> >   chown root:postdrop test-proc_net_if_inet6; chmod 2755 test-proc_net_if_inet6
> >   ./test-proc_net_if_inet6
> >
> > I've found the cause but not the fix. :)
> > (Adding Pavel in cc:)
> >
>
> Thanks for that - most useful.
>
> Although this is advertised as a 2.6.25-rc5-mm1 problem, I assume the
> regression is also in mainline? 2.6.25-rc6?

Yes, it is in mainline.
I reproduced it on 2.6.25-rc5.

Benjamin

* Re: [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root
From: David Miller @ 2008-03-19 22:49 UTC
To: akpm
Cc: ben.thery, tilman, linux-kernel, netdev, pekkas, yoshfuji, dlezcano, xemul, rjw, ebiederm

From: Andrew Morton <akpm@linux-foundation.org>
Date: Wed, 19 Mar 2008 14:16:08 -0700

> Although this is advertised as a 2.6.25-rc5-mm1 problem, I assume the
> regression is also in mainline? 2.6.25-rc6?

It is in 2.6.25-rc6, correct.

If Pavel or someone else doesn't produce a good fix soon
I'll revert the guilty change as this bug is worse than
the problem that changeset fixes.

* Re: [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root
From: Benjamin Thery @ 2008-03-20 8:26 UTC
To: David Miller
Cc: akpm, tilman, linux-kernel, netdev, pekkas, yoshfuji, dlezcano, xemul, rjw, ebiederm

On Wed, Mar 19, 2008 at 11:49 PM, David Miller <davem@davemloft.net> wrote:
> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Wed, 19 Mar 2008 14:16:08 -0700
>
>
> > Although this is advertised as a 2.6.25-rc5-mm1 problem, I assume the
> > regression is also in mainline? 2.6.25-rc6?
>
> It is in 2.6.25-rc6, correct.
>
> If Pavel or someone else doesn't produce a good fix soon
> I'll revert the guilty change as this bug is worse than
> the problem that changeset fixes.

Andre Noll sent a patch to LKML, acked by Pavel:

"Fix permissions of /proc/net"
http://thread.gmane.org/gmane.linux.kernel/655148

Benjamin

* Re: [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root
From: Rafael J. Wysocki @ 2008-03-20 10:21 UTC
To: Benjamin Thery
Cc: David Miller, akpm, tilman, linux-kernel, netdev, pekkas, yoshfuji, dlezcano, xemul, ebiederm

On Thursday, 20 of March 2008, Benjamin Thery wrote:
> On Wed, Mar 19, 2008 at 11:49 PM, David Miller <davem@davemloft.net> wrote:
> > From: Andrew Morton <akpm@linux-foundation.org>
> > Date: Wed, 19 Mar 2008 14:16:08 -0700
> >
> >
> > > Although this is advertised as a 2.6.25-rc5-mm1 problem, I assume the
> > > regression is also in mainline? 2.6.25-rc6?
> >
> > It is in 2.6.25-rc6, correct.
> >
> > If Pavel or someone else doesn't produce a good fix soon
> > I'll revert the guilty change as this bug is worse than
> > the problem that changeset fixes.
>
> Andre Noll sent a patch to LKML, acked by Pavel:
>
> "Fix permissions of /proc/net"
> http://thread.gmane.org/gmane.linux.kernel/655148

Have you tested that patch?

Rafael

* Re: [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root
From: Pavel Emelyanov @ 2008-03-20 12:52 UTC
To: Rafael J. Wysocki
Cc: Benjamin Thery, David Miller, akpm, tilman, linux-kernel, netdev, pekkas, yoshfuji, dlezcano, xemul, ebiederm

Rafael J. Wysocki wrote:
> On Thursday, 20 of March 2008, Benjamin Thery wrote:
>> On Wed, Mar 19, 2008 at 11:49 PM, David Miller <davem@davemloft.net> wrote:
>>> From: Andrew Morton <akpm@linux-foundation.org>
>>> Date: Wed, 19 Mar 2008 14:16:08 -0700
>>>
>>>
>>> > Although this is advertised as a 2.6.25-rc5-mm1 problem, I assume the
>>> > regression is also in mainline? 2.6.25-rc6?
>>>
>>> It is in 2.6.25-rc6, correct.
>>>
>>> If Pavel or someone else doesn't produce a good fix soon
>>> I'll revert the guilty change as this bug is worse than
>>> the problem that changeset fixes.
>> Andre Noll sent a patch to LKML, acked by Pavel:
>>
>> "Fix permissions of /proc/net"
>> http://thread.gmane.org/gmane.linux.kernel/655148
>
> Have you tested that patch?

I did - works OK, that's why I Acked it.

> Rafael
>

* Re: [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root
From: Benjamin Thery @ 2008-03-20 13:48 UTC
To: Pavel Emelyanov
Cc: Rafael J. Wysocki, David Miller, akpm, tilman, linux-kernel, netdev, pekkas, yoshfuji, dlezcano, ebiederm

On Thu, Mar 20, 2008 at 1:52 PM, Pavel Emelyanov <xemul@openvz.org> wrote:
>
> Rafael J. Wysocki wrote:
> > On Thursday, 20 of March 2008, Benjamin Thery wrote:
> >> On Wed, Mar 19, 2008 at 11:49 PM, David Miller <davem@davemloft.net> wrote:
> >>> From: Andrew Morton <akpm@linux-foundation.org>
> >>> Date: Wed, 19 Mar 2008 14:16:08 -0700
> >>>
> >>>
> >>> > Although this is advertised as a 2.6.25-rc5-mm1 problem, I assume the
> >>> > regression is also in mainline? 2.6.25-rc6?
> >>>
> >>> It is in 2.6.25-rc6, correct.
> >>>
> >>> If Pavel or someone else doesn't produce a good fix soon
> >>> I'll revert the guilty change as this bug is worse than
> >>> the problem that changeset fixes.
> >> Andre Noll sent a patch to LKML, acked by Pavel:
> >>
> >> "Fix permissions of /proc/net"
> >> http://thread.gmane.org/gmane.linux.kernel/655148
> >
> > Have you tested that patch?
>
> I did - works OK, that's why I Acked it.

Also tested here. It fixes the regression.

Benjamin

* Re: [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root
From: Rafael J. Wysocki @ 2008-03-20 14:38 UTC
To: Benjamin Thery
Cc: Pavel Emelyanov, David Miller, akpm, tilman, linux-kernel, netdev, pekkas, yoshfuji, dlezcano, ebiederm

On Thursday, 20 of March 2008, Benjamin Thery wrote:
> On Thu, Mar 20, 2008 at 1:52 PM, Pavel Emelyanov <xemul@openvz.org> wrote:
> >
> > Rafael J. Wysocki wrote:
> > > On Thursday, 20 of March 2008, Benjamin Thery wrote:
> > >> On Wed, Mar 19, 2008 at 11:49 PM, David Miller <davem@davemloft.net> wrote:
> > >>> From: Andrew Morton <akpm@linux-foundation.org>
> > >>> Date: Wed, 19 Mar 2008 14:16:08 -0700
> > >>>
> > >>>
> > >>> > Although this is advertised as a 2.6.25-rc5-mm1 problem, I assume the
> > >>> > regression is also in mainline? 2.6.25-rc6?
> > >>>
> > >>> It is in 2.6.25-rc6, correct.
> > >>>
> > >>> If Pavel or someone else doesn't produce a good fix soon
> > >>> I'll revert the guilty change as this bug is worse than
> > >>> the problem that changeset fixes.
> > >> Andre Noll sent a patch to LKML, acked by Pavel:
> > >>
> > >> "Fix permissions of /proc/net"
> > >> http://thread.gmane.org/gmane.linux.kernel/655148
> > >
> > > Have you tested that patch?
> >
> > I did - works OK, that's why I Acked it.
>
> Also tested here. It fixes the regression.

OK, thanks.

Rafael

* Re: [2.6.25-rc5-mm1] regression: cannot run Postfix sendmail command as non-root
From: Tilman Schmidt @ 2008-03-19 23:31 UTC
To: Andrew Morton
Cc: Benjamin Thery, linux-kernel, netdev, davem, pekkas, yoshfuji, dlezcano, xemul, Rafael J. Wysocki, Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 790 bytes --]

On 19.03.2008 22:16, Andrew Morton wrote:
> On Wed, 19 Mar 2008 18:52:41 +0100
> "Benjamin Thery" <ben.thery@gmail.com> wrote:
>
>> I bisected the problem to the commit "[NET]: Make /proc/net a symlink
>> on /proc/self/net (v3)"
[...]
>> I've attached a tiny test program that can be used to reproduce the problem
>> without Postfix.

Thanks. Works great.

> Although this is advertised as a 2.6.25-rc5-mm1 problem, I assume the
> regression is also in mainline? 2.6.25-rc6?

My results:
up to 2.6.25-rc5 -- good
2.6.25-rc5-mm1   -- bad
2.6.25-rc6       -- bad

HTH
T.

--
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 253 bytes --]