public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* 2.6.39 Block layer regression was [Bug] Boot hangs with 2.6.39-rc[123]]
@ 2011-04-15  2:06 Michael Guntsche
  2011-04-15  3:25 ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Guntsche @ 2011-04-15  2:06 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org; +Cc: Linus Torvalds, Jens Axboe

Good morning again,

> Hi list,
> 
> Apparently all the 2.6.39-rc kernels up to current git HEAD hang on my setup
> here during boot. They survive the initramfs step, but after or during the root
> file system mount it just hangs.
> This is a software RAID-5->LVM setup. The machine itself boots without
> any problems with a 2.6.38 kernel.
> 
> The hardware:
> processor	: 0
> vendor_id	: GenuineIntel
> cpu family	: 15
> model		: 1
> model name	: Intel(R) Pentium(R) 4 CPU 1.70GHz
> stepping	: 2
> cpu MHz		: 1699.523
> cache size	: 256 KB
> fdiv_bug	: no
> hlt_bug		: no
> f00f_bug	: no
> coma_bug	: no
> fpu		: yes
> fpu_exception	: yes
> cpuid level	: 2
> wp		: yes
> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pebs bts
> bogomips	: 3399.04
> clflush size	: 64
> cache_alignment	: 128
> address sizes	: 36 bits physical, 32 bits virtual
> power management:
> 
<boot messages removed>

> md0: detected capacity change from 0 to 57475072
>  md0:
> md: md1 stopped.
> md: bind<sdd2>
> md: bind<sde2>
> md: bind<sdf2>
> md: bind<sda2>
> md: bind<sdc2>
> md/raid:md1: device sdc2 operational as raid disk 0
> md/raid:md1: device sda2 operational as raid disk 4
> md/raid:md1: device sdf2 operational as raid disk 3
> md/raid:md1: device sde2 operational as raid disk 2
> md/raid:md1: device sdd2 operational as raid disk 1
> md/raid:md1: allocated 5258kB
> md/raid:md1: raid level 5 active with 5 out of 5 devices, algorithm 2
> created bitmap (4 pages) for device md1
> md1: bitmap initialized from disk: read 1/1 pages, set 0 bits
> md1: detected capacity change from 0 to 2000189128704
>  md1: unknown partition table
> device fsid d64bac2f4cf259ba-326da6041632dd9c devid 1 transid 557 /dev/sdb1
> end_request: I/O error, dev fd0, sector 0
> EXT3-fs: barriers not enabled
> kjournald starting.  Commit interval 5 seconds
> EXT3-fs (dm-0): mounted filesystem with writeback data mode
> udev[1333]: starting version 167
> input: Power Button as /devices/LNXSYSTM:00/device:00/PNP0C0C:00/input/input2
> ACPI: Power Button [PWRB]
> input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
> ACPI: Power Button [PWRF]
> parport_pc 00:08: reported by Plug and Play ACPI
> parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE,EPP]
> intel_rng: FWH not detected
> Linux agpgart interface v0.103
> i801_smbus 0000:00:1f.3: PCI INT B -> GSI 17 (level, low) -> IRQ 17
> agpgart-intel 0000:00:00.0: Intel 865 Chipset
> agpgart-intel 0000:00:00.0: AGP aperture is 64M @ 0xf8000000
> Intel ICH 0000:00:1f.5: PCI INT B -> GSI 17 (level, low) -> IRQ 17
> intel8x0: white list rate for 1043:80f3 is 48000
> 
> Here the mount hangs. Apparently after root was mounted successfully.
> 
> SysRq : Show Blocked State
>   task                PC stack   pid father
> mount           D f4cfde00     0  1765   1745 0x00000000
>  f4c23dd0 00000086 f44aa800 f4cfde00 f44aa500 0000004d f4c23d74 c11a35f9
>  f4c23d80 f4c23dd0 f4431760 8416416f 00000004 f4431760 00000041 ffffffff
>  00000001 00000008 00100800 f4431760 00000008 00000000 e8da47f8 00000000
> Call Trace:
>  [<c11a35f9>] ? dm_request+0x8a/0xf6
>  [<c105c63b>] ? mempool_alloc_slab+0xe/0x10
>  [<c12197b9>] io_schedule+0x2b/0x42
>  [<c109cbad>] sleep_on_buffer+0x8/0xc
>  [<c1219ad3>] __wait_on_bit+0x43/0x60
>  [<c109cba5>] ? __wait_on_buffer+0x27/0x27
>  [<c1219b46>] out_of_line_wait_on_bit+0x56/0x5e
>  [<c109cba5>] ? __wait_on_buffer+0x27/0x27
>  [<c1038e76>] ? autoremove_wake_function+0x36/0x36
>  [<c109cba3>] __wait_on_buffer+0x25/0x27
>  [<c109d367>] __sync_dirty_buffer+0x6b/0x8b
>  [<c109d394>] sync_dirty_buffer+0xd/0xf
>  [<fa8d7d80>] ext3_commit_super.clone.24+0x4b/0xb0 [ext3]
>  [<c10ed9c7>] ? match_token+0xfd/0x1a2
>  [<fa8d8585>] ext3_setup_super+0xb4/0x20d [ext3]
>  [<fa8d776d>] ? parse_options+0x32/0x4b0 [ext3]
>  [<c121a2e0>] ? __mutex_lock_slowpath+0x124/0x1b2
>  [<c121a379>] ? mutex_lock+0xb/0x22
>  [<fa8da100>] ext3_remount+0x1df/0x27e [ext3]
>  [<fa8d9f21>] ? ext3_fill_super+0x1843/0x1843 [ext3]
>  [<c10811d9>] do_remount_sb+0x7d/0xf6
>  [<c1094379>] do_mount+0x44f/0x633
>  [<c1093e9e>] ? copy_mount_options+0x8d/0xee
>  [<c10947bf>] sys_mount+0x66/0x9b
>  [<c121af1d>] syscall_call+0x7/0xb
> Sched Debug Version: v0.10, 2.6.39-rc3+ #2
> ktime                                   : 37933.056186
> sched_clk                               : 38008.731563
> cpu_clk                                 : 37806.036844
> jiffies                                 : 4294901746
> sched_clock_stable                      : 0
> 
> sysctl_sched
>   .sysctl_sched_latency                    : 6.000000
>   .sysctl_sched_min_granularity            : 0.750000
>   .sysctl_sched_wakeup_granularity         : 1.000000
>   .sysctl_sched_child_runs_first           : 0
>   .sysctl_sched_features                   : 7279
>   .sysctl_sched_tunable_scaling            : 1 (logaritmic)
> 
> cpu#0, 1699.419 MHz
>   .nr_running                    : 0
>   .load                          : 0
>   .nr_switches                   : 19596
>   .nr_load_updates               : 3167
>   .nr_uninterruptible            : 1
>   .next_balance                  : 0.000000
>   .curr->pid                     : 0
>   .clock                         : 37680.038704
>   .cpu_load[0]                   : 0
>   .cpu_load[1]                   : 0
>   .cpu_load[2]                   : 0
>   .cpu_load[3]                   : 4
>   .cpu_load[4]                   : 17
>   .yld_count                     : 3
>   .sched_switch                  : 0
>   .sched_count                   : 19620
>   .sched_goidle                  : 1555
>   .ttwu_count                    : 1
>   .ttwu_local                    : 1
>   .bkl_count                     : 0
> 
> cfs_rq[0]:
>   .exec_clock                    : 6548.484099
>   .MIN_vruntime                  : 0.000001
>   .min_vruntime                  : 6269.914000
>   .max_vruntime                  : 0.000001
>   .spread                        : 0.000000
>   .spread0                       : 0.000000
>   .nr_spread_over                : 0
>   .nr_running                    : 0
>   .load                          : 0
> 
> rt_rq[0]:
>   .rt_nr_running                 : 0
>   .rt_throttled                  : 0
>   .rt_time                       : 200.624789
>   .rt_runtime                    : 950.000000
> 
> runnable tasks:
>             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
> ----------------------------------------------------------------------------------------------------------

After talking to Dave Chinner I looked at the block layer merges. I ended
up on

6c510389005 Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block

Starting with this merge I see the problems. For testing I compiled the
previous merge

3dab04e6978 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-mn1030

And apart from a futex error that gets fixed later on I was able to
mount all partitions and could boot. 
As I said initially this happens with current HEAD as well so has not
been fixed yet. Since not everyone is seeing this I wonder if it is
related to my setup here SATA->Raid5->LVM?

Kind regards,
Michael Guntsche

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.39 Block layer regression was [Bug] Boot hangs with 2.6.39-rc[123]]
  2011-04-15  2:06 2.6.39 Block layer regression was [Bug] Boot hangs with 2.6.39-rc[123]] Michael Guntsche
@ 2011-04-15  3:25 ` Linus Torvalds
  2011-04-15  4:22   ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2011-04-15  3:25 UTC (permalink / raw)
  To: Michael Guntsche; +Cc: linux-kernel@vger.kernel.org, Jens Axboe

On Thu, Apr 14, 2011 at 7:06 PM, Michael Guntsche <mike@it-loops.com> wrote:
>
> After talking to Dave Chinner I looked at the block layer merges. I ended
> up on
>
> 6c510389005 Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block
>
> Starting with this merge I see the problems.

Ok, so that's not very surprising. It's the new per-thread plugging,
and yes, there's clearly something broken with regards to MD/DM.

And I have a suspicion.

Jens - tell me if I'm wrong, but look at the crazy plug flushing code:

  void __blk_flush_plug(struct task_struct *tsk, struct blk_plug *plug)
  {
        __blk_finish_plug(tsk, plug);
        tsk->plug = plug;
  }

and explain that idiotic __blk_finish_plug() logic to me:

  static void __blk_finish_plug(struct task_struct *tsk, struct blk_plug *plug)
  {
          flush_plug_list(plug);

          if (plug == tsk->plug)
                  tsk->plug = NULL;
  }

and in particular the "set it to NULL, only to then set it back
again". That code makes no sense. __blk_finish_plug() is only ever
called with "plug" being "tsk->plug", and afaik nothing will ever
modify a non-NULL plug (if it is a nested plug, it would never be
added to the task) _except_ for that __blk_finish_plug(). No? So it
sets it to NULL, and then immediately the caller will set it back
again.

What's the thinking there? It looks very confused to me.

Now, clearly RAID seems to be involved in the problem? The main thing
with that would be that the execution of the requests would tend to
generate new requests, that go back on the plug queue. Yes? And the
loop in flush_plug_list() means that they all should get flushed out,
I assume. But something clearly isn't working, and it does seem to be
about the RAID kind of setup. So either they didn't get put on the
plug queue, or the task got a new plug (which _wasn't_ flushed).

Because we're clearly waiting for some request that hasn't completed.
Where in the plug queues would it be hiding?

The whole block layer plugging looks to be the major problem of the 39
cycle. Jens, pls explain.

                      Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.39 Block layer regression was [Bug] Boot hangs with 2.6.39-rc[123]]
  2011-04-15  3:25 ` Linus Torvalds
@ 2011-04-15  4:22   ` Christoph Hellwig
  2011-04-15  6:13     ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2011-04-15  4:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Michael Guntsche, linux-kernel@vger.kernel.org, Jens Axboe

On Thu, Apr 14, 2011 at 08:25:33PM -0700, Linus Torvalds wrote:
> What's the thinking there? It looks very confused to me.

It is.  I sent a patch a couple of days ago to fix it.

> Now, clearly RAID seems to be involved in the problem? The main thing
> with that would be that the execution of the requests would tend to
> generate new requests, that go back on the plug queue. Yes? And the
> loop in flush_plug_list() means that they all should get flushed out,
> I assume. But something clearly isn't working, and it does seem to be
> about the RAID kind of setup. So either they didn't get put on the
> plug queue, or the task got a new plug (which _wasn't_ flushed).
> 
> Because we're clearly waiting for some request that hasn't completed.
> Where in the plug queues would it be hiding?

There's a thread where Neil explains what the problem with MD is - it
needs a callback on unplug time to generate e.g. the write intent bitmap
or as large as possible writes for RAID5.  Jens and Neil have been
looking into it.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.39 Block layer regression was [Bug] Boot hangs with  2.6.39-rc[123]]
  2011-04-15  4:22   ` Christoph Hellwig
@ 2011-04-15  6:13     ` Jens Axboe
  2011-04-21 17:53       ` Christian Kujau
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2011-04-15  6:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Michael Guntsche, linux-kernel@vger.kernel.org

On 2011-04-15 06:22, Christoph Hellwig wrote:
> On Thu, Apr 14, 2011 at 08:25:33PM -0700, Linus Torvalds wrote:
>> What's the thinking there? It looks very confused to me.
> 
> It is.  I sent a patch a couple of days ago to fix it.

Yeah thanks for that, I agree it looks a bit confusing as-is. I'll queue
it up.

>> Now, clearly RAID seems to be involved in the problem? The main thing
>> with that would be that the execution of the requests would tend to
>> generate new requests, that go back on the plug queue. Yes? And the
>> loop in flush_plug_list() means that they all should get flushed out,
>> I assume. But something clearly isn't working, and it does seem to be
>> about the RAID kind of setup. So either they didn't get put on the
>> plug queue, or the task got a new plug (which _wasn't_ flushed).
>>
>> Because we're clearly waiting for some request that hasn't completed.
>> Where in the plug queues would it be hiding?
> 
> There's a thread where Neil explains what the problem with MD is - it
> needs a callback on unplug time to generate e.g. the write intent bitmap
> or as large as possible writes for RAID5.  Jens and Neil have been
> looking into it.

I think we are done, Neil just needs to rebase around the current
for-linus and then we should expedite things in.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.39 Block layer regression was [Bug] Boot hangs with 2.6.39-rc[123]]
  2011-04-15  6:13     ` Jens Axboe
@ 2011-04-21 17:53       ` Christian Kujau
  2011-04-21 17:59         ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Kujau @ 2011-04-21 17:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Linus Torvalds, Michael Guntsche,
	linux-kernel@vger.kernel.org

On Fri, 15 Apr 2011 at 08:13, Jens Axboe wrote:
> I think we are done, Neil just needs to rebase around the current
> for-linus and then we should expedite things in.

Whatever this was: has it been pushed to Linus yet? I'm on 2.6.39-rc4+ 
(from yesterday) and I too experience a hang during boot (and other 
strange things with udev[0])

Thanks,
Christian.

[0] http://nerdbynature.de/bits/2.6.39-rc4/
-- 
BOFH excuse #208:

Your mail is being routed through Germany ... and they're censoring us.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.39 Block layer regression was [Bug] Boot hangs with 2.6.39-rc[123]]
  2011-04-21 17:53       ` Christian Kujau
@ 2011-04-21 17:59         ` Linus Torvalds
  2011-04-21 18:30           ` Christian Kujau
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2011-04-21 17:59 UTC (permalink / raw)
  To: Christian Kujau
  Cc: Jens Axboe, Christoph Hellwig, Michael Guntsche,
	linux-kernel@vger.kernel.org

On Thu, Apr 21, 2011 at 10:53 AM, Christian Kujau <lists@nerdbynature.de> wrote:
> On Fri, 15 Apr 2011 at 08:13, Jens Axboe wrote:
>> I think we are done, Neil just needs to rebase around the current
>> for-linus and then we should expedite things in.
>
> Whatever this was: has it been pushed to Linus yet? I'm on 2.6.39-rc4+
> (from yesterday) and I too experience a hang during boot (and other
> strange things with udev[0])

As of right now, all _known_ block bugs should be fixed in the stuff I
just pushed out, with the current top-of-tree being the merge of the
IDE CD endless loop issue and the NULL ptr oops at elevator change
time:

     Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

    * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
      ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd
      block: don't propagate unlisted DISK_EVENTs to userland
      elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too

but maybe yours is something else.

That said, that udevd message of yours would imply a full root
filesystem, which can cause any amount of issues at boot time..

                          Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.39 Block layer regression was [Bug] Boot hangs with 2.6.39-rc[123]]
  2011-04-21 17:59         ` Linus Torvalds
@ 2011-04-21 18:30           ` Christian Kujau
  2011-04-21 18:36             ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Kujau @ 2011-04-21 18:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, Christoph Hellwig, Michael Guntsche,
	linux-kernel@vger.kernel.org

Linus Torvalds wrote on 2011-04-21 10:59 :
> As of right now, all _known_ block bugs should be fixed in the stuff I
> just pushed out, with the current top-of-tree being the merge of the
> IDE CD endless loop issue and the NULL ptr oops at elevator change
> time:

OK, thanks.

> That said, that udevd message of yours would imply a full root
> filesystem, which can cause any amount of issues at boot time..

My /dev is mounted as udev and is 10MB in size, which is more than
enough for kernels < 2.6.39. But with 2.6.39 something changed and
/dev/.udev/queue.bin is growing to 7 MB and more, udevd is spinning like
crazy and the OOM is reaping processes.

I'm currently in the middle of a bisect, with 11 steps to go on this
PowerBook G4 :-\

Thanks,
Christian.
-- 
make bzImage, not war

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.39 Block layer regression was [Bug] Boot hangs with  2.6.39-rc[123]]
  2011-04-21 18:30           ` Christian Kujau
@ 2011-04-21 18:36             ` Jens Axboe
  2011-04-21 20:53               ` Christian Kujau
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2011-04-21 18:36 UTC (permalink / raw)
  To: Christian Kujau
  Cc: Linus Torvalds, Christoph Hellwig, Michael Guntsche,
	linux-kernel@vger.kernel.org

On 2011-04-21 20:30, Christian Kujau wrote:
> Linus Torvalds wrote on 2011-04-21 10:59 :
>> As of right now, all _known_ block bugs should be fixed in the stuff I
>> just pushed out, with the current top-of-tree being the merge of the
>> IDE CD endless loop issue and the NULL ptr oops at elevator change
>> time:
> 
> OK, thanks.
> 
>> That said, that udevd message of yours would imply a full root
>> filesystem, which can cause any amount of issues at boot time..
> 
> My /dev is mounted as udev and is 10MB in size, which is more than
> enough for kernels < 2.6.39. But with 2.6.39 something changed and
> /dev/.udev/queue.bin is growing to 7 MB and more, udevd is spinning like
> crazy and the OOM is reaping processes.
> 
> I'm currently in the middle of a bisect, with 11 steps to go on this
> PowerBook G4 :-\

Please try Linus -git as of now, it could be the excessive udev events
for media change that is causing your problem.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.39 Block layer regression was [Bug] Boot hangs with 2.6.39-rc[123]]
  2011-04-21 18:36             ` Jens Axboe
@ 2011-04-21 20:53               ` Christian Kujau
  0 siblings, 0 replies; 9+ messages in thread
From: Christian Kujau @ 2011-04-21 20:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Christoph Hellwig, Michael Guntsche,
	linux-kernel@vger.kernel.org, 622352, ralf.hildebrandt

On Thu, 21 Apr 2011 at 20:36, Jens Axboe wrote:
> Please try Linus -git as of now, it could be the excessive udev events
> for media change that is causing your problem.

I'm running latest -git (91e8549..), which includes your for-linus branch 
and it's running fine so far (1h:30min uptime).

@Ralf: can you try this as well, maybe it'll fix #622352 for you too?

Thanks,
Christian.
-- 
BOFH excuse #382:

Someone was smoking in the computer room and set off the halon systems.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-04-21 20:53 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-04-15  2:06 2.6.39 Block layer regression was [Bug] Boot hangs with 2.6.39-rc[123]] Michael Guntsche
2011-04-15  3:25 ` Linus Torvalds
2011-04-15  4:22   ` Christoph Hellwig
2011-04-15  6:13     ` Jens Axboe
2011-04-21 17:53       ` Christian Kujau
2011-04-21 17:59         ` Linus Torvalds
2011-04-21 18:30           ` Christian Kujau
2011-04-21 18:36             ` Jens Axboe
2011-04-21 20:53               ` Christian Kujau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox