Re: [BUG] AS io-scheduler.

public inbox for linux-kernel@vger.kernel.org

* Re: [BUG] AS io-scheduler.
@ 2007-07-15 15:20 Ian Kumlien
  2007-07-16 16:31 ` Chuck Ebbert
From: Ian Kumlien @ 2007-07-15 15:20 UTC
  To: Linux-kernel

Sorry, due to mental fatigue I actually forgot to report that this was
with 2.6.22.1.

Since this is mailed from outside the mailing list, I'm attaching the
original message:

---
Hi, 

I had emerge --sync fail several times...

So I checked dmesg and found some info, attached further down.
This is an old VIA C3 machine with one disk; it has been running most
kernels in the 2.6.x series with no problems until now.

PS. Don't forget to CC me
DS.

BUG: unable to handle kernel paging request at virtual address ea86ac54
 printing eip:
c022dfec
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: eeprom i2c_viapro vt8231 i2c_isa skge
CPU:    0
EIP:    0060:[<c022dfec>]    Not tainted VLI
EFLAGS: 00010082   (2.6.22.1 #26)
EIP is at as_can_break_anticipation+0xc/0x190
eax: dfcdaba0   ebx: dfcdaba0   ecx: 0035ff95   edx: cb296844
esi: cb296844   edi: dfcdaba0   ebp: 00000000   esp: ceff6a70
ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
Process rsync (pid: 1667, ti=ceff6000 task=d59cf5b0 task.ti=ceff6000)
Stack: cb296844 00000001 cb296844 dfcdaba0 00000000 c022efc8 cb296844 00000000
       dfcffb9c c0227a76 dfcffb9c 00000000 c016e96e cb296844 00000000 dfcffb9c
       00000000 c022af64 00000000 dfcffb9c 00000008 00000000 08ff6b30 c04d1ec0
Call Trace:
 [<c022efc8>] as_add_request+0xa8/0xc0
 [<c0227a76>] elv_insert+0xa6/0x150
 [<c016e96e>] bio_phys_segments+0xe/0x20
 [<c022af64>] __make_request+0x384/0x490
 [<c02add1e>] ide_do_request+0x6ee/0x890
 [<c02294ab>] generic_make_request+0x18b/0x1c0
 [<c022b596>] submit_bio+0xa6/0xb0
 [<c013b7b8>] mempool_alloc+0x28/0xa0
 [<c016bb66>] __find_get_block+0xf6/0x130
 [<c016e0bc>] bio_alloc_bioset+0x8c/0xf0
 [<c016b647>] submit_bh+0xb7/0xe0
 [<c016c1f8>] ll_rw_block+0x78/0x90
 [<c019c85d>] search_by_key+0xdd/0xd20
 [<c016c201>] ll_rw_block+0x81/0x90
 [<c011f190>] irq_exit+0x40/0x60
 [<c01066e4>] do_IRQ+0x94/0xb0
 [<c0104bc3>] common_interrupt+0x23/0x30
 [<c018beca>] reiserfs_read_locked_inode+0x6a/0x490
 [<c018e580>] reiserfs_find_actor+0x0/0x20
 [<c018c33b>] reiserfs_iget+0x4b/0x80
 [<c018e570>] reiserfs_init_locked_inode+0x0/0x10
 [<c0189824>] reiserfs_lookup+0xa4/0xf0
 [<c0157b03>] do_lookup+0xa3/0x140
 [<c0159265>] __link_path_walk+0x615/0xa20
 [<c0168a18>] __mark_inode_dirty+0x28/0x150
 [<c01631c1>] mntput_no_expire+0x11/0x50
 [<c01596b2>] link_path_walk+0x42/0xb0
 [<c0159960>] do_path_lookup+0x130/0x150
 [<c015a190>] __user_walk_fd+0x30/0x50
 [<c0154766>] vfs_lstat_fd+0x16/0x40
 [<c01547df>] sys_lstat64+0xf/0x30
 [<c0103c42>] syscall_call+0x7/0xb
 =======================
Code: c0 8b 44 cb 0c 8b 40 08 29 d0 f7 d0 c1 e8 1f 83 f0 01 eb 02 31 c0 5b c3 8d b4 26 00 00 00 00 55 57 56 53 83 ec 04 89 c3 89 14 24 <8b> 90 b4 00 00 00 85 d2 75 04 0f 0b eb fe 83 3c 24 00 74 0c 8b
EIP: [<c022dfec>] as_can_break_anticipation+0xc/0x190 SS:ESP 0068:ceff6a70
WARNING: at block/as-iosched.c:862 as_remove_queued_request()
 [<c022e560>] as_remove_queued_request+0x40/0xc0
 [<c022e684>] as_move_to_dispatch+0xa4/0x110
 [<c0236626>] __delay+0x6/0x10
 [<c022ee03>] as_dispatch_request+0x2d3/0x310
 [<c02b39b2>] ide_dma_start+0x22/0x30
 [<c02b6be0>] ide_do_rw_disk+0x310/0x3f0
 [<c02279b5>] elv_next_request+0x105/0x120
 [<c02ad876>] ide_do_request+0x246/0x890
 [<c03d5c9d>] schedule+0x45d/0x4c0
 [<c022e7f0>] as_work_handler+0x0/0x20
 [<c022a331>] blk_start_queueing+0x11/0x20
 [<c022e7ff>] as_work_handler+0xf/0x20
 [<c0126bab>] run_workqueue+0x6b/0xe0
 [<c0127200>] worker_thread+0x0/0xc0
 [<c01272b2>] worker_thread+0xb2/0xc0
 [<c01297e0>] autoremove_wake_function+0x0/0x40
 [<c0129658>] kthread+0x38/0x60
 [<c0129620>] kthread+0x0/0x60
 [<c0104d87>] kernel_thread_helper+0x7/0x10
 =======================
WARNING: at block/as-iosched.c:966 as_move_to_dispatch()
 [<c022e6b3>] as_move_to_dispatch+0xd3/0x110
 [<c022ee03>] as_dispatch_request+0x2d3/0x310
 [<c02b39b2>] ide_dma_start+0x22/0x30
 [<c02b6be0>] ide_do_rw_disk+0x310/0x3f0
 [<c02279b5>] elv_next_request+0x105/0x120
 [<c02ad876>] ide_do_request+0x246/0x890
 [<c03d5c9d>] schedule+0x45d/0x4c0
 [<c022e7f0>] as_work_handler+0x0/0x20
 [<c022a331>] blk_start_queueing+0x11/0x20
 [<c022e7ff>] as_work_handler+0xf/0x20
 [<c0126bab>] run_workqueue+0x6b/0xe0
 [<c0127200>] worker_thread+0x0/0xc0
 [<c01272b2>] worker_thread+0xb2/0xc0
 [<c01297e0>] autoremove_wake_function+0x0/0x40
 [<c0129658>] kthread+0x38/0x60
 [<c0129620>] kthread+0x0/0x60
 [<c0104d87>] kernel_thread_helper+0x7/0x10
 =======================
 
-- 
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net

* Re: [BUG] AS io-scheduler.
  2007-07-15 15:20 [BUG] AS io-scheduler Ian Kumlien
@ 2007-07-16 16:31 ` Chuck Ebbert
  2007-07-16 17:29   ` Jens Axboe
  2007-07-17  1:44   ` Rene Herman
From: Chuck Ebbert @ 2007-07-16 16:31 UTC
  To: pomac; +Cc: Linux-kernel, Jens Axboe, Nick Piggin

On 07/15/2007 11:20 AM, Ian Kumlien wrote:
> [...]
> EIP is at as_can_break_anticipation+0xc/0x190
> [...]
> Call Trace:
>  [<c022efc8>] as_add_request+0xa8/0xc0
>  [<c0227a76>] elv_insert+0xa6/0x150
> [...]

static int as_can_break_anticipation(struct as_data *ad, struct request *rq)
{
        struct io_context *ioc;
        struct as_io_context *aic;

        ioc = ad->io_context;	/* <======== ad is bogus */
        BUG_ON(!ioc);


Call chain is:

	as_add_request
	  as_update_rq:
		if (ad->antic_status == ANTIC_WAIT_REQ
				|| ad->antic_status == ANTIC_WAIT_NEXT) {
			if (as_can_break_anticipation(ad, rq))
				as_antic_stop(ad);
		}


So somehow 'ad' became invalid between the time ad->antic_status was
checked and as_can_break_anticipation() tried to access ad->io_context?
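To make the control flow of the quoted fragment concrete, here is a heavily simplified user-space model of it. Every model_* name is hypothetical and not a kernel symbol. The real as_can_break_anticipation() applies further heuristics (think time, seek distance and so on); this sketch keeps only the case where the anticipated process itself submits the next request. It is a sketch under those assumptions, not the as-iosched.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the AS anticipation states used in the check. */
enum model_antic_status {
	MODEL_ANTIC_OFF,
	MODEL_ANTIC_WAIT_REQ,
	MODEL_ANTIC_WAIT_NEXT,
	MODEL_ANTIC_FINISHED,
};

struct model_io_context {
	int pid;				/* process owning this I/O context */
};

struct model_as_data {
	enum model_antic_status antic_status;
	struct model_io_context *io_context;	/* process being anticipated */
};

/*
 * Stand-in for as_can_break_anticipation(): only the "request came from
 * the anticipated process" case is modelled. The first statement is the
 * load that faulted in the oops.
 */
static int model_can_break_anticipation(struct model_as_data *ad, int rq_pid)
{
	struct model_io_context *ioc = ad->io_context;

	assert(ioc != NULL);			/* mirrors BUG_ON(!ioc) */
	return ioc->pid == rq_pid;
}

/* Same shape as the quoted as_update_rq() fragment. */
static void model_update_rq(struct model_as_data *ad, int rq_pid)
{
	if (ad->antic_status == MODEL_ANTIC_WAIT_REQ
			|| ad->antic_status == MODEL_ANTIC_WAIT_NEXT) {
		if (model_can_break_anticipation(ad, rq_pid))
			ad->antic_status = MODEL_ANTIC_FINISHED; /* as_antic_stop() stand-in */
	}
}
```

In this shape, `ad` is a single pointer argument that is never reassigned between the antic_status test and the io_context load, so both reads go through the same pointer value.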

* Re: [BUG] AS io-scheduler.
  2007-07-16 16:31 ` Chuck Ebbert
@ 2007-07-16 17:29   ` Jens Axboe
  2007-07-16 19:49     ` Ian Kumlien
  2007-07-17  1:44   ` Rene Herman
From: Jens Axboe @ 2007-07-16 17:29 UTC
  To: Chuck Ebbert; +Cc: pomac, Linux-kernel, Nick Piggin

On Mon, Jul 16 2007, Chuck Ebbert wrote:
> [...]
> So somehow 'ad' became invalid between the time ad->antic_status was
> checked and as_can_break_anticipation() tried to access ad->io_context?

That's impossible, ad is persistent unless someone attempts to remove
the io scheduler. Did you fiddle with switching io schedulers while this
happened? If not, then something corrupted your memory. And I'm not
aware of any io scheduler switching bugs, so the oops would still be
highly suspect if so.
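A minimal model of the lifetime described here, assuming (as in the 2.6.22-era elevator code) that the per-queue AS data hangs off the queue's elevator as elevator_data, is allocated when the scheduler is attached and is freed only when it is detached, for example on a scheduler switch. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct model_sched_data {
	int antic_status;
};

/* Models the part of the queue that owns the scheduler's private data. */
struct model_queue {
	struct model_sched_data *elevator_data;
};

/* Attach the scheduler: allocate its per-queue data once. */
static int model_elevator_init(struct model_queue *q)
{
	q->elevator_data = calloc(1, sizeof(*q->elevator_data));
	return q->elevator_data ? 0 : -1;
}

/* Detach the scheduler (switch or queue teardown): the only place the
 * per-queue data is freed in this model. */
static void model_elevator_exit(struct model_queue *q)
{
	free(q->elevator_data);
	q->elevator_data = NULL;
}

/* Each request path fetches 'ad' from the queue, in the same spirit as
 * q->elevator->elevator_data in the kernel. */
static struct model_sched_data *model_get_ad(struct model_queue *q)
{
	return q->elevator_data;
}
```

In this model a stale `ad` can only appear after model_elevator_exit(); if nothing switched or tore down the scheduler, an invalid pointer has to come from somewhere else, which is where the memory-corruption question comes in.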

You write emerge - are you using an experimental compiler? Or did you
recently change hardware? Is it warmer than usual?

-- 
Jens Axboe

* Re: [BUG] AS io-scheduler.
  2007-07-16 17:29   ` Jens Axboe
@ 2007-07-16 19:49     ` Ian Kumlien
  2007-07-16 19:56       ` Jens Axboe
From: Ian Kumlien @ 2007-07-16 19:49 UTC
  To: Jens Axboe; +Cc: Chuck Ebbert, Linux-kernel, Nick Piggin

On Mon, 2007-07-16 at 19:29 +0200, Jens Axboe wrote:
> That's impossible, ad is persistent unless the io scheduler is attempted
> removed. Did you fiddle with switching io schedulers while this
> happened? If not, then something corrupted your memory. And I'm not
> aware of any io scheduler switching bugs, so the oops would still be
> highly suspect if so.

I wasn't fiddling with the scheduler; it has been happily running AS
for quite some time.

Actually it ran AS for the entire 2.6.21 and 2.6.20 life cycles.

> You write emerge - are you using an experimental compiler? Or did you
> recently change hardware? Is it warmer than usual?

No change in hardware, no change in compiler either.

gcc (GCC) 4.1.2 (Gentoo 4.1.2)

Which, AFAIR, is the same compiler that compiled 2.6.21.

It's been more humid, but not warmer... We're talking about a CPU that
usually idles at 17°C =)

There might, however, have been more I/O load: right now there are 220
connections to my webserver, and it's been 5 days since my friend
released the mix they are downloading.

(I'm preparing to switch it all over to a core2duo with mirrored disks,
which is a wee bit more suited to this kind of load)

Sent data: 8062.27 Mb (4620.55 kbit/s)
Requests: 56284 (235/min)

Webserver uptime 9h

-- 
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net

* Re: [BUG] AS io-scheduler.
  2007-07-16 19:49     ` Ian Kumlien
@ 2007-07-16 19:56       ` Jens Axboe
  2007-07-16 20:14         ` Ian Kumlien
From: Jens Axboe @ 2007-07-16 19:56 UTC
  To: Ian Kumlien; +Cc: Chuck Ebbert, Linux-kernel, Nick Piggin

On Mon, Jul 16 2007, Ian Kumlien wrote:
> On Mon, 2007-07-16 at 19:29 +0200, Jens Axboe wrote:
> > That's impossible, ad is persistent unless the io scheduler is attempted
> > removed. Did you fiddle with switching io schedulers while this
> > happened? If not, then something corrupted your memory. And I'm not
> > aware of any io scheduler switching bugs, so the oops would still be
> > highly suspect if so.
> 
> I wasn't fiddling with the scheduler, it's quite happily been running AS
> for quite some time.

OK, that rules that out then. Then your oops looks very much like
hardware trouble. Perhaps a borderline PSU? Just an idea.

> Actually it ran AS for the entire 2.6.21 and 2.6.20 life cycles.

There's essentially only one change in AS between 2.6.21 and 2.6.22, and
that is correcting a jiffies vs. msecs error. So no real code change.
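As background on that kind of change: kernel timers count in jiffies, and one jiffy lasts 1000/HZ milliseconds, so the same number means different durations depending on the configured HZ. AS takes tunables such as antic_expire in milliseconds through sysfs and has to convert them. The sketch below is a simplified, hypothetical stand-in for the kernel's msecs_to_jiffies()/jiffies_to_msecs() helpers, valid only when HZ divides 1000; it illustrates the class of bug, not the actual 2.6.22 patch.

```c
#include <assert.h>

/* Round up, so a nonzero millisecond timeout never becomes 0 jiffies.
 * Assumes hz divides 1000 (e.g. 100, 250, 1000). */
static unsigned long model_msecs_to_jiffies(unsigned int ms, unsigned int hz)
{
	unsigned int ms_per_jiffy = 1000 / hz;

	return (ms + ms_per_jiffy - 1) / ms_per_jiffy;
}

/* Inverse conversion under the same assumption. */
static unsigned int model_jiffies_to_msecs(unsigned long j, unsigned int hz)
{
	return (unsigned int)(j * (1000 / hz));
}
```

With HZ=250, a 10 ms setting becomes 3 jiffies (12 ms after rounding up); using the raw 10 as a jiffy count would wait 40 ms instead. At HZ=1000 the two coincide, which is why such a mix-up can go unnoticed on some configurations.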

> > You write emerge - are you using an experimental compiler? Or did you
> > recently change hardware? Is it warmer than usual?
> 
> No change in hardware, no change in compiler either.
> 
> gcc (GCC) 4.1.2 (Gentoo 4.1.2)
> 
> Which is the same compiler that compiled 2.6.21 afair.
> 
> It's been more humid, but not warmer... Were talking about a cpu that
> usually idles at 17 deg =)

Probably not the heat, then.

> There might however have been more io load, right now it's 220
> connections to my webserver and it's 5 days since my friend released the
> mix they are downloading.
> 
> (I'm preparing to switch it all over to a core2duo with mirrored disks,
> which is a wee bit more suited to this kind of load)
> 
> Sent data: 8062.27 Mb (4620.55 kbit/s)
> Requests: 56284 (235/min)
> 
> Webserver uptime 9h

My gut says that this is the hardware falling over, for whatever reason.
Since it's otherwise stable, it could be something marginal pushing it
over, like your higher load (be it CPU or disk).

You could try booting with the noop IO scheduler and see if it still
oopses. I'm not sure what else to suggest; your box will likely pass
memtest just fine.
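The noop suggestion can be applied at boot with the elevator=noop kernel parameter, or per disk at runtime through /sys/block/<dev>/queue/scheduler, which lists the available schedulers with the active one in brackets (e.g. "noop anticipatory deadline [cfq]"). The sketch below reads that file and extracts the active name. The model_* helpers are hypothetical, and the device name passed in (for example "hda" for an IDE disk) is an assumption about the reporter's setup.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the bracketed (active) scheduler from a line such as
 * "noop anticipatory deadline [cfq]". Returns 0 on success, -1 if no
 * bracketed entry is found or it does not fit in 'out'. */
static int model_active_scheduler(const char *line, char *out, size_t outlen)
{
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t len;

	if (!open || !close)
		return -1;
	len = (size_t)(close - open - 1);
	if (len + 1 > outlen)
		return -1;
	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}

/* Read /sys/block/<dev>/queue/scheduler and report the active entry.
 * 'dev' (e.g. "hda") is an assumption about the disk's name. */
static int model_read_scheduler(const char *dev, char *out, size_t outlen)
{
	char path[128], line[256];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/scheduler", dev);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return model_active_scheduler(line, out, outlen);
}
```

Writing a scheduler name such as noop to the same sysfs file switches that queue at runtime, without a reboot.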

-- 
Jens Axboe

* Re: [BUG] AS io-scheduler.
  2007-07-16 19:56       ` Jens Axboe
@ 2007-07-16 20:14         ` Ian Kumlien
  2007-07-17  6:23           ` Jens Axboe
From: Ian Kumlien @ 2007-07-16 20:14 UTC
  To: Jens Axboe; +Cc: Chuck Ebbert, Linux-kernel, Nick Piggin

On Mon, 2007-07-16 at 21:56 +0200, Jens Axboe wrote:
> On Mon, Jul 16 2007, Ian Kumlien wrote:
> > I wasn't fiddling with the scheduler, it's quite happily been running AS
> > for quite some time.
> 
> OK, that rules that out then. Then your oops looks very much like
> hardware trouble. Perhaps a border liner PSU? Just an idea.

It uses a laptop PSU that doesn't need cooling; this is a micro-ITX
board =)

> > Actually it ran AS for the entire 2.6.21 and 2.6.20 life cycles.
> 
> There's essentially only one change in AS between 2.6.21 and 2.6.22, and
> that is converting a jiffes vs msec error. So no real code change.

Hummm, odd... 

> > > You write emerge - are you using an experimental compiler? Or did you
> > > recently change hardware? Is it warmer than usual?
> > 
> > No change in hardware, no change in compiler either.
> > 
> > gcc (GCC) 4.1.2 (Gentoo 4.1.2)
> > 
> > Which is the same compiler that compiled 2.6.21 afair.
> > 
> > It's been more humid, but not warmer... Were talking about a cpu that
> > usually idles at 17 deg =)
> 
> Probably not the heat, then.

Currently:
CPU Temp:  +25.6°C

This is all passively cooled, btw. It was lower a while ago, but the
webserver is still loaded.

> > There might however have been more io load, right now it's 220
> > connections to my webserver and it's 5 days since my friend released the
> > mix they are downloading.
> > 
> > (I'm preparing to switch it all over to a core2duo with mirrored disks,
> > which is a wee bit more suited to this kind of load)
> > 
> > Sent data: 8062.27 Mb (4620.55 kbit/s)
> > Requests: 56284 (235/min)
> > 
> > Webserver uptime 9h
> 
> My gut says that this is the hardware falling over, for whatever reason.
> Since it's otherwise stable, it could be something marginal pushing it
> over. Like your higher load (be it CPU, or disk).

It's only ever happened once.
(But it blocked any and all rsyncs... normal I/O worked.)

This kind of load happens now and then: first it was Slashdot finding
the icculus.org mirror of postal2 (a long time ago, I know), and then
someone releasing a new mix... I always find out by noticing the number
of web connections increase.

> You could try and boot with the noop IO scheduler and see if it still
> oopses. Not sure would else to suggest, your box will likely pass
> memtest just fine.

It has been running with cfq for ~2 days now without a problem.

I really can't take it down and do a memtest on it, it's my mailserver,
webserver, firewall etc etc =)

Just let me know what kind of information you might want and i'll put it
all up... =)

-- 
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net

* Re: [BUG] AS io-scheduler.
  2007-07-16 20:14         ` Ian Kumlien
@ 2007-07-17  6:23           ` Jens Axboe
  0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2007-07-17  6:23 UTC (permalink / raw)
  To: Ian Kumlien; +Cc: Chuck Ebbert, Linux-kernel, Nick Piggin

On Mon, Jul 16 2007, Ian Kumlien wrote:
> On mån, 2007-07-16 at 21:56 +0200, Jens Axboe wrote:
> > On Mon, Jul 16 2007, Ian Kumlien wrote:
> > > On mån, 2007-07-16 at 19:29 +0200, Jens Axboe wrote:
> > > > On Mon, Jul 16 2007, Chuck Ebbert wrote:
> > > > > On 07/15/2007 11:20 AM, Ian Kumlien wrote:
> > > > > > I had emerge --sync failing several times... 
> > > > > > 
> > > > > > So I checked dmesg and found some info, attached further down.
> > > > > > This is an old VIA C3 machine with one disk, it's been running most
> > > > > > kernels in the 2.6.x series with no problems until now.
> > > > > > 
> > > > > > PS. Don't forget to CC me
> > > > > > DS.
> > > > > > 
> > > > > > BUG: unable to handle kernel paging request at virtual address ea86ac54
> > > > > >  printing eip:
> > > > > > c022dfec
> > > > > > *pde = 00000000
> > > > > > Oops: 0000 [#1]
> > > > > > Modules linked in: eeprom i2c_viapro vt8231 i2c_isa skge
> > > > > > CPU:    0
> > > > > > EIP:    0060:[<c022dfec>]    Not tainted VLI
> > > > > > EFLAGS: 00010082   (2.6.22.1 #26)
> > > > > > EIP is at as_can_break_anticipation+0xc/0x190
> > > > > > eax: dfcdaba0   ebx: dfcdaba0   ecx: 0035ff95   edx: cb296844
> > > > > > esi: cb296844   edi: dfcdaba0   ebp: 00000000   esp: ceff6a70
> > > > > > ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
> > > > > > Process rsync (pid: 1667, ti=ceff6000 task=d59cf5b0 task.ti=ceff6000)
> > > > > > Stack: cb296844 00000001 cb296844 dfcdaba0 00000000 c022efc8 cb296844 00000000
> > > > > >        dfcffb9c c0227a76 dfcffb9c 00000000 c016e96e cb296844 00000000 dfcffb9c
> > > > > >        00000000 c022af64 00000000 dfcffb9c 00000008 00000000 08ff6b30 c04d1ec0
> > > > > > Call Trace:
> > > > > >  [<c022efc8>] as_add_request+0xa8/0xc0
> > > > > >  [<c0227a76>] elv_insert+0xa6/0x150
> > > > > >  [<c016e96e>] bio_phys_segments+0xe/0x20
> > > > > >  [<c022af64>] __make_request+0x384/0x490
> > > > > >  [<c02add1e>] ide_do_request+0x6ee/0x890
> > > > > >  [<c02294ab>] generic_make_request+0x18b/0x1c0
> > > > > >  [<c022b596>] submit_bio+0xa6/0xb0
> > > > > >  [<c013b7b8>] mempool_alloc+0x28/0xa0
> > > > > >  [<c016bb66>] __find_get_block+0xf6/0x130
> > > > > >  [<c016e0bc>] bio_alloc_bioset+0x8c/0xf0
> > > > > >  [<c016b647>] submit_bh+0xb7/0xe0
> > > > > >  [<c016c1f8>] ll_rw_block+0x78/0x90
> > > > > >  [<c019c85d>] search_by_key+0xdd/0xd20
> > > > > >  [<c016c201>] ll_rw_block+0x81/0x90
> > > > > >  [<c011f190>] irq_exit+0x40/0x60
> > > > > >  [<c01066e4>] do_IRQ+0x94/0xb0
> > > > > >  [<c0104bc3>] common_interrupt+0x23/0x30
> > > > > >  [<c018beca>] reiserfs_read_locked_inode+0x6a/0x490
> > > > > >  [<c018e580>] reiserfs_find_actor+0x0/0x20
> > > > > >  [<c018c33b>] reiserfs_iget+0x4b/0x80
> > > > > >  [<c018e570>] reiserfs_init_locked_inode+0x0/0x10
> > > > > >  [<c0189824>] reiserfs_lookup+0xa4/0xf0
> > > > > >  [<c0157b03>] do_lookup+0xa3/0x140
> > > > > >  [<c0159265>] __link_path_walk+0x615/0xa20
> > > > > >  [<c0168a18>] __mark_inode_dirty+0x28/0x150
> > > > > >  [<c01631c1>] mntput_no_expire+0x11/0x50
> > > > > >  [<c01596b2>] link_path_walk+0x42/0xb0
> > > > > >  [<c0159960>] do_path_lookup+0x130/0x150
> > > > > >  [<c015a190>] __user_walk_fd+0x30/0x50
> > > > > >  [<c0154766>] vfs_lstat_fd+0x16/0x40
> > > > > >  [<c01547df>] sys_lstat64+0xf/0x30
> > > > > >  [<c0103c42>] syscall_call+0x7/0xb
> > > > > >  =======================
> > > > > 
> > > > > static int as_can_break_anticipation(struct as_data *ad, struct request *rq)
> > > > > {
> > > > >         struct io_context *ioc;
> > > > >         struct as_io_context *aic;
> > > > > 
> > > > >         ioc = ad->io_context;  <======== ad is bogus
> > > > >         BUG_ON(!ioc);
> > > > > 
> > > > > 
> > > > > Call chain is:
> > > > > 
> > > > > 	as_add_request
> > > > > 	as_update_rq:
> > > > > 	        if (ad->antic_status == ANTIC_WAIT_REQ
> > > > >         	                || ad->antic_status == ANTIC_WAIT_NEXT) {
> > > > >                 	if (as_can_break_anticipation(ad, rq))
> > > > >                         	as_antic_stop(ad);
> > > > >         	}
> > > > > 
> > > > > 
> > > > > So somehow 'ad' became invalid between the time ad->antic_status was
> > > > > checked and as_can_break_anticipation() tried to access ad->io_context?
> > > > 
> > > > That's impossible: ad is persistent unless someone attempts to remove
> > > > the io scheduler. Did you fiddle with switching io schedulers while this
> > > > happened? If not, then something corrupted your memory. And I'm not
> > > > aware of any io scheduler switching bugs, so the oops would still be
> > > > highly suspect if so.
> > > 
> > > I wasn't fiddling with the scheduler, it's quite happily been running AS
> > > for quite some time.
> > 
> > OK, that rules that out then. Then your oops looks very much like
> > hardware trouble. Perhaps a borderline PSU? Just an idea.
> 
> It uses a laptop PSU that doesn't need cooling; this is a micro-ITX
> board =)

Yeah I know, I've had the same setup for a "server" at some point in the
past. It wasn't very stable for me under load, but that doesn't mean
it's a general problem of course :-)

> > You could try and boot with the noop IO scheduler and see if it still
> > oopses. Not sure what else to suggest, your box will likely pass
> > memtest just fine.
> 
> It has been running with cfq for ~2 days now without a problem.
> 
> I really can't take it down to run memtest on it; it's my mailserver,
> webserver, firewall etc etc =)

And you shouldn't; as I wrote, I don't think memtest would uncover
anything.

> Just let me know what kind of information you want and I'll put it
> all up... =)

Let's see if it remains stable with CFQ; I have no further ideas right
now. The oops is impossible.

-- 
Jens Axboe



* Re: [BUG] AS io-scheduler.
  2007-07-16 16:31 ` Chuck Ebbert
  2007-07-16 17:29   ` Jens Axboe
@ 2007-07-17  1:44   ` Rene Herman
  2007-07-17  6:24     ` Jens Axboe
  1 sibling, 1 reply; 11+ messages in thread
From: Rene Herman @ 2007-07-17  1:44 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: pomac, Linux-kernel, Jens Axboe, Nick Piggin

On 07/16/2007 06:31 PM, Chuck Ebbert wrote:

>> BUG: unable to handle kernel paging request at virtual address ea86ac54
>>  printing eip:
>> c022dfec
>> *pde = 00000000
>> Oops: 0000 [#1]
>> Modules linked in: eeprom i2c_viapro vt8231 i2c_isa skge
>> CPU:    0
>> EIP:    0060:[<c022dfec>]    Not tainted VLI
>> EFLAGS: 00010082   (2.6.22.1 #26)
>> EIP is at as_can_break_anticipation+0xc/0x190
>> eax: dfcdaba0   ebx: dfcdaba0   ecx: 0035ff95   edx: cb296844
>> esi: cb296844   edi: dfcdaba0   ebp: 00000000   esp: ceff6a70
>> ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
>> Process rsync (pid: 1667, ti=ceff6000 task=d59cf5b0 task.ti=ceff6000)
>> Stack: cb296844 00000001 cb296844 dfcdaba0 00000000 c022efc8 cb296844 00000000
>>        dfcffb9c c0227a76 dfcffb9c 00000000 c016e96e cb296844 00000000 dfcffb9c
>>        00000000 c022af64 00000000 dfcffb9c 00000008 00000000 08ff6b30 c04d1ec0
>> Call Trace:
>>  [<c022efc8>] as_add_request+0xa8/0xc0
>>  [<c0227a76>] elv_insert+0xa6/0x150
>>  [<c016e96e>] bio_phys_segments+0xe/0x20
>>  [<c022af64>] __make_request+0x384/0x490
>>  [<c02add1e>] ide_do_request+0x6ee/0x890
>>  [<c02294ab>] generic_make_request+0x18b/0x1c0
>>  [<c022b596>] submit_bio+0xa6/0xb0
>>  [<c013b7b8>] mempool_alloc+0x28/0xa0
>>  [<c016bb66>] __find_get_block+0xf6/0x130
>>  [<c016e0bc>] bio_alloc_bioset+0x8c/0xf0
>>  [<c016b647>] submit_bh+0xb7/0xe0
>>  [<c016c1f8>] ll_rw_block+0x78/0x90
>>  [<c019c85d>] search_by_key+0xdd/0xd20
>>  [<c016c201>] ll_rw_block+0x81/0x90
>>  [<c011f190>] irq_exit+0x40/0x60
>>  [<c01066e4>] do_IRQ+0x94/0xb0
>>  [<c0104bc3>] common_interrupt+0x23/0x30
>>  [<c018beca>] reiserfs_read_locked_inode+0x6a/0x490
>>  [<c018e580>] reiserfs_find_actor+0x0/0x20
>>  [<c018c33b>] reiserfs_iget+0x4b/0x80
>>  [<c018e570>] reiserfs_init_locked_inode+0x0/0x10
>>  [<c0189824>] reiserfs_lookup+0xa4/0xf0
>>  [<c0157b03>] do_lookup+0xa3/0x140
>>  [<c0159265>] __link_path_walk+0x615/0xa20
>>  [<c0168a18>] __mark_inode_dirty+0x28/0x150
>>  [<c01631c1>] mntput_no_expire+0x11/0x50
>>  [<c01596b2>] link_path_walk+0x42/0xb0
>>  [<c0159960>] do_path_lookup+0x130/0x150
>>  [<c015a190>] __user_walk_fd+0x30/0x50
>>  [<c0154766>] vfs_lstat_fd+0x16/0x40
>>  [<c01547df>] sys_lstat64+0xf/0x30
>>  [<c0103c42>] syscall_call+0x7/0xb
>>  =======================
> 
> static int as_can_break_anticipation(struct as_data *ad, struct request *rq)
> {
>         struct io_context *ioc;
>         struct as_io_context *aic;
> 
>         ioc = ad->io_context;  <======== ad is bogus
>         BUG_ON(!ioc);
> 
> 
> Call chain is:
> 
> 	as_add_request
> 	as_update_rq:
> 	        if (ad->antic_status == ANTIC_WAIT_REQ
>         	                || ad->antic_status == ANTIC_WAIT_NEXT) {
>                 	if (as_can_break_anticipation(ad, rq))
>                         	as_antic_stop(ad);
>         	}
> 
> 
> So somehow 'ad' became invalid between the time ad->antic_status was
> checked and as_can_break_anticipation() tried to access ad->io_context?

Is this similar to:

http://lkml.org/lkml/2007/6/4/50

?

Rene.


* Re: [BUG] AS io-scheduler.
  2007-07-17  1:44   ` Rene Herman
@ 2007-07-17  6:24     ` Jens Axboe
  0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2007-07-17  6:24 UTC (permalink / raw)
  To: Rene Herman; +Cc: Chuck Ebbert, pomac, Linux-kernel, Nick Piggin

On Tue, Jul 17 2007, Rene Herman wrote:
> On 07/16/2007 06:31 PM, Chuck Ebbert wrote:
>
>>> BUG: unable to handle kernel paging request at virtual address ea86ac54
>>>  printing eip:
>>> c022dfec
>>> *pde = 00000000
>>> Oops: 0000 [#1]
>>> Modules linked in: eeprom i2c_viapro vt8231 i2c_isa skge
>>> CPU:    0
>>> EIP:    0060:[<c022dfec>]    Not tainted VLI
>>> EFLAGS: 00010082   (2.6.22.1 #26)
>>> EIP is at as_can_break_anticipation+0xc/0x190
>>> eax: dfcdaba0   ebx: dfcdaba0   ecx: 0035ff95   edx: cb296844
>>> esi: cb296844   edi: dfcdaba0   ebp: 00000000   esp: ceff6a70
>>> ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
>>> Process rsync (pid: 1667, ti=ceff6000 task=d59cf5b0 task.ti=ceff6000)
>>> Stack: cb296844 00000001 cb296844 dfcdaba0 00000000 c022efc8 cb296844 00000000
>>>        dfcffb9c c0227a76 dfcffb9c 00000000 c016e96e cb296844 00000000 dfcffb9c
>>>        00000000 c022af64 00000000 dfcffb9c 00000008 00000000 08ff6b30 c04d1ec0
>>> Call Trace:
>>>  [<c022efc8>] as_add_request+0xa8/0xc0
>>>  [<c0227a76>] elv_insert+0xa6/0x150
>>>  [<c016e96e>] bio_phys_segments+0xe/0x20
>>>  [<c022af64>] __make_request+0x384/0x490
>>>  [<c02add1e>] ide_do_request+0x6ee/0x890
>>>  [<c02294ab>] generic_make_request+0x18b/0x1c0
>>>  [<c022b596>] submit_bio+0xa6/0xb0
>>>  [<c013b7b8>] mempool_alloc+0x28/0xa0
>>>  [<c016bb66>] __find_get_block+0xf6/0x130
>>>  [<c016e0bc>] bio_alloc_bioset+0x8c/0xf0
>>>  [<c016b647>] submit_bh+0xb7/0xe0
>>>  [<c016c1f8>] ll_rw_block+0x78/0x90
>>>  [<c019c85d>] search_by_key+0xdd/0xd20
>>>  [<c016c201>] ll_rw_block+0x81/0x90
>>>  [<c011f190>] irq_exit+0x40/0x60
>>>  [<c01066e4>] do_IRQ+0x94/0xb0
>>>  [<c0104bc3>] common_interrupt+0x23/0x30
>>>  [<c018beca>] reiserfs_read_locked_inode+0x6a/0x490
>>>  [<c018e580>] reiserfs_find_actor+0x0/0x20
>>>  [<c018c33b>] reiserfs_iget+0x4b/0x80
>>>  [<c018e570>] reiserfs_init_locked_inode+0x0/0x10
>>>  [<c0189824>] reiserfs_lookup+0xa4/0xf0
>>>  [<c0157b03>] do_lookup+0xa3/0x140
>>>  [<c0159265>] __link_path_walk+0x615/0xa20
>>>  [<c0168a18>] __mark_inode_dirty+0x28/0x150
>>>  [<c01631c1>] mntput_no_expire+0x11/0x50
>>>  [<c01596b2>] link_path_walk+0x42/0xb0
>>>  [<c0159960>] do_path_lookup+0x130/0x150
>>>  [<c015a190>] __user_walk_fd+0x30/0x50
>>>  [<c0154766>] vfs_lstat_fd+0x16/0x40
>>>  [<c01547df>] sys_lstat64+0xf/0x30
>>>  [<c0103c42>] syscall_call+0x7/0xb
>>>  =======================
>> static int as_can_break_anticipation(struct as_data *ad, struct request *rq)
>> {
>>         struct io_context *ioc;
>>         struct as_io_context *aic;
>>         ioc = ad->io_context;  <======== ad is bogus
>>         BUG_ON(!ioc);
>> Call chain is:
>> 	as_add_request
>> 	as_update_rq:
>> 	        if (ad->antic_status == ANTIC_WAIT_REQ
>>         	                || ad->antic_status == ANTIC_WAIT_NEXT) {
>>                 	if (as_can_break_anticipation(ad, rq))
>>                         	as_antic_stop(ad);
>>         	}
>> So somehow 'ad' became invalid between the time ad->antic_status was
>> checked and as_can_break_anticipation() tried to access ad->io_context?
>
> Is this similar to:
>
> http://lkml.org/lkml/2007/6/4/50

Nope

-- 
Jens Axboe



* [BUG] AS io-scheduler.
@ 2007-07-14 15:39 Ian Kumlien
  2007-07-17  6:54 ` Neil Brown
  0 siblings, 1 reply; 11+ messages in thread
From: Ian Kumlien @ 2007-07-14 15:39 UTC (permalink / raw)
  To: Linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4439 bytes --]

Hi, 

I had emerge --sync failing several times... 

So I checked dmesg and found some info, attached further down.
This is an old VIA C3 machine with one disk, it's been running most
kernels in the 2.6.x series with no problems until now.

PS. Don't forget to CC me
DS.

BUG: unable to handle kernel paging request at virtual address ea86ac54
 printing eip:
c022dfec
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: eeprom i2c_viapro vt8231 i2c_isa skge
CPU:    0
EIP:    0060:[<c022dfec>]    Not tainted VLI
EFLAGS: 00010082   (2.6.22.1 #26)
EIP is at as_can_break_anticipation+0xc/0x190
eax: dfcdaba0   ebx: dfcdaba0   ecx: 0035ff95   edx: cb296844
esi: cb296844   edi: dfcdaba0   ebp: 00000000   esp: ceff6a70
ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
Process rsync (pid: 1667, ti=ceff6000 task=d59cf5b0 task.ti=ceff6000)
Stack: cb296844 00000001 cb296844 dfcdaba0 00000000 c022efc8 cb296844 00000000
       dfcffb9c c0227a76 dfcffb9c 00000000 c016e96e cb296844 00000000 dfcffb9c
       00000000 c022af64 00000000 dfcffb9c 00000008 00000000 08ff6b30 c04d1ec0
Call Trace:
 [<c022efc8>] as_add_request+0xa8/0xc0
 [<c0227a76>] elv_insert+0xa6/0x150
 [<c016e96e>] bio_phys_segments+0xe/0x20
 [<c022af64>] __make_request+0x384/0x490
 [<c02add1e>] ide_do_request+0x6ee/0x890
 [<c02294ab>] generic_make_request+0x18b/0x1c0
 [<c022b596>] submit_bio+0xa6/0xb0
 [<c013b7b8>] mempool_alloc+0x28/0xa0
 [<c016bb66>] __find_get_block+0xf6/0x130
 [<c016e0bc>] bio_alloc_bioset+0x8c/0xf0
 [<c016b647>] submit_bh+0xb7/0xe0
 [<c016c1f8>] ll_rw_block+0x78/0x90
 [<c019c85d>] search_by_key+0xdd/0xd20
 [<c016c201>] ll_rw_block+0x81/0x90
 [<c011f190>] irq_exit+0x40/0x60
 [<c01066e4>] do_IRQ+0x94/0xb0
 [<c0104bc3>] common_interrupt+0x23/0x30
 [<c018beca>] reiserfs_read_locked_inode+0x6a/0x490
 [<c018e580>] reiserfs_find_actor+0x0/0x20
 [<c018c33b>] reiserfs_iget+0x4b/0x80
 [<c018e570>] reiserfs_init_locked_inode+0x0/0x10
 [<c0189824>] reiserfs_lookup+0xa4/0xf0
 [<c0157b03>] do_lookup+0xa3/0x140
 [<c0159265>] __link_path_walk+0x615/0xa20
 [<c0168a18>] __mark_inode_dirty+0x28/0x150
 [<c01631c1>] mntput_no_expire+0x11/0x50
 [<c01596b2>] link_path_walk+0x42/0xb0
 [<c0159960>] do_path_lookup+0x130/0x150
 [<c015a190>] __user_walk_fd+0x30/0x50
 [<c0154766>] vfs_lstat_fd+0x16/0x40
 [<c01547df>] sys_lstat64+0xf/0x30
 [<c0103c42>] syscall_call+0x7/0xb
 =======================
Code: c0 8b 44 cb 0c 8b 40 08 29 d0 f7 d0 c1 e8 1f 83 f0 01 eb 02 31 c0
5b c3 8d b4 26 00 00 00 00 55 57 56 53 83 ec 04 89 c3 89 14 24 <8b> 90
b4 00 00 00 85 d2 75 04 0f 0b eb fe 83 3c 24 00 74 0c 8b 
EIP: [<c022dfec>] as_can_break_anticipation+0xc/0x190 SS:ESP 0068:ceff6a70
WARNING: at block/as-iosched.c:862 as_remove_queued_request()
 [<c022e560>] as_remove_queued_request+0x40/0xc0
 [<c022e684>] as_move_to_dispatch+0xa4/0x110
 [<c0236626>] __delay+0x6/0x10
 [<c022ee03>] as_dispatch_request+0x2d3/0x310
 [<c02b39b2>] ide_dma_start+0x22/0x30
 [<c02b6be0>] ide_do_rw_disk+0x310/0x3f0
 [<c02279b5>] elv_next_request+0x105/0x120
 [<c02ad876>] ide_do_request+0x246/0x890
 [<c03d5c9d>] schedule+0x45d/0x4c0
 [<c022e7f0>] as_work_handler+0x0/0x20
 [<c022a331>] blk_start_queueing+0x11/0x20
 [<c022e7ff>] as_work_handler+0xf/0x20
 [<c0126bab>] run_workqueue+0x6b/0xe0
 [<c0127200>] worker_thread+0x0/0xc0
 [<c01272b2>] worker_thread+0xb2/0xc0
 [<c01297e0>] autoremove_wake_function+0x0/0x40
 [<c0129658>] kthread+0x38/0x60
 [<c0129620>] kthread+0x0/0x60
 [<c0104d87>] kernel_thread_helper+0x7/0x10
 =======================
WARNING: at block/as-iosched.c:966 as_move_to_dispatch()
 [<c022e6b3>] as_move_to_dispatch+0xd3/0x110
 [<c022ee03>] as_dispatch_request+0x2d3/0x310
 [<c02b39b2>] ide_dma_start+0x22/0x30
 [<c02b6be0>] ide_do_rw_disk+0x310/0x3f0
 [<c02279b5>] elv_next_request+0x105/0x120
 [<c02ad876>] ide_do_request+0x246/0x890
 [<c03d5c9d>] schedule+0x45d/0x4c0
 [<c022e7f0>] as_work_handler+0x0/0x20
 [<c022a331>] blk_start_queueing+0x11/0x20
 [<c022e7ff>] as_work_handler+0xf/0x20
 [<c0126bab>] run_workqueue+0x6b/0xe0
 [<c0127200>] worker_thread+0x0/0xc0
 [<c01272b2>] worker_thread+0xb2/0xc0
 [<c01297e0>] autoremove_wake_function+0x0/0x40
 [<c0129658>] kthread+0x38/0x60
 [<c0129620>] kthread+0x0/0x60
 [<c0104d87>] kernel_thread_helper+0x7/0x10
 =======================

-- 
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: [BUG] AS io-scheduler.
  2007-07-14 15:39 Ian Kumlien
@ 2007-07-17  6:54 ` Neil Brown
  0 siblings, 0 replies; 11+ messages in thread
From: Neil Brown @ 2007-07-17  6:54 UTC (permalink / raw)
  To: pomac; +Cc: Linux-kernel

On Saturday July 14, pomac@vapor.com wrote:
> Hi, 
> 
> I had emerge --sync failing several times... 
> 
> So I checked dmesg and found some info, attached further down.
> This is an old VIA C3 machine with one disk, it's been running most
> kernels in the 2.6.x series with no problems until now.
> 
> PS. Don't forget to CC me
> DS.
> 
> BUG: unable to handle kernel paging request at virtual address ea86ac54
                                                                 ^^^^^^^^
>  printing eip:
> c022dfec
> *pde = 00000000
> Oops: 0000 [#1]
> Modules linked in: eeprom i2c_viapro vt8231 i2c_isa skge
> CPU:    0
> EIP:    0060:[<c022dfec>]    Not tainted VLI
> EFLAGS: 00010082   (2.6.22.1 #26)
> EIP is at as_can_break_anticipation+0xc/0x190
> eax: dfcdaba0   ebx: dfcdaba0   ecx: 0035ff95   edx: cb296844
       ^^^^^^^^
> esi: cb296844   edi: dfcdaba0   ebp: 00000000   esp: ceff6a70
> ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
> Process rsync (pid: 1667, ti=ceff6000 task=d59cf5b0 task.ti=ceff6000)
> Stack: cb296844 00000001 cb296844 dfcdaba0 00000000 c022efc8 cb296844 00000000
>        dfcffb9c c0227a76 dfcffb9c 00000000 c016e96e cb296844 00000000 dfcffb9c
>        00000000 c022af64 00000000 dfcffb9c 00000008 00000000 08ff6b30 c04d1ec0
...
> Code: c0 8b 44 cb 0c 8b 40 08 29 d0 f7 d0 c1 e8 1f 83 f0 01 eb 02 31 c0
> 5b c3 8d b4 26 00 00 00 00 55 57 56 53 83 ec 04 89 c3 89 14 24 <8b> 90
> b4 00 00 00 85 d2 75 04 0f 0b eb fe 83 3c 24 00 74 0c 8b 

Plug that 'Code:' line into ksymoops and you get:
Code;  fffffffb <__kernel_rt_sigreturn+1bbb/????>
  26:   89 c3                     mov    %eax,%ebx
Code;  fffffffd <__kernel_rt_sigreturn+1bbd/????>
  28:   89 14 24                  mov    %edx,(%esp)
Code;  00000000 Before first symbol
  2b:   8b 90 b4 00 00 00         mov    0xb4(%eax),%edx
Code;  00000006 Before first symbol
  31:   85 d2                     test   %edx,%edx
Code;  00000008 Before first symbol
  33:   75 04                     jne    39 <_EIP+0x39>

(etc.).

The "Code;    00000000..." is the current EIP.
So the operation it was performing was
       mov 0xb4(%eax),%edx

Now I am no Pentium, but when I add 0xb4 to dfcdaba0 I don't get
ea86ac54.
My result is more like
dfcdac54.

So it looks like your Pentium (or whatever) got it half-right,
i.e. the bottom 16 bits are right, but the top 16 are wrong by 0ab9.

I'd be taking a very serious look at your hardware.  This is not a
software problem.

NeilBrown


end of thread

Thread overview: 11+ messages
2007-07-15 15:20 [BUG] AS io-scheduler Ian Kumlien
2007-07-16 16:31 ` Chuck Ebbert
2007-07-16 17:29   ` Jens Axboe
2007-07-16 19:49     ` Ian Kumlien
2007-07-16 19:56       ` Jens Axboe
2007-07-16 20:14         ` Ian Kumlien
2007-07-17  6:23           ` Jens Axboe
2007-07-17  1:44   ` Rene Herman
2007-07-17  6:24     ` Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2007-07-14 15:39 Ian Kumlien
2007-07-17  6:54 ` Neil Brown
