* 2.4.17 SMP hangs ..
@ 2002-11-21  5:28 Manish Lachwani
  2002-11-21  6:03 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Manish Lachwani @ 2002-11-21  5:28 UTC
  To: linux-kernel; +Cc: Manish Lachwani

I am seeing system hangs with the 2.4.17 SMP kernel when running mke2fs across
12 drives in parallel. However, the hangs only occur when the I/O rate reported
by vmstat is high:


bash# vmstat 1
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0      0 815800   3656 102612   0   0   163  1086  129   961   6  29  65
 0  0  0      0 813096   3656 102992   0   0   363     0  265   928  44  26  31
 0  0  0      0 813172   3656 102996   0   0     4     0  180    99   0   1  99
 5  2  1      0 809476   3656 103044   0   0    44     0  206   274  18  11  71
 0 12  0      0 802040   3656 103072   0   0    18     2  296   745  47  33  21
 0  8  0      0 800696   3660 103152   0   0    34   244  349   501  10   5  85
 0  8  0      0 800696   3660 103152   0   0     0     0  187   106   1   0  99
 2  5  1      0 795864   3660 103200   0   0    13    45  278   717  14  29  57
 1  0  0      0 795592   3660 103248   0   0   107    46  502   663   7   7  86
 1  0  0      0 795592   3660 103248   0   0     0     0  184    95   0   1  99
 1  0  0      0 795596   3660 103244   0   0     4     1  191   139   0   1  99
 6  5  3      0 756932   3660 140192   0   0   194 36721  464   232   7  87   6
11  0  1      0 723276   3660 173784   0   0     0 33718  681  1560   0  39  61

At that point the system hangs. The system consists of a 4-port and an 8-port
3ware controller on an Intel 21154 bridge with 12 Maxtor drives. When the
I/O rate is lower:

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 811888   3828 103476   0   0     0     0  172    91   0   1  99
 0  0  0      0 811888   3828 103476   0   0     0     0  184    93   0   1  99
 0  0  0      0 811888   3828 103476   0   0     0     0  171   153   0   1  99
 0  0  0      0 811852   3828 103512   0   0     0   170  171    85   0   1  99
 0  0  0      0 811852   3828 103512   0   0     0     0  174    95   1   0  99
 0  0  0      0 811852   3828 103512   0   0     0     0  173   147   0   1  99
 0  0  0      0 811852   3828 103512   0   0     0     0  176    98   0   1  99
 0  0  0      0 811852   3828 103512   0   0     0     1  175    96   0   1  99
 1  0  0      0 811852   3828 103512   0   0     0     0  173   110   1   3  96
 1  0  0      0 811704   3828 103516   0   0     0     0  179   145   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     0  194   121   1   1  98
 0  0  0      0 811688   3828 103516   0   0     0    19  186   119   1   1  98
 0  0  0      0 811688   3828 103516   0   0     0     0  174   149   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     0  172    86   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     0  175    96   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     1  171   149   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  173    91   0   1  99
 0  0  1      0 811688   3828 103516   0   0     0     0  173    86   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  179   154   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     1  174    91   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     0  179    98   1   0  99
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0      0 811688   3828 103516   0   0     0     0  174   100   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     0  184   137   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     1  171    89   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  173    95   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  176   150   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     0  175   121   3   0  97
 0  0  0      0 811688   3828 103516   0   0     0    15  171   116   3   0  97
 0  0  0      0 811688   3828 103516   0   0     0     0  179   149   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     0  173    88   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  171    88   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0    15  172   149   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  171    88   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  174    98   0   1  99
 1  0  0      0 811688   3828 103516   0   0     0     0  178   127   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     1  179   153   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  171    88   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  173    88   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     0  175   158   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     1  175    96   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  171    90   0   1  99
 0  0  0      0 811924   3828 103516   0   0     0     0  173   204   1   5  94
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0      0 811924   3828 103516   0   0     0     0  174    90   1   0  99
 0  0  0      0 811924   3828 103516   0   0     0    24  182    93   1   0  99
 1  0  0      0 811924   3828 103516   0   0     0     0  173   142   0   1  99
 0  0  0      0 811924   3828 103516   0   0     0     0  171    93   1   0  99
 0  0  0      0 811924   3828 103516   0   0     0     0  175    94   0   1  99
 1  0  0      0 811924   3828 103516   0   0     0     2  173    92   1   0  99
 0  0  0      0 811924   3828 103516   0   0     0     0  175   151   1   0  99
 0  0  0      0 811924   3828 103516   0   0     0     0  173    87   1   0  99
 0  0  1      0 811924   3828 103516   0   0     0     0  173    89   1   0  99
 0  0  0      0 811924   3828 103516   0   0     0     1  171   142   0   1  99

bash# vmstat 1
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 5  0  1      0 815896   3656 102228   0   0   156  1076  129   917   6  34  60
 2  0  1      0 812860   3656 102992   0   0   737     0  252  1015  36  40  25
 0  0  0      0 813104   3656 102992   0   0     0     0  187   129   2   1  97
 0  0  0      0 813180   3656 102996   0   0     4     0  168    93   0   1  99
 6  8  1      0 802392   3656 103220   0   0   222     0  251   757  62  37   1
 0 10  0      0 804588   3684 103252   0   0   133    40  214   871  50  43   8
 2  8  1      0 804324   3712 103272   0   0   162    40  222   817  43  40  18
 9  4  0      0 805400   3712 103288   0   0     4     0  196   276  13   8  79
 9  0  1      0 804724   3812 103348   0   0   644    96  297  1255  41  59   0
14  0  1      0 804144   3816 103344   0   0     0     0  167   888  56  44   0
10  0  1      0 804448   3820 103340   0   0     0     0  171   873  57  43   0
 0  1  0      0 812288   3828 103436   0   0    97     0  222  1051  53  42   5
 0  0  0      0 811868   3828 103476   0   0    84     0  395   429   0   3  97
 1  0  0      0 811868   3828 103476   0   0     0     0  167    91   0   1  99
 1  0  0      0 811868   3828 103476   0   0     0     0  167   141   1   0  99
 0  0  0      0 811868   3828 103476   0   0     0     0  177   100   1   0  99
 0  0  0      0 811868   3828 103476   0   0     0     0  171    96   0   1  99
 1  0  0      0 811868   3828 103476   0   0     0     0  171   132   1   0  99
 0  0  0      0 811868   3828 103476   0   0     0     0  170   100   1   0  99
 0  0  0      0 811868   3828 103476   0   0     0     0  167    90   1   0  99
 0  0  0      0 811868   3828 103476   0   0     0     0  171    93   1   0  99
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 811868   3828 103476   0   0     0     0  170   148   1   0  99
 0  0  0      0 811868   3828 103476   0   0     0     0  176    83   0   1  99
 0  0  0      0 811868   3828 103476   0   0     0     0  167    86   0   1  99
 0  0  0      0 811868   3828 103476   0   0     0     0  169   146   0   1  99
 0  0  0      0 811832   3828 103512   0   0     0   173  168    84   1   0  99
 0  0  0      0 811832   3828 103512   0   0     0     0  178   104   1   0  99
 0  0  0      0 811832   3828 103512   0   0     0     0  167   141   0   1  99
 0  0  0      0 811832   3828 103512   0   0     0     0  173    92   1   2  97
 0  0  0      0 811832   3828 103512   0   0     0     1  169    89   0   2  98
 1  0  0      0 811832   3828 103512   0   0     0     0  173   138   2   3  95
 1  0  0      0 811684   3828 103516   0   0     0     0  183   132   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     0  178   103   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0    19  180   108   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     0  169   148   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     0  169    88   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     0  171    94   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     1  170   147   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     0  168    94   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     0  171    95   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     0  173   175   3   0  97
 0  0  0      0 811668   3828 103516   0   0     0    15  169    89   1   0  99
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 811668   3828 103516   0   0     0     0  168    87   1   0  99
 1  0  0      0 811668   3828 103516   0   0     0     0  178   156   2   1  97
 0  0  0      0 811668   3828 103516   0   0     0     0  175   122   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0    15  167    86   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     0  171    92   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     0  177   151   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     0  169    93   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     1  169    86   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     0  167   146   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     0  172    95   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     0  167    84   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     1  178   189   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     0  171    89   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     0  171    89   0   1  99
 1  0  0      0 811668   3828 103516   0   0     0     0  171   124   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     1  183   127   0   1  99

bash# df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/ram1               125011     85304     33154  73% /
coreserver:/var/cores
                      74858752    475144  70580928   1% /var/.automount/cores
/dev/sdj1              7827172        24   7429540   1% /disks/disk10.1
/dev/sdi1              7827172        24   7429540   1% /disks/disk9.1
/dev/sdl1              7827172        24   7429540   1% /disks/disk12.1
/dev/sdk1              7827172        24   7429540   1% /disks/disk11.1
/dev/sdh1              7827172        24   7429540   1% /disks/disk8.1
/dev/sdf1              7827172        24   7429540   1% /disks/disk6.1
/dev/sde1              7827172        24   7429540   1% /disks/disk5.1
/dev/sdb1              7827172        24   7429540   1% /disks/disk2.1
/dev/sdg1              7827172        24   7429540   1% /disks/disk7.1
/dev/sda1              7827172        24   7429540   1% /disks/disk1.1
/dev/sdd1              7827172        24   7429540   1% /disks/disk4.1
/dev/sdc1              7827172        24   7429540   1% /disks/disk3.1
bash#


bash# vmstat 1
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 755440   3924 105192   0   0    83  8108  353   655   5  25  69
 0  0  1      0 755440   3924 105192   0   0     0     0  167    70   0   0 100
 0  0  0      0 755440   3924 105192   0   0     0    24  159    82   0   1  99
 0  0  0      0 755052   3924 105192   0   0     0     0  162   133   0   0 100
 0  0  0      0 755052   3924 105192   0   0     0     0  156    68   0   1  99
 0  0  0      0 755052   3924 105192   0   0     0     0  155    59   0   0 100
 0  0  0      0 755052   3924 105192   0   0     0     1  157   124   0   0 100
 0  0  0      0 755052   3924 105192   0   0     0     0  174    82   0   1  99
 0  0  0      0 755052   3924 105192   0   0     0     0  161    73   0   0 100
 1  0  0      0 755052   3924 105192   0   0     0     0  159   101   0   0 100
 0  0  0      0 755052   3924 105192   0   0     0     1  155    92   0   0 100
 0  0  0      0 755052   3924 105192   0   0     0     0  155    57   0   0 100
 0  0  0      0 755052   3924 105192   0   0     0     0  155    67   1   0  99
 0  0  0      0 755052   3924 105192   0   0     0     0  155   112   0   0 100
 0  0  0      0 754440   3924 105192   0   0     0     6  157    67   0   0 100
 0  0  0      0 754440   3924 105192   0   0     0     0  155    62   0   0 100
 0  0  0      0 754440   3924 105192   0   0     0     0  157   128   0   0 100
 0  0  0      0 754440   3924 105192   0   0     0     0  160    66   0   1  99
 0  0  0      0 754440   3924 105192   0   0     0     1  157    72   0   0 100
 0  0  0      0 754440   3924 105192   0   0     0     0  155   117   0   0 100
 0  0  0      0 754440   3924 105192   0   0     0     0  155    71   0   0 100
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 754440   3924 105192   0   0     0     0  157    66   0   0 100
 1  0  0      0 754440   3924 105192   0   0     0     1  166   114   0   1  99
 0  0  0      0 754440   3924 105192   0   0     0     0  158    93   0   0 100

there are no hangs. On startup, I am running mke2fs in parallel across all the
drives. The 3ware 4-port controller shows that its LEDs are ON. I have tried
replacing the controllers, but that does not help either ...

Thanks
Manish
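
For readers who want to picture the workload in the report above, here is a
minimal sketch of starting one mke2fs per drive at the same time. It is not
the reporter's actual script, which the message does not show. The device
names are taken from the df listing; the -q (quiet) flag, the function names
and the dry-run switch are assumptions made for illustration. mke2fs destroys
existing data, so the sketch only prints the commands unless dry_run is
turned off.

```python
import subprocess

# Partition names as they appear in the df listing (sda1 .. sdl1).
DEVICES = ["/dev/sd%s1" % letter for letter in "abcdefghijkl"]


def build_mke2fs_commands(devices):
    """Return one mke2fs argv per device; nothing is executed here."""
    # -q (quiet) is an assumed option; the report does not list any flags.
    return [["mke2fs", "-q", dev] for dev in devices]


def run_parallel(commands, dry_run=True):
    """Start every command at once, then wait for all of them.

    With dry_run=True the commands are only printed and an empty list
    is returned; otherwise the exit codes are returned in order.
    """
    if dry_run:
        for argv in commands:
            print(" ".join(argv))
        return []
    procs = [subprocess.Popen(argv) for argv in commands]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    run_parallel(build_mke2fs_commands(DEVICES), dry_run=True)
```

Starting all twelve processes before waiting on any of them is what produces
the burst of simultaneous writes visible in the bo column.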




* Re: 2.4.17 SMP hangs ..
  2002-11-21  5:28 2.4.17 SMP hangs Manish Lachwani
@ 2002-11-21  6:03 ` Andrew Morton
  2002-11-26 15:22   ` Theodore Ts'o
  2002-11-21  6:33 ` Andre Hedrick
  2002-11-22 12:00 ` William Lee Irwin III
  2 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2002-11-21  6:03 UTC
  To: Manish Lachwani; +Cc: linux-kernel

Manish Lachwani wrote:
> 
> I am seeing system hangs with 2.4.17 SMP kernel when doing mke2fs across 12
> drives in parallel. However, the hangs only occur when the I/O rate from
> vmstat is high:
> 

Quite possibly it has not hung.  You just need to wait half an
hour or so.

The algorithm isn't very good.
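
One way to tell "very slow" from "hung" is to keep logging vmstat from a
session that stays responsive (a serial console, for instance) and check
whether blocks are still being written out. The sketch below is only an
illustration of that idea, not something proposed in this thread. It assumes
the old 16-column vmstat layout shown in the report, with each sample on a
single line; the function names and the five-sample window are arbitrary
choices. The two sample rows are the last two of the first vmstat listing.

```python
COLUMNS = ["r", "b", "w", "swpd", "free", "buff", "cache",
           "si", "so", "bi", "bo", "in", "cs", "us", "sy", "id"]

# Header lines plus the last two samples printed before the reported hang.
SAMPLE = """\
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 6  5  3      0 756932   3660 140192   0   0   194 36721  464   232   7  87   6
11  0  1      0 723276   3660 173784   0   0     0 33718  681  1560   0  39  61
"""


def parse_vmstat(text):
    """Turn vmstat output into a list of dicts, skipping header lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has exactly 16 numeric fields; headers do not.
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, (int(f) for f in fields))))
    return rows


def still_writing(rows, window=5):
    """True if any of the last `window` samples wrote blocks out (bo > 0)."""
    return any(row["bo"] > 0 for row in rows[-window:])


if __name__ == "__main__":
    samples = parse_vmstat(SAMPLE)
    print(len(samples), still_writing(samples))
```

If the log simply stops, this check cannot help; it is only useful while
samples keep arriving.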


* Re: 2.4.17 SMP hangs ..
  2002-11-21  5:28 2.4.17 SMP hangs Manish Lachwani
  2002-11-21  6:03 ` Andrew Morton
@ 2002-11-21  6:33 ` Andre Hedrick
  2002-11-22 12:00 ` William Lee Irwin III
  2 siblings, 0 replies; 7+ messages in thread
From: Andre Hedrick @ 2002-11-21  6:33 UTC
  To: Manish Lachwani; +Cc: linux-kernel


The problem may be associated with the interrupt mapping over the bridge
riser.  I have seen this before and can reproduce it regardless of the
system.

Cheers,

Andre Hedrick
LAD Storage Consulting Group
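
A first step in looking at interrupt routing behind a bridge is to see which
IRQ lines are shared between devices. The sketch below parses text in the
/proc/interrupts layout and lists the shared lines. It is an illustration
only: the sample table is invented, not taken from the reporter's machine,
and the device names in it (3w-xxxx, eth0) are assumptions.

```python
# Hypothetical /proc/interrupts content for a 2-CPU box; values are made up.
SAMPLE = """\
           CPU0       CPU1
  0:     123456     123457    IO-APIC-edge  timer
 18:       5021       4987   IO-APIC-level  3w-xxxx, 3w-xxxx
 19:        310        295   IO-APIC-level  eth0
NMI:          0          0
"""


def shared_irqs(text):
    """Map IRQ number -> device list for lines with more than one device."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row names one column per CPU
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip NMI/LOC/ERR style summary rows
        # Per-CPU counts, then the controller type, then the device list.
        fields = rest.split(None, ncpus + 1)
        if len(fields) < ncpus + 2:
            continue
        devices = [name.strip() for name in fields[ncpus + 1].split(",")]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared


if __name__ == "__main__":
    print(shared_irqs(SAMPLE))
```

On a real system the input would come from reading /proc/interrupts; whether
sharing is actually related to the hang is not established in this thread.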

On Wed, 20 Nov 2002, Manish Lachwani wrote:

> I am seeing system hangs with 2.4.17 SMP kernel when doing mke2fs across 12
> drives in parallel. However, the hangs only occur when the I/O rate from
> vmstat is high:



* Re: 2.4.17 SMP hangs ..
  2002-11-21  5:28 2.4.17 SMP hangs Manish Lachwani
  2002-11-21  6:03 ` Andrew Morton
  2002-11-21  6:33 ` Andre Hedrick
@ 2002-11-22 12:00 ` William Lee Irwin III
  2 siblings, 0 replies; 7+ messages in thread
From: William Lee Irwin III @ 2002-11-22 12:00 UTC (permalink / raw)
  To: Manish Lachwani; +Cc: linux-kernel

On Wed, Nov 20, 2002 at 09:28:06PM -0800, Manish Lachwani wrote:
> I am seeing system hangs with 2.4.17 SMP kernel when doing mke2fs across 12
> drives in parallel. However, the hangs only occur when the I/O rate from
> vmstat is high:

This is the BKL/irqlock/io_request_lock contention issue. 2.5.x is
the only answer here: it has per-queue locking and has removed the
irqlock from the BKL.


Bill


* RE: 2.4.17 SMP hangs ..
@ 2002-11-22 22:30 Manish Lachwani
  0 siblings, 0 replies; 7+ messages in thread
From: Manish Lachwani @ 2002-11-22 22:30 UTC (permalink / raw)
  To: 'Andre Hedrick', Manish Lachwani; +Cc: linux-kernel

How can I conveniently reproduce this hang? Can you give me an example?

Thanks
-Manish

-----Original Message-----
From: Andre Hedrick [mailto:andre@linux-ide.org]
Sent: Wednesday, November 20, 2002 10:33 PM
To: Manish Lachwani
Cc: linux-kernel@vger.kernel.org
Subject: Re: 2.4.17 SMP hangs ..



The problem may be associated w/ the interrupt mapping over the bridge
riser.  I have seen this before and can reproduce it regardless of the
system.

Cheers,

Andre Hedrick
LAD Storage Consulting Group

On Wed, 20 Nov 2002, Manish Lachwani wrote:

> I am seeing system hangs with 2.4.17 SMP kernel when doing mke2fs across 12
> drives in parallel. However, the hangs only occur when the I/O rate from
> vmstat is high:


* Re: 2.4.17 SMP hangs ..
  2002-11-21  6:03 ` Andrew Morton
@ 2002-11-26 15:22   ` Theodore Ts'o
  2002-11-26 16:13     ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Theodore Ts'o @ 2002-11-26 15:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Manish Lachwani, linux-kernel

On Wed, Nov 20, 2002 at 10:03:50PM -0800, Andrew Morton wrote:
> Manish Lachwani wrote:
> > 
> > I am seeing system hangs with 2.4.17 SMP kernel when doing mke2fs across 12
> > drives in parallel. However, the hangs only occur when the I/O rate from
> > vmstat is high:
> > 
> 
> Quite possibly it has not hung.  You just need to wait half an
> hour or so.
> 
> The algorithm isn't very good.

[Catching up lkml mail after the IETF meeting....]

Try setting the environment variable "MKE2FS_SYNC" to a value such as
10.  This will cause mke2fs to force a sync after writing out every 10
block groups worth of inode tables.  
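
A minimal sketch of how the variable might be set when starting the
parallel runs, written in Python for brevity. The helper names
(mke2fs_env, format_in_parallel) are made up for illustration, and
invoking mke2fs with nothing but the device name is an assumption about
the original setup. Note that actually calling format_in_parallel()
destroys whatever is on the listed devices.

```python
import os
import subprocess


def mke2fs_env(sync_every=10, base_env=None):
    """Return a copy of the environment with MKE2FS_SYNC set.

    sync_every is the number of block groups between forced syncs,
    as described above. base_env defaults to the current environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MKE2FS_SYNC"] = str(sync_every)
    return env


def format_in_parallel(devices, sync_every=10):
    """Start one mke2fs per device, then wait for all of them.

    WARNING: this wipes the given devices. Returns the exit codes
    in the same order as `devices`.
    """
    env = mke2fs_env(sync_every)
    procs = [subprocess.Popen(["mke2fs", dev], env=env) for dev in devices]
    return [p.wait() for p in procs]
```

With the drives from the report this would be called with something like
format_in_parallel(["/dev/sda1", "/dev/sdb1"]), extended to all twelve
partitions.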

If this fixes the problem, then it means that the kernel isn't
handling write throttling correctly, and the system is thrashing
itself to death.  Write throttling is one of those kernel bugs which
gets fixed and broken in the kernel multiple times.  I've considered
making MKE2FS_SYNC the default, but I haven't, mainly because the current
behaviour is a great way of pointing out these write throttling bugs in
the VM.  (Stephen has fixed this bug multiple times over the years,
and he suggested that having a good test case for noticing when
someone has broken write throttling would be a Good Thing --- and it
seems to get broken fairly often, as people try to make improvements
to the VM layer.....)

						- Ted


* Re: 2.4.17 SMP hangs ..
  2002-11-26 15:22   ` Theodore Ts'o
@ 2002-11-26 16:13     ` Andrew Morton
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2002-11-26 16:13 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Manish Lachwani, linux-kernel

Theodore Ts'o wrote:
> 
> On Wed, Nov 20, 2002 at 10:03:50PM -0800, Andrew Morton wrote:
> > Manish Lachwani wrote:
> > >
> > > I am seeing system hangs with 2.4.17 SMP kernel when doing mke2fs across 12
> > > drives in parallel. However, the hangs only occur when the I/O rate from
> > > vmstat is high:
> > >
> >
> > Quite possibly it has not hung.  You just need to wait half an
> > hour or so.
> >
> > The algorithm isn't very good.
> 
> [Catching up lkml mail after the IETF meeting....]
> 
> Try setting the environment variable "MKE2FS_SYNC" to a value such as
> 10.  This will cause mke2fs to force a sync after writing out every 10
> block groups worth of inode tables.

That will fix it.

> If this fixes the problem, then it means that the kernel isn't
> handling write throttling correctly, and the system is thrashing
> itself to death.

Nah, it's __block_fsync().  That function has to write buffers against
a particular device.  So it searches the global buffer LRU for 32 buffers
against the nominated device, drops the lock, writes the buffers, then
searches again.

So the search complexity is O(n*n/32).  Which means that when you have a
lot of dirty buffers against different devices on the queue the CPU cost
simply explodes.

One workaround is to use sync rather than fsync - because sync uses NODEV
and doesn't have to search past buffers from uninteresting devices. 
Another is to do what you've done.
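
The cost argument above can be illustrated with a toy model in Python.
This is not the kernel code: it only assumes what the description says,
namely that each pass restarts at the head of one global list, picks up
to 32 buffers for the nominated device, writes them, and searches again.
The function names and the round-robin interleaving of devices are
assumptions made for the illustration.

```python
BATCH = 32  # buffers written per pass, per the description above


def interleaved_lru(devices, per_device):
    """Build a toy LRU: one device id per dirty buffer, round-robin."""
    return [d for _ in range(per_device) for d in range(devices)]


def fsync_scan_cost(lru, dev, batch=BATCH):
    """Count buffers examined while flushing every buffer of `dev`.

    Each pass walks the list from the head, collects up to `batch`
    buffers owned by `dev`, "writes" them (removes them from the
    list), and then starts over. The last pass finds nothing.
    """
    lru = list(lru)
    examined = 0
    while True:
        picked = set()
        for idx, owner in enumerate(lru):
            examined += 1
            if owner == dev:
                picked.add(idx)
                if len(picked) == batch:
                    break
        if not picked:
            return examined
        lru = [o for i, o in enumerate(lru) if i not in picked]


def sync_scan_cost(lru):
    """A NODEV-style flush takes every buffer, so one pass is enough."""
    return len(lru)


if __name__ == "__main__":
    for per_device in (320, 640, 1280):
        lru = interleaved_lru(12, per_device)
        print(per_device, fsync_scan_cost(lru, 11), sync_scan_cost(lru))
```

In this model, doubling the number of dirty buffers per device more than
triples the number of buffers examined by the per-device flush, while
the whole-system flush grows only linearly.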

Probably, just syncing the buffers at i_dirty_data_buffers would suffice.
That would fix it.

