2.6.12.2 dies after 24 hours

public inbox for linux-kernel@vger.kernel.org

* 2.6.12.2 dies after 24 hours
@ 2005-07-12  9:26 Rob Mueller
  2005-07-12  9:43 ` Lars Roland
  0 siblings, 1 reply; 12+ messages in thread
From: Rob Mueller @ 2005-07-12  9:26 UTC
  To: linux-kernel; +Cc: Bron Gondwana, Jeremy Howard

As background, we've been using a relatively old kernel (2.6.4-mm2) on some 
IBM x235 machines with 6G of RAM, umem cards, and serveraid storage. These 
machines are under continuous heavy-ish load, load avg between about 1 and 
5, with between 2500-3500 procs at all times, with several largish ReiserFS 
partitions and have been running *really* well with >250 days uptime on one 
machine.

We recently tried upgrading one of the machines to the latest kernel
(2.6.12.2) and it died after about 24 hours. It seemed to end up in some
weird state where we could ssh into it and some commands worked (e.g. uptime),
but process-list-related commands (ps) would just freeze in an
unkillable state, and we'd have to close the session and ssh in again.

I did manage to get a full sysrq-t dump. I've placed it, a kernel config 
dump, and a dmesg boot dump here:

http://robm.fastmail.fm/kernel/t7/

Hope this provides some useful data to track down the problem.

Rob

PS. Yes, I know this is a non-PAE kernel on a 6G machine so 2G was unused. 
This was a mistake in this case, but it still doesn't explain the crash...

* Re: 2.6.12.2 dies after 24 hours
  2005-07-12  9:26 2.6.12.2 dies after 24 hours Rob Mueller
@ 2005-07-12  9:43 ` Lars Roland
  2005-07-12 11:46   ` Rob Mueller
  0 siblings, 1 reply; 12+ messages in thread
From: Lars Roland @ 2005-07-12  9:43 UTC
  To: Rob Mueller; +Cc: linux-kernel, Bron Gondwana, Jeremy Howard

On 7/12/05, Rob Mueller <robm@fastmail.fm> wrote:
> As background, we've been using a relatively old kernel (2.6.4-mm2) on some
> IBM x235 machines with 6G of RAM, umem cards, and serveraid storage. These
> machines are under continuous heavy-ish load, load avg between about 1 and
> 5, with between 2500-3500 procs at all times, with several largish ReiserFS
> partitions and have been running *really* well with >250 days uptime on one
> machine.
> 
> We recently tried upgrading one of the machines to the latest kernel
> (2.6.12.2) and it's died after about 24 hours. It seemed to end up in some
> weird state where we could ssh into it, and some commands worked (eg uptime)
> but process list related commands (ps) would just freeze up into an
> unkillable state and we'd have to close the session and ssh in again.

I experienced the exact same thing on an IBM 335 - in my case I had
messed up the ACPI setup. Could you paste the output from
/proc/interrupts? Also, is your kernel running with IRQ balancing?


Regards.

Lars Roland

* Re: 2.6.12.2 dies after 24 hours
  2005-07-12  9:43 ` Lars Roland
@ 2005-07-12 11:46   ` Rob Mueller
  2005-07-12 12:13     ` Lars Roland
  0 siblings, 1 reply; 12+ messages in thread
From: Rob Mueller @ 2005-07-12 11:46 UTC
  To: Lars Roland; +Cc: linux-kernel, Bron Gondwana, Jeremy Howard


> > We recently tried upgrading one of the machines to the latest kernel
> > (2.6.12.2) and it's died after about 24 hours. It seemed to end up in
> > some weird state where we could ssh into it, and some commands worked
> > (eg uptime) but process list related commands (ps) would just freeze up
> > into an unkillable state and we'd have to close the session and ssh in
> > again.
>
> I experienced the exact same thing on an IBM 335 - in my case I had
> messed up the ACPI setup. Could you paste the output from
> /proc/interrupts? Also, is your kernel running with IRQ balancing?

Here's the /proc/interrupts dump:

           CPU0       CPU1       CPU2       CPU3
  0:   11524000          0          0          0    IO-APIC-edge  timer
  1:          8          0          0          0    IO-APIC-edge  i8042
  5:          0          0          0          0   IO-APIC-level  acpi
 14:         13          0          0          0    IO-APIC-edge  ide0
 16:          2          0          0          0   IO-APIC-level  ibmasm0
 20:    2978604          0    2338027          0   IO-APIC-level  eth0
 22:    1321957          0          0          0   IO-APIC-level  ips
 24:     581291          0          0          0   IO-APIC-level  pci-umem
 29:     257154          0          0          0   IO-APIC-level  eth1
NMI:          0          0          0          0
LOC:   11524185   11524201   11524194   11524121
ERR:          0
MIS:          0
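
One way to read a dump like this is per interrupt line: which CPUs have actually serviced it. The following is a minimal Python sketch and not part of the original message; `parse_interrupts` and `cpus_servicing` are hypothetical helper names, and the sample rows are copied from the dump above.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: (per_cpu_counts, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())              # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:           # rows such as ERR carry a single counter
            if not field.isdigit():
                break
            counts.append(int(field))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


def cpus_servicing(counts):
    """Return the indexes of CPUs with a non-zero count for one interrupt line."""
    return [cpu for cpu, n in enumerate(counts) if n > 0]


# Rows copied from the dump in this message (a subset, for brevity).
sample = """\
           CPU0       CPU1       CPU2       CPU3
  0:   11524000          0          0          0    IO-APIC-edge  timer
 20:    2978604          0    2338027          0   IO-APIC-level  eth0
 22:    1321957          0          0          0   IO-APIC-level  ips
ERR:          0
"""

table = parse_interrupts(sample)
for label, (counts, desc) in table.items():
    print(label, desc, cpus_servicing(counts))
```

Of the device interrupts in the dump, only eth0 (IRQ 20) shows counts on more than one CPU (CPU0 and CPU2); the others were all serviced on CPU0.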

I'm not sure about IRQ balancing, sorry. How do I tell? The entire boot 
process output is here:

http://robm.fastmail.fm/kernel/t7/bootdmesg.txt

And the config is here:

http://robm.fastmail.fm/kernel/t7/config.txt

Does that help?

Our boot doesn't pass any special parameters, just choosing the deadline 
elevator...

image=/boot/bzImage-2.6.12.2
  label=linux-2.6.12.2
  append="elevator=deadline"
  read-only
  root=/dev/sda2

Thanks for your help!

Rob


* Re: 2.6.12.2 dies after 24 hours
  2005-07-12 11:46   ` Rob Mueller
@ 2005-07-12 12:13     ` Lars Roland
  2005-07-12 13:51       ` Bron Gondwana
  0 siblings, 1 reply; 12+ messages in thread
From: Lars Roland @ 2005-07-12 12:13 UTC
  To: Rob Mueller; +Cc: linux-kernel, Bron Gondwana, Jeremy Howard

On 7/12/05, Rob Mueller <robm@fastmail.fm> wrote:
> Here's the /proc/interrupts dump:
> 
>            CPU0       CPU1       CPU2       CPU3
>   0:   11524000          0          0          0    IO-APIC-edge  timer
>   1:          8          0          0          0    IO-APIC-edge  i8042
>   5:          0          0          0          0   IO-APIC-level  acpi
>  14:         13          0          0          0    IO-APIC-edge  ide0
>  16:          2          0          0          0   IO-APIC-level  ibmasm0
>  20:    2978604          0    2338027          0   IO-APIC-level  eth0
>  22:    1321957          0          0          0   IO-APIC-level  ips
>  24:     581291          0          0          0   IO-APIC-level  pci-umem
>  29:     257154          0          0          0   IO-APIC-level  eth1
> NMI:          0          0          0          0
> LOC:   11524185   11524201   11524194   11524121
> ERR:          0
> MIS:          0

Looks fine to me

> 
> I'm not sure about IRQ balancing, sorry. How do I tell? The entire boot
> process output is here:
> 
> http://robm.fastmail.fm/kernel/t7/bootdmesg.txt
> 
> And the config is here:
> 
> http://robm.fastmail.fm/kernel/t7/config.txt

You have irq balancing, the line 

CONFIG_IRQBALANCE=y

in your config file confirms it - I am not completely sure that it is
the root of the problem, but when I experienced the problem I changed
two things: my ACPI code and IRQ balancing, and one of them made the
difference. I am just too lazy to check which one it was (also, these are
production servers so I cannot do whatever I want).
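
The config check described above can be sketched in a few lines of Python. This is only an illustration: `config_value` is a hypothetical helper, and apart from the `CONFIG_IRQBALANCE=y` line quoted in this message, the sample config content is made up.

```python
def config_value(config_text, symbol):
    """Return the value of a kernel config symbol, or None if it is not set."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(symbol + "="):
            return line.split("=", 1)[1]
    return None   # covers "# CONFIG_FOO is not set" comments and missing symbols


# Illustrative config text; CONFIG_EXAMPLE_OPTION is a made-up symbol.
sample = """\
CONFIG_IRQBALANCE=y
# CONFIG_EXAMPLE_OPTION is not set
"""

print(config_value(sample, "CONFIG_IRQBALANCE"))
print(config_value(sample, "CONFIG_EXAMPLE_OPTION"))
```

Against a saved copy of a real config, the same function could be called on the file's contents instead of the inline sample.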


> Our boot doesn't pass any special parameters, just choosing the deadline
> elevator...
> 
> image=/boot/bzImage-2.6.12.2
>   label=linux-2.6.12.2
>   append="elevator=deadline"
>   read-only
>   root=/dev/sda2

I use the same io scheduler so that should not be a problem. I have
uploaded my config file - it works on ibm 335/336 servers, and a quick
look at your boot msg seems to indicate that your server has some of
the same hardware - note however that I load ide/scsi/filesystem stuff
as modules so you will need to build an initrd to use my config.

the config is here

http://randompage.org/static/kernel.conf



--
Lars

* Re: 2.6.12.2 dies after 24 hours
  2005-07-12 12:13     ` Lars Roland
@ 2005-07-12 13:51       ` Bron Gondwana
  2005-07-12 16:37         ` Lars Roland
  0 siblings, 1 reply; 12+ messages in thread
From: Bron Gondwana @ 2005-07-12 13:51 UTC
  To: Lars Roland, Rob Mueller; +Cc: linux-kernel, Jeremy Howard


On Tue, 12 Jul 2005 14:13:01 +0200, "Lars Roland" <lroland@gmail.com>
said:
> You have irq balancing, the line 
> 
> CONFIG_IRQBALANCE=y
> 
> in your config file confirms it - I am not completely sure that it is
> the root of the problem, but when I experienced the problem I changed
> two things: my ACPI code and IRQ balancing, and one of them made the
> difference. I am just too lazy to check which one it was (also, these are
> production servers so I cannot do whatever I want).

Our ACPI looked very similar to yours, so I've disabled IRQBALANCE.

We'll be rebooting the server during a less busy time to try the new
kernel, so not for about 12 hours or so.

> >   append="elevator=deadline"
>
> I use the same io scheduler so that should not be a problem. I have
> uploaded my config file - it works on ibm 335/336 servers, and a quick
> look at your boot msg seems to indicate that your server has some of
> the same hardware - note however that I load ide/scsi/filesystem stuff
> as modules so you will need to build an initrd to use my config.
> 
> the config is here
> 
> http://randompage.org/static/kernel.conf

Great, thanks for that.  I've had a look through and I think the
IRQBALANCE issue is the most likely cause.

We're also applying the attached patch.  There's a bug in reiserfs that
gets tickled by our huge MMAP usage (it's amazing what really busy
Cyrus daemons can do to a server, ouch).  It's fixed in generic_write,
so we take the few percent performance hit for something that doesn't
break!

Bron.
-- 
  Bron Gondwana
  brong@fastmail.fm


[-- Attachment #2: patch-2.6.12.2-reiserfix.bz2 --]
[-- Type: application/octet-stream, Size: 284 bytes --]

* Re: 2.6.12.2 dies after 24 hours
  2005-07-12 13:51       ` Bron Gondwana
@ 2005-07-12 16:37         ` Lars Roland
  2005-07-13  0:27           ` Rob Mueller
  0 siblings, 1 reply; 12+ messages in thread
From: Lars Roland @ 2005-07-12 16:37 UTC
  To: Bron Gondwana; +Cc: Rob Mueller, linux-kernel, Jeremy Howard

On 7/12/05, Bron Gondwana <brong@fastmail.fm> wrote:
> We're also applying the attached patch.  There's a bug in reiserfs that
> gets tickled by our huge MMAP usage (it's amazing what really busy
> Cyrus daemons can do to a server, ouch).  It's fixed in generic_write,
> so we take the few percent performance hit for something that doesn't
> break!

Interesting - When I got the problem it was on mail servers under high
load (handling 60,000 emails per hour) with reiserfs as file system. I
have seen this problem on 5 different servers so I am confident that
it is not hardware failure.

Sometimes the server load just rises and then the server dies; other
times the load rises but the kernel manages to get it back alive,
filling up syslog with messages like this:


---------
June 29 14:06:59 dkcphmx12 kernel: oom-killer: gfp_mask=0xd2
June 29 14:07:15 dkcphmx12 kernel: DMA per-cpu:
June 29 14:07:24 dkcphmx12 kernel: cpu 0 hot: low 2, high 6, batch 1
June 29 14:07:24 dkcphmx12 kernel: cpu 0 cold: low 0, high 2, batch 1
June 29 14:07:26 dkcphmx12 logger[17427]: *** SYSTEM UPDATE STARTED ***
June 29 14:07:26 dkcphmx12 kernel: cpu 1 hot: low 2, high 6, batch 1
June 29 14:07:26 dkcphmx12 kernel: cpu 1 cold: low 0, high 2, batch 1
June 29 14:07:26 dkcphmx12 kernel: cpu 2 hot: low 2, high 6, batch 1
June 29 14:07:26 dkcphmx12 kernel: cpu 2 cold: low 0, high 2, batch 1
June 29 14:07:26 dkcphmx12 kernel: cpu 3 hot: low 2, high 6, batch 1
June 29 14:07:26 dkcphmx12 kernel: cpu 3 cold: low 0, high 2, batch 1
June 29 14:07:26 dkcphmx12 kernel: Normal per-cpu:
June 29 14:07:26 dkcphmx12 kernel: cpu 0 hot: low 62, high 186, batch 31
June 29 14:07:26 dkcphmx12 kernel: cpu 0 cold: low 0, high 62, batch 31
June 29 14:07:26 dkcphmx12 kernel: cpu 1 hot: low 62, high 186, batch 31
June 29 14:07:26 dkcphmx12 kernel: cpu 1 cold: low 0, high 62, batch 31
June 29 14:07:26 dkcphmx12 kernel: cpu 2 hot: low 62, high 186, batch 31
June 29 14:07:26 dkcphmx12 kernel: cpu 2 cold: low 0, high 62, batch 31
June 29 14:07:26 dkcphmx12 kernel: cpu 3 hot: low 62, high 186, batch 31
June 29 14:07:26 dkcphmx12 kernel: cpu 3 cold: low 0, high 62, batch 31
June 29 14:07:26 dkcphmx12 kernel: HighMem per-cpu:
June 29 14:07:26 dkcphmx12 kernel: cpu 0 hot: low 62, high 186, batch 31
June 29 14:07:26 dkcphmx12 kernel: cpu 0 cold: low 0, high 62, batch 31
June 29 14:07:26 dkcphmx12 kernel: cpu 1 hot: low 62, high 186, batch 31
June 29 14:07:26 dkcphmx12 kernel: cpu 1 cold: low 0, high 62, batch 31
June 29 14:07:26 dkcphmx12 kernel: cpu 2 hot: low 62, high 186, batch 31
June 29 14:07:26 dkcphmx12 kernel: cpu 2 cold: low 0, high 62, batch 31
June 29 14:07:26 dkcphmx12 kernel: cpu 3 hot: low 62, high 186, batch 31
June 29 14:07:26 dkcphmx12 kernel: cpu 3 cold: low 0, high 62, batch 31
June 29 14:07:26 dkcphmx12 kernel:
June 29 14:07:26 dkcphmx12 kernel: Free pages:       49196kB (496kB HighMem)
June 29 14:07:26 dkcphmx12 kernel: Active:246580 inactive:244789 dirty:0 writeback:0 unstable:0 free:12299 slab:4271 mapped:494975 pagetables:2332
June 29 14:07:26 dkcphmx12 kernel: DMA free:8192kB min:68kB low:84kB high:100kB active:2644kB inactive:2108kB present:16384kB pages_scanned:35654 all_unreclaimable? yes
June 29 14:07:26 dkcphmx12 kernel: lowmem_reserve[]: 0 880 2031
June 29 14:07:26 dkcphmx12 kernel: Normal free:40508kB min:3756kB low:4692kB high:5632kB active:400096kB inactive:396232kB present:901120kB pages_scanned:1933718 all_unreclaimable? yes
June 29 14:07:26 dkcphmx12 kernel: lowmem_reserve[]: 0 0 9214
June 29 14:07:26 dkcphmx12 kernel: HighMem free:496kB min:512kB low:640kB high:768kB active:583452kB inactive:580816kB present:1179504kB pages_scanned:8651915 all_unreclaimable? yes
June 29 14:07:26 dkcphmx12 kernel: lowmem_reserve[]: 0 0 0
June 29 14:07:26 dkcphmx12 kernel: DMA: 0*4kB 2*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8192kB
June 29 14:07:26 dkcphmx12 kernel: Normal: 251*4kB 38*8kB 6*16kB 0*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 9*4096kB = 40508kB
June 29 14:07:26 dkcphmx12 kernel: HighMem: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 496kB
June 29 14:07:26 dkcphmx12 kernel: Swap cache: add 958067, delete 958073, find 182890/223964, race 0+1251
June 29 14:07:26 dkcphmx12 kernel: Free swap  = 0kB
June 29 14:07:26 dkcphmx12 kernel: Total swap = 2097136kB
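
The per-zone free-list lines in the log above ("DMA:", "Normal:", "HighMem:") each list how many free blocks of each size a zone has, followed by the total. A minimal Python sketch of that arithmetic, with `buddy_total_kb` as a hypothetical helper name and the three lines copied from the log:

```python
import re


def buddy_total_kb(line):
    """Sum the 'count*sizekB' terms of one free-list line from the log."""
    return sum(int(count) * int(size)
               for count, size in re.findall(r"(\d+)\*(\d+)kB", line))


dma = ("DMA: 0*4kB 2*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB "
       "1*512kB 1*1024kB 1*2048kB 1*4096kB = 8192kB")
normal = ("Normal: 251*4kB 38*8kB 6*16kB 0*32kB 1*64kB 1*128kB "
          "0*256kB 0*512kB 0*1024kB 1*2048kB 9*4096kB = 40508kB")
highmem = ("HighMem: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB "
           "0*512kB 0*1024kB 0*2048kB 0*4096kB = 496kB")

for zone in (dma, normal, highmem):
    print(zone.split(":")[0], buddy_total_kb(zone), "kB")
```

The sums match the totals printed in the log (8192kB, 40508kB and 496kB). The HighMem figure of 496kB is below that zone's reported min of 512kB, and free swap is 0kB.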

* Re: 2.6.12.2 dies after 24 hours
  2005-07-12 16:37         ` Lars Roland
@ 2005-07-13  0:27           ` Rob Mueller
  2005-07-13  0:42             ` Chris Mason
  0 siblings, 1 reply; 12+ messages in thread
From: Rob Mueller @ 2005-07-13  0:27 UTC
  To: Lars Roland, Bron Gondwana
  Cc: linux-kernel, Jeremy Howard, Chris Mason, Vladimir Saveliev


> > We're also applying the attached patch.  There's a bug in reiserfs that
> > gets tickled by our huge MMAP usage (it's amazing what really busy
> > Cyrus daemons can do to a server, ouch).  It's fixed in generic_write,
> > so we take the few percent performance hit for something that doesn't
> > break!
>
> Interesting - When I got the problem it was on mail servers under high
> load (handling 60,000 emails per hour) with reiserfs as file system. I
> have seen this problem on 5 different servers so I am confident that
> it is not hardware failure.
>
> Sometimes the server load just rises and then the server dies; other
> times the load rises but the kernel manages to get it back alive,
> filling up syslog with messages like this:

Sounds like a different issue. The patch Bron included before fixes (or at 
least reduces to the point where it fixes it for us) a problem where 
processes get stuck in D state and are unkillable. A reboot is required to 
remove them. Apparently this is a known bug in ReiserFS (see messages 
below). As noted, the same bug exists in ext3. There appear to have been 
some patches to try and fix it for both reiserfs and ext3, but I'm not sure 
if they're in the mainline kernel yet.

http://www.ussg.iu.edu/hypermail/linux/kernel/0409.0/2056.html
http://hulllug.principalhosting.net/archive/index.php/t-22774.html

Rob

----- Original Message ----- 
From: "Vladimir Saveliev" <vs@namesys.com>
To: "Jeremy Howard" <jhoward@fastmail.fm>
Cc: "Hans Reiser" <reiser@namesys.com>; <reiserfs-dev@namesys.com>
Sent: Friday, October 08, 2004 4:57 PM
Subject: Re: URGENT: Need fix for problem with copy_from_user inside a transaction


> Hello
>
> On Thu, 2004-10-07 at 21:35, Jeremy Howard wrote:
>> On Fri, 01 Oct 2004 09:13:34 -0700, "Hans Reiser" <reiser@namesys.com>
>> said:
>> > Chris, can you comment please?
>>
>> Also... can you guys suggest any ways to minimise the problem, e.g.
>> external vs internal journal? metadata vs full journalling? changing the
>> elevator? ...
>
> Would you please check whether the attached patch for
> ./fs/reiserfs/file.c fixes the problem?
>
> Quick benchmark shows that using generic_file_write does not hurt
> reiserfs performance too much compared to using the original
> reiserfs_file_write.
>
>                         ---Sequential Output (nosync)--- ---Sequential Input-- --Rnd Seek-
>                         -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k (03)-
>                      MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
> generic_file_write  256 16690 99.7 28332 29.9 11528 16.8 12411 77.2 28888 17.3 237.4  2.1
> reiserfs_file_write 256 15647 86.8 30875 22.1 10822 14.1 11631 72.1 29184 16.1 250.0  2.0
>
>
>>
>> > Vladimir Saveliev wrote:
>> >
>> > >Hello
>> > >
>> > >On Fri, 2004-10-01 at 01:53, Hans Reiser wrote:
>> > >
>> > >
>> > >>vs, this is not enough of an answer to share your understanding of
>> > >>the problem.  Please say much more.
>> > >>
>> > >>
>> > >
>> > >reiserfs_file_write locks a set of pages and then tries to copy data
>> > >to them. If it has to copy data from one of the pages which are
>> > >locked, and that page is not uptodate, the page fault needs to lock
>> > >that page, but as it is already locked, the process deadlocks with
>> > >itself.
>> > >
>> > >
>> > Is this when copying from one file in a formatted node to another file
>> > in that node?
>> >
>> > >As Chris said, the fix is not trivial. Also, it is known that he has
>> > >already done something about it, so I thought it would make sense to
>> > >first find out where he stands on this problem.
>> > >
>> > >
>
> Content-Disposition: attachment; filename=file.c.diff2
>
> --- file.c~ 2004-10-02 12:29:33.223660850 +0400
> +++ file.c 2004-10-08 10:03:03.001561661 +0400
> @@ -1137,6 +1137,8 @@
>   return result;
>      }
>
> +    return generic_file_write(file, buf, count, ppos);
> +
>      if ( unlikely((ssize_t) count < 0 ))
>          return -EINVAL;
>

----- Original Message ----- 
From: "Chris Mason" <mason@suse.com>
To: "Hans Reiser" <reiser@namesys.com>
Cc: "Vladimir Saveliev" <vs@namesys.com>; "Oleg Drokin" 
<green@linuxhacker.ru>; "Jeremy Howard" <jhoward@fastmail.fm>; 
<reiserfs-dev@namesys.com>
Sent: Saturday, October 09, 2004 1:05 AM
Subject: Re: URGENT: Need fix for problem with copy_from_user inside a transaction


> On Fri, 2004-10-08 at 07:46 -0700, Hans Reiser wrote:
>> No, this is not the right answer.
>
>> >>>--- file.c~ 2004-10-02 12:29:33.223660850 +0400
>> >>>+++ file.c 2004-10-08 10:03:03.001561661 +0400
>> >>>@@ -1137,6 +1137,8 @@
>> >>> return result;
>> >>>     }
>> >>>
>> >>>+    return generic_file_write(file, buf, count, ppos);
>> >>>+
>> >>>     if ( unlikely((ssize_t) count < 0 ))
>> >>>         return -EINVAL;
>
> It's not right because ext3 has exactly the same bug.  The only real
> solution is to change the order in which things happen during
> file_write.
>
> Right now, we do this:
>
> prepare_write() allocates space, starts a transaction
> copy_from_user() can deadlock here because a transaction is running
> commit_write() ends the transaction
>
> This is the same in generic_file_write and reiserfs_file_write, although
> reiserfs_file_write doesn't call reiserfs_prepare_write, it has the same
> basic structure in terms of when the transaction starts and ends.
>
> The solution is to move most of the work from reiserfs_prepare_write
> into reiserfs_commit_write.  I've made a number of different patches for
> this, all have had problems, but I'm working through it and will have a
> real tested fix.
>
> Jeremy the best thing you can do right now is to mount your filesystems
> with -o nolargeio=1, and use the temporary workarounds I sent you
> before.  The -o nolargeio=1 will reduce the amount of work we try to do
> with each pass in reiserfs_file_write (note that Vladimir's patch above
> will have very similar effects).
>
> -chris
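
The self-deadlock described in the forwarded messages above (a page is locked, then a fault on that same page needs the lock again) can be modelled in userspace with a non-reentrant lock. This is only a toy Python sketch under that assumption, not kernel code: `page_lock`, `fault_in_page` and `write_with_page_locked` are hypothetical names, and a timeout stands in for what would be an indefinite hang.

```python
import threading

# Stands in for the lock on a single page; threading.Lock is not reentrant.
page_lock = threading.Lock()


def fault_in_page(timeout=0.1):
    """Model the page-fault path, which has to take the page lock itself."""
    acquired = page_lock.acquire(timeout=timeout)
    if acquired:
        page_lock.release()
    return acquired


def write_with_page_locked():
    """Model the write path: lock the page, then fault on that same page."""
    with page_lock:
        # Same thread, lock already held: without the timeout this would
        # never return, which is the "deadlocks with itself" case.
        return fault_in_page()


print("fault with page unlocked succeeds:", fault_in_page())
print("fault with page already locked succeeds:", write_with_page_locked())
```

The first call succeeds because nothing holds the lock; the second fails because the caller already holds it.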


* Re: 2.6.12.2 dies after 24 hours
  2005-07-13  0:27           ` Rob Mueller
@ 2005-07-13  0:42             ` Chris Mason
  2005-07-13  0:50               ` Rob Mueller
  2005-07-13  1:00               ` Chris Mason
  0 siblings, 2 replies; 12+ messages in thread
From: Chris Mason @ 2005-07-13  0:42 UTC
  To: Rob Mueller, akpm
  Cc: Lars Roland, Bron Gondwana, linux-kernel, Jeremy Howard,
	Vladimir Saveliev

On Tuesday 12 July 2005 20:27, Rob Mueller wrote:
> > > We're also applying the attached patch.  There's a bug in reiserfs that
> > > gets tickled by our huge MMAP usage (it's amazing what really busy
> > > Cyrus daemons can do to a server, ouch).  It's fixed in generic_write,
> > > so we take the few percent performance hit for something that doesn't
> > > break!
> >
> > Interesting - When I got the problem it was on mail servers under high
> > load (handling 60,000 emails per hour) with reiserfs as file system. I
> > have seen this problem on 5 different servers so I am confident that
> > it is not hardware failure.
> >
> > Sometimes the server load just rises and then the server dies; other
> > times the load rises but the kernel manages to get it back alive,
> > filling up syslog with messages like this:
>
> Sounds like a different issue. The patch Bron included before fixes (or at
> least reduces to the point where it fixes it for us) a problem where
> processes get stuck in D state and are unkillable. A reboot is required to
> remove them. Apparently this is a known bug in ReiserFS (see messages
> below). As noted, the same bug exists in ext3. There appears to have been
> some patches to try and fix it for both reiserfs and ext3, but I'm not sure
> if they're in the mainline kernel yet.
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0409.0/2056.html
> http://hulllug.principalhosting.net/archive/index.php/t-22774.html
>

There is a much less complex solution that I've just recently gotten working 
in the SUSE kernel.  If reiser3/ext3 don't log the inode during atime 
updates, the problem goes away.

You can solve this now by mounting with -o noatime (although that might not 
play well with cyrus, not sure).  My current patch works around this in ugly 
ways, what I plan on doing during OLS is finding out why ext3 is still 
logging the inode all the time.

For reiser3, this was to avoid kswapd having to log a bunch of inodes in 
response to memory pressure, but that was back in 2.4 when things were 
different.  We shouldn't need to do it anymore...

-chris

* Re: 2.6.12.2 dies after 24 hours
  2005-07-13  0:42             ` Chris Mason
@ 2005-07-13  0:50               ` Rob Mueller
  2005-07-13  1:03                 ` Chris Mason
  2005-07-13  1:00               ` Chris Mason
  1 sibling, 1 reply; 12+ messages in thread
From: Rob Mueller @ 2005-07-13  0:50 UTC
  To: Chris Mason, akpm
  Cc: Lars Roland, Bron Gondwana, linux-kernel, Vladimir Saveliev,
	Jeremy Howard

> There is a much less complex solution that I've just recently gotten
> working in the SUSE kernel.  If reiser3/ext3 don't log the inode during
> atime updates, the problem goes away.
>
> You can solve this now by mounting with -o noatime (although that might
> not play well with cyrus, not sure).  My current patch works around this
> in ugly ways, what I plan on doing during OLS is finding out why ext3 is
> still logging the inode all the time.

Well we have always mounted our cyrus filesystems with:

noatime,nodiratime,notail

And the problem was occurring all the time with these mount options. We've
since also added nolargeio=1 at your suggestion and used the patch Vladimir
created. I'm not sure which of those fixed the problem, but it definitely
has not occurred since we did those last 2 things.

Are you saying that if you mount with noatime *and* use your new patch it 
will fix the problem?

What about the 2 threads I linked to? Did those end up getting anywhere?

> http://www.ussg.iu.edu/hypermail/linux/kernel/0409.0/2056.html
> http://hulllug.principalhosting.net/archive/index.php/t-22774.html

Rob

* Re: 2.6.12.2 dies after 24 hours
  2005-07-13  0:42             ` Chris Mason
  2005-07-13  0:50               ` Rob Mueller
@ 2005-07-13  1:00               ` Chris Mason
  1 sibling, 0 replies; 12+ messages in thread
From: Chris Mason @ 2005-07-13  1:00 UTC
  To: Rob Mueller
  Cc: akpm, Lars Roland, Bron Gondwana, linux-kernel, Jeremy Howard,
	Vladimir Saveliev

On Tuesday 12 July 2005 20:42, Chris Mason wrote:

> > Sounds like a different issue. The patch Bron included before fixes (or
> > at least reduces to the point where it fixes it for us) a problem where
> > processes get stuck in D state and are unkillable. A reboot is required
> > to remove them. Apparently this is a known bug in ReiserFS (see messages
> > below). As noted, the same bug exists in ext3. There appears to have been
> > some patches to try and fix it for both reiserfs and ext3, but I'm not
> > sure if they're in the mainline kernel yet.
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0409.0/2056.html
> > http://hulllug.principalhosting.net/archive/index.php/t-22774.html
>
> There is a much less complex solution that I've just recently gotten
> working in the SUSE kernel.  If reiser3/ext3 don't log the inode during
> atime updates, the problem goes away.

The sysrq is huge, and I haven't yet found the person holding the transaction 
open.  But, here's another place that starts a transaction with the mmap sem 
held, and I would guess the transaction writer is waiting on something for 
that mmap sem.

atime updates alone won't fix this one....

-chris

imapd         D F3159530     0 32412   2292         32413 32411 (NOTLB)
e1cdfdfc 00000082 00000008 f3159530 00000202 00000000 c1e5b0a0 c013b1a0
       00000034 00000202 c301b520 00000001 00004609 beca3f0b 00005566 c301b520
       c315b530 f3159530 f3159654 00000000 00000001 0000000e d9309dac 00000000
Call Trace:
 [<c013b1a0>] free_hot_cold_page+0x20/0xd0
 [<c01ac8dd>] queue_log_writer+0x5d/0x80
 [<c0114b10>] default_wake_function+0x0/0x20
 [<c01acb8a>] do_journal_begin_r+0x1ca/0x2b0
 [<c01409e0>] truncate_inode_pages+0x290/0x2b0
 [<c01ace9e>] journal_begin+0x8e/0xe0
 [<c0191061>] reiserfs_delete_inode+0x51/0xc0
 [<c01447fa>] unmap_vmas+0x14a/0x260
 [<c0191010>] reiserfs_delete_inode+0x0/0xc0
 [<c016c97d>] generic_delete_inode+0x7d/0xe0
 [<c016cb83>] iput+0x63/0x70
 [<c0169db6>] dput+0x176/0x1b0
 [<c01547cb>] __fput+0xcb/0x100
 [<c01470ff>] remove_vm_struct+0x5f/0x80
 [<c014873a>] unmap_vma_list+0x1a/0x30
 [<c0148a9f>] do_munmap+0xdf/0xf0
 [<c0148aff>] sys_munmap+0x4f/0x70
 [<c0102a15>] syscall_call+0x7/0xb
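
The guess in this message is a lock-ordering problem: one path holds the mmap sem and waits for the transaction, while the transaction writer waits on something that needs the mmap sem. The following Python sketch only illustrates that general pattern with two ordinary locks. The names `mmap_sem`, `transaction`, `munmap_path` and `writer_path` are stand-ins chosen for this example, and timeouts replace what would be an indefinite wait.

```python
import threading

mmap_sem = threading.Lock()      # stands in for the mmap semaphore
transaction = threading.Lock()   # stands in for the open journal transaction
barrier = threading.Barrier(2)
results = {}


def take_in_order(name, first, second):
    """Take `first`, then try to take `second` while the peer does the opposite."""
    with first:
        barrier.wait()                      # both threads now hold their first lock
        got = second.acquire(timeout=0.2)   # a real blocking acquire would hang here
        results[name] = got
        if got:
            second.release()
        barrier.wait()                      # keep `first` held until both attempts finish


threads = [
    threading.Thread(target=take_in_order,
                     args=("munmap_path", mmap_sem, transaction)),
    threading.Thread(target=take_in_order,
                     args=("writer_path", transaction, mmap_sem)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

Neither thread can get its second lock while the other still holds it, so both attempts time out. With blocking acquires, both would wait forever.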


* Re: 2.6.12.2 dies after 24 hours
  2005-07-13  0:50               ` Rob Mueller
@ 2005-07-13  1:03                 ` Chris Mason
  2005-07-13  1:27                   ` Rob Mueller
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Mason @ 2005-07-13  1:03 UTC
  To: Rob Mueller
  Cc: akpm, Lars Roland, Bron Gondwana, linux-kernel, Vladimir Saveliev,
	Jeremy Howard

On Tuesday 12 July 2005 20:50, Rob Mueller wrote:

>
> Are you saying that if you mount with noatime *and* use your new patch it
> will fix the problem?
>
> What about the 2 threads linked to. Did those end up getting anywhere?

Sorry for the confusion, you're hitting the other mmap_sem -> transaction lock 
problem.  This one should be solvable with an iget so we make sure not to do 
the final unlink until after the mmap sem is dropped.

Let's see what I can do...

-chris

* Re: 2.6.12.2 dies after 24 hours
  2005-07-13  1:03                 ` Chris Mason
@ 2005-07-13  1:27                   ` Rob Mueller
  0 siblings, 0 replies; 12+ messages in thread
From: Rob Mueller @ 2005-07-13  1:27 UTC
  To: Chris Mason
  Cc: akpm, Lars Roland, Bron Gondwana, linux-kernel, Vladimir Saveliev,
	Jeremy Howard

> Sorry for the confusion, you're hitting the other mmap_sem -> transaction
> lock problem.  This one should be solvable with an iget so we make sure
> not to do the final unlink until after the mmap sem is dropped.
>
> Let's see what I can do...

Oh dang.

I thought this last crash after upgrading to 2.6.12.2 was due to the
IRQBALANCE issue Lars suggested, but you're saying that it's actually a whole
different bug, though similar to the previous "prepare_write ...
copy_from_user ... commit_write" lock problem?

Is this something new between 2.6.4-mm2 and 2.6.12.2? Or would it have 
always been present, just for some reason we're hitting it in 2.6.12 now but 
weren't hitting it in 2.6.4?

I feel we're moving backwards... :(

Rob
