public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@suse.de>
To: Andrew Morton <akpm@digeo.com>
Cc: torvalds@transmeta.com, mikulas@artax.karlin.mff.cuni.cz,
	pavel@suse.cz, linux-kernel@vger.kernel.org
Subject: Re: 2.0, 2.2, 2.4, 2.5: fsync buffer race
Date: Fri, 14 Mar 2003 07:42:55 +0100	[thread overview]
Message-ID: <20030314064255.GE32250@dualathlon.random> (raw)
In-Reply-To: <20030311135831.GA1501@dualathlon.random>

Hi!

> mutt          D 00000292     0 14627  24890                     (NOTLB)
> Call Trace:    [__wait_on_buffer+86/144] [ext3_free_inode+391/1248] [ll_rw_block+309/464] [ext3_find_entry+828/864] [ext3_lookup+65/208]
>   [lookup_hash+194/288] [sys_unlink+139/288] [system_call+51/56]
> tee           D ECD560C0     0   111  11445                 110 (NOTLB)
> Call Trace:    [journal_dirty_metadata+420/608] [__wait_on_buffer+86/144] [__cleanup_transaction+362/368] [journal_cancel_revoke+91/448] [log_do_checkpoint+299/448]
>   [n_tty_receive_buf+400/1808] [journal_dirty_metadata+420/608] [n_tty_receive_buf+400/1808] [journal_dirty_metadata+420/608] [ext3_do_update_inode+375/992] [__jbd_kmalloc+35/176]
>   [log_wait_for_space+130/144] [start_this_handle+191/864] [new_handle+75/112] [journal_start+165/208] [ext3_prepare_write+567/576] [write_chan+335/512]
>   [generic_file_write_nolock+1023/2032] [pipe_wait+125/160] [generic_file_write+77/112] [ext3_file_write+57/224] [sys_write+163/336] [system_call+51/56]
> python        D 00000180     0 21334    112 21336   21335       (NOTLB)
> Call Trace:    [start_request+409/544] [__down+105/192] [__down_failed+8/12] [.text.lock.checkpoint+15/173] [start_this_handle+191/864]
>   [new_handle+75/112] [journal_start+165/208] [ext3_lookup+0/208] [ext3_create+300/320] [vfs_create+153/304] [open_namei+1301/1456]
>   [filp_open+62/112] [sys_open+83/192] [system_call+51/56]
> python        D E2EB302C  2388 21335    112 21337         21334 (NOTLB)
> Call Trace:    [dput+33/320] [__down+105/192] [__down_failed+8/12] [.text.lock.namei+265/988] [filp_open+62/112]
>   [sys_open+83/192] [system_call+51/56]
> procmail      D E51AC016     0 21378      1               19102 (NOTLB)
> Call Trace:    [dput+33/320] [__down+105/192] [__down_failed+8/12] [.text.lock.namei+265/988] [filp_open+62/112]
>   [sys_open+83/192] [system_call+51/56]
> python        D C01186FF     0 21470    107         21471       (NOTLB)
> Call Trace:    [do_page_fault+463/1530] [__down+105/192] [__jbd_kmalloc+35/176] [__down_failed+8/12] [.text.lock.checkpoint+15/173]
>   [start_this_handle+191/864] [new_handle+75/112] [journal_start+165/208] [do_wp_page+416/1088] [ext3_dirty_inode+336/368] [__mark_inode_dirty+195/208]
>   [update_inode_times+39/64] [generic_file_write_nolock+630/2032] [generic_file_write+77/112] [ext3_file_write+57/224] [sys_write+163/336] [system_call+51/56]
> python        D C01186FF     0 21471    107               21470 (NOTLB)
> Call Trace:    [do_page_fault+463/1530] [__down+105/192] [__jbd_kmalloc+35/176] [__down_failed+8/12] [.text.lock.checkpoint+15/173]
>   [start_this_handle+191/864] [new_handle+75/112] [journal_start+165/208] [do_wp_page+416/1088] [ext3_dirty_inode+336/368] [__mark_inode_dirty+195/208]
>   [update_inode_times+39/64] [generic_file_write_nolock+630/2032] [generic_file_write+77/112] [ext3_file_write+57/224] [sys_write+163/336] [system_call+51/56]
> python        D 000001E0     0 21732  21731                     (NOTLB)
> Call Trace:    [bread+32/160] [__down+105/192] [__jbd_kmalloc+35/176] [__down_failed+8/12] [.text.lock.checkpoint+15/173]
>   [start_this_handle+191/864] [new_handle+75/112] [journal_start+165/208] [dput+33/320] [ext3_unlink+440/448] [vfs_unlink+233/480]
>   [sys_unlink+183/288] [system_call+51/56]
> procmail      D D0E13016     0 21914  21913                     (NOTLB)
> Call Trace:    [dput+33/320] [__down+105/192] [__down_failed+8/12] [.text.lock.namei+265/988] [filp_open+62/112]
>   [sys_open+83/192] [system_call+51/56]
> procmail      D E39F3016     0 21999  21998                     (NOTLB)
> Call Trace:    [dput+33/320] [__down+105/192] [__down_failed+8/12] [.text.lock.namei+265/988] [filp_open+62/112]
>   [sys_open+83/192] [system_call+51/56]

never mind, I found the real bug and it's not the fsync fix or in ext3.

This other more recent cleanedup trace made the real bug more obvious
since less stuff was involved on the fs side (because it wasn't an fs
problem after all):

gzip          D CEAE5740  1616  1234   1229                     (NOTLB)
Call Trace:    [ext3_new_block+1062/2448] [do_schedule+284/416] [__wait_on_buffer+84/144] [__cleanup_transaction+252/256] [log_do_checkpoint+286/432]
  [ide_build_sglist+156/512] [__ide_dma_begin+57/80] [__ide_dma_count+22/32] [__ide_dma_write+326/352] [dma_timer_expiry+0/224] [do_rw_disk+884/1712]
  [context_switch+131/320] [ide_wait_stat+159/304] [start_request+409/544] [__jbd_kmalloc+35/176] [log_wait_for_space+108/128] [start_this_handle+184/848]
  [new_handle+75/112] [journal_start+165/208] [__alloc_pages+100/624] [ext3_prepare_write+336/352] [lru_cache_add+99/112] [generic_file_write_nolock+999/1856]
  [generic_file_write+76/112] [ext3_file_write+57/224] [sys_write+163/304] [system_call+51/56]
shutdown      D DFE7D940     0  2308      1  2309          2169 (NOTLB)
Call Trace:    [do_schedule+284/416] [__wait_on_buffer+84/144] [__refile_buffer+86/112] [wait_for_buffers+172/176] [wait_for_locked_buffers+35/48]
  [sync_buffers+106/112] [fsync_dev+68/80] [sys_sync+15/32] [system_call+51/56]
sshd          D D0CC93DC     0  2338    601  2339    2340  1226 (NOTLB)
Call Trace:    [do_schedule+284/416] [__down+90/160] [__jbd_kmalloc+35/176] [__down_failed+8/12] [.text.lock.checkpoint+15/109]
  [start_this_handle+184/848] [new_handle+75/112] [journal_start+165/208] [ext3_dirty_inode+276/288] [__mark_inode_dirty+150/160] [update_atime+94/96]
  [generic_file_read+174/352] [file_read_actor+0/144] [sys_read+163/304] [sys_fstat64+73/128] [system_call+51/56]
sync          D DFF858C0    80  2422   2394                     (NOTLB)
Call Trace:    [do_schedule+284/416] [__wait_on_buffer+84/144] [wait_for_buffers+172/176] [wait_for_locked_buffers+35/48] [sync_buffers+106/112]
  [fsync_dev+68/80] [sys_sync+15/32] [system_call+51/56]

sshd waits for gzip on the journal lock, gzip and shutdown waits on a
buffer that was supposed to be submitted by `sync`, but really `sync`
never submitted it, it only locked it. This bug is present only in
2.4.21pre4aa3 (all previous didn't had this problem). This was extremely
hard to trigger, so it's not surprising few people noticed it,
expecially on fast machines. The lowlatency improvements in
write_some_buffers zeroed out 'count' at the second pass, and so we lost
track of those bh if need_resched was set during the write_some_buffers
pass. this will be fixed in my next tree that I can now release
confortably back in rock solid state with also the fsync-race fix
re-added ;)

here the untested fix against 2.4.21pre4aa3 that I will include in
2.4.21pre5aa1:

--- x/fs/buffer.c.~1~	2003-03-13 06:15:29.000000000 +0100
+++ x/fs/buffer.c	2003-03-14 07:36:40.000000000 +0100
@@ -236,10 +236,10 @@ static int write_some_buffers(kdev_t dev
 	unsigned int count;
 	int nr, num_sched = 0;
 
+	count = 0;
  restart:
 	next = lru_list[BUF_DIRTY];
 	nr = nr_buffers_type[BUF_DIRTY];
-	count = 0;
 	while (next && --nr >= 0) {
 		struct buffer_head * bh = next;
 		next = bh->b_next_free;

Sorry for the noise.

Andrea

      reply	other threads:[~2003-03-14  6:32 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-02-02 23:32 2.0, 2.2, 2.4, 2.5: fsync buffer race Mikulas Patocka
2003-02-03  0:00 ` Andrew Morton
2003-02-03  1:13   ` Mikulas Patocka
2003-02-03  1:20     ` Andrew Morton
2003-02-03  9:29       ` Mikulas Patocka
2003-02-04 23:16   ` Pavel Machek
2003-02-05 15:13     ` Mikulas Patocka
2003-02-10 13:07     ` Andrea Arcangeli
2003-02-10 16:28       ` Mikulas Patocka
2003-02-10 16:57         ` Linus Torvalds
2003-02-10 20:40           ` Andrew Morton
2003-02-10 21:18             ` Andrea Arcangeli
2003-02-10 21:44               ` Andrew Morton
2003-02-10 21:59                 ` Andrea Arcangeli
2003-03-11 13:58                   ` Andrea Arcangeli
2003-03-14  6:42                     ` Andrea Arcangeli [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20030314064255.GE32250@dualathlon.random \
    --to=andrea@suse.de \
    --cc=akpm@digeo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mikulas@artax.karlin.mff.cuni.cz \
    --cc=pavel@suse.cz \
    --cc=torvalds@transmeta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox