From: Eric Sandeen <sandeen@redhat.com>
To: Jan Kara <jack@suse.cz>
Cc: Nick Piggin <npiggin@kernel.dk>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-btrfs@vger.kernel.org
Subject: Re: [patch] fix up lock order reversal in writeback
Date: Tue, 16 Nov 2010 22:30:37 -0600
Message-ID: <4CE35A6D.2040906@redhat.com>
In-Reply-To: <20101116130146.GG4757@quack.suse.cz>

On 11/16/10 7:01 AM, Jan Kara wrote:
> On Tue 16-11-10 22:00:58, Nick Piggin wrote:
>> I saw a lock order warning on ext4 trigger. This should solve it.
>> Raciness shouldn't matter much, because writeback can stop just
>> after we make the test and return anyway (so the API is racy anyway).
>   Hmm, for now the fix is OK. Ultimately, we probably want to call
> writeback_inodes_sb() directly from all the callers. They all just want to
> reduce uncertainty of delayed allocation reservations by writing delayed
> data and actually wait for some of the writeback to happen before they
> retry again the allocation.

For ext4, at least, it's just best-effort.  We're not actually out of
space yet when this starts pushing.  But it helps us avoid ENOSPC:

commit c8afb44682fcef6273e8b8eb19fab13ddd05b386
Author: Eric Sandeen <sandeen@redhat.com>
Date:   Wed Dec 23 07:58:12 2009 -0500

    ext4: flush delalloc blocks when space is low
    
    Creating many small files in rapid succession on a small
    filesystem can lead to spurious ENOSPC; on a 104MB filesystem:
    
    for i in `seq 1 22500`; do
        echo -n > $SCRATCH_MNT/$i
        echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > $SCRATCH_MNT/$i
    done
    
    leads to ENOSPC even though after a sync, 40% of the fs is free
    again.

    <snip>

We don't need it to be synchronous - in fact I didn't think it was ...

ext4 should probably use btrfs's new variant and just get rid of the
one I put in; on a very large system/filesystem it could end up doing
a rather insane amount of IO when the fs starts to get full.
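
To illustrate the concern about IO volume, here is a minimal userspace
sketch contrasting an unbounded flush with a flush capped at nr pages.
It assumes the "new variant" means the _nr form that appears in the
quoted patch (writeback_inodes_sb_nr_if_idle); the struct and function
names below are hypothetical and model only the page accounting, not
any real kernel code.

```c
/* Hypothetical model: a filesystem tracked only by its dirty page count. */
struct toy_fs {
    unsigned long dirty_pages;
};

/*
 * Write back everything that is dirty. The cost of one call grows with
 * the amount of dirty data, which can be huge on a large system.
 */
static unsigned long toy_flush_all(struct toy_fs *fs)
{
    unsigned long written = fs->dirty_pages;

    fs->dirty_pages = 0;
    return written;
}

/* Write back at most nr pages, so the cost of one call is bounded by nr. */
static unsigned long toy_flush_nr(struct toy_fs *fs, unsigned long nr)
{
    unsigned long written = fs->dirty_pages < nr ? fs->dirty_pages : nr;

    fs->dirty_pages -= written;
    return written;
}
```

With a million dirty pages, toy_flush_all() accounts for all of them in
one call, while toy_flush_nr(fs, 1024) accounts for 1024 and leaves the
rest for later calls.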

As for the locking problems ... sorry about that!

-Eric

> Although the callers generally cannot get umount_sem because they hold
> other locks, they have the superblock well pinned so grabbing umount_sem
> makes sense mostly to make assertions happy. But as I'm thinking about it,
> trylock *is* maybe the right answer to this anyway...
> 
> So
> Acked-by: Jan Kara <jack@suse.cz>
> 
> 								Honza
>>
>> Signed-off-by: Nick Piggin <npiggin@kernel.dk>
>>
>> Index: linux-2.6/fs/fs-writeback.c
>> ===================================================================
>> --- linux-2.6.orig/fs/fs-writeback.c	2010-11-16 21:44:32.000000000 +1100
>> +++ linux-2.6/fs/fs-writeback.c	2010-11-16 21:49:37.000000000 +1100
>> @@ -1125,16 +1125,20 @@ EXPORT_SYMBOL(writeback_inodes_sb);
>>   *
>>   * Invoke writeback_inodes_sb if no writeback is currently underway.
>>   * Returns 1 if writeback was started, 0 if not.
>> + *
>> + * May be called inside i_lock. May not start writeback if locks cannot
>> + * be acquired.
>>   */
>>  int writeback_inodes_sb_if_idle(struct super_block *sb)
>>  {
>>  	if (!writeback_in_progress(sb->s_bdi)) {
>> -		down_read(&sb->s_umount);
>> -		writeback_inodes_sb(sb);
>> -		up_read(&sb->s_umount);
>> -		return 1;
>> -	} else
>> -		return 0;
>> +		if (down_read_trylock(&sb->s_umount)) {
>> +			writeback_inodes_sb(sb);
>> +			up_read(&sb->s_umount);
>> +			return 1;
>> +		}
>> +	}
>> +	return 0;
>>  }
>>  EXPORT_SYMBOL(writeback_inodes_sb_if_idle);
>>  
>> @@ -1145,17 +1149,21 @@ EXPORT_SYMBOL(writeback_inodes_sb_if_idl
>>   *
>>   * Invoke writeback_inodes_sb if no writeback is currently underway.
>>   * Returns 1 if writeback was started, 0 if not.
>> + *
>> + * May be called inside i_lock. May not start writeback if locks cannot
>> + * be acquired.
>>   */
>>  int writeback_inodes_sb_nr_if_idle(struct super_block *sb,
>>  				   unsigned long nr)
>>  {
>>  	if (!writeback_in_progress(sb->s_bdi)) {
>> -		down_read(&sb->s_umount);
>> -		writeback_inodes_sb_nr(sb, nr);
>> -		up_read(&sb->s_umount);
>> -		return 1;
>> -	} else
>> -		return 0;
>> +		if (down_read_trylock(&sb->s_umount)) {
>> +			writeback_inodes_sb_nr(sb, nr);
>> +			up_read(&sb->s_umount);
>> +			return 1;
>> +		}
>> +	}
>> +	return 0;
>>  }
>>  EXPORT_SYMBOL(writeback_inodes_sb_nr_if_idle);

