Re: [PATCH v1 3/5] btrfs: initial readahead code and prototypes

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Arne Jansen <sensille@gmx.net>
To: miaox@cn.fujitsu.com
Cc: chris.mason@oracle.com, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v1 3/5] btrfs: initial readahead code and prototypes
Date: Thu, 26 May 2011 13:02:59 +0200	[thread overview]
Message-ID: <4DDE3363.8020409@gmx.net> (raw)
In-Reply-To: <4DDE2FAB.5060906@cn.fujitsu.com>

On 26.05.2011 12:47, Miao Xie wrote:
> On thu, 26 May 2011 12:14:21 +0200, David Sterba wrote:
>> Hi,
>>
>> On Mon, May 23, 2011 at 02:59:06PM +0200, Arne Jansen wrote:
>>> +static struct reada_zone *reada_find_zone(struct btrfs_fs_info *fs_info,
>>> +					  struct btrfs_device *dev, u64 logical,
>>> +					  struct btrfs_multi_bio *multi)
>>> +{
>>> +	int ret;
>>> +	int looped = 0;
>>> +	struct reada_zone *zone;
>>> +	struct btrfs_block_group_cache *cache = NULL;
>>> +	u64 start;
>>> +	u64 end;
>>> +	int i;
>>> +
>>> +again:
>>> +	zone = NULL;
>>> +	spin_lock(&fs_info->reada_lock);
>>> +	ret = radix_tree_gang_lookup(&dev->reada_zones, (void **)&zone,
>>> +				     logical >> PAGE_CACHE_SHIFT, 1);
>>> +	if (ret == 1)
>>> +		kref_get(&zone->refcnt);
>>> +	spin_unlock(&fs_info->reada_lock);
>>> +
>>> +	if (ret == 1) {
>>> +		if (logical >= zone->start && logical < zone->end)
>>> +			return zone;
>>> +		spin_lock(&fs_info->reada_lock);
>>> +		reada_zone_put(zone);
>>> +		spin_unlock(&fs_info->reada_lock);
>>> +	}
>>> +
>>> +	if (looped)
>>> +		return NULL;
>>> +
>>> +	cache = btrfs_lookup_block_group(fs_info, logical);
>>> +	if (!cache)
>>> +		return NULL;
>>> +
>>> +	start = cache->key.objectid;
>>> +	end = start + cache->key.offset - 1;
>>> +	btrfs_put_block_group(cache);
>>> +
>>> +	zone = kzalloc(sizeof(*zone), GFP_NOFS);
>>> +	if (!zone)
>>> +		return NULL;
>>> +
>>> +	zone->start = start;
>>> +	zone->end = end;
>>> +	INIT_LIST_HEAD(&zone->list);
>>> +	spin_lock_init(&zone->lock);
>>> +	zone->locked = 0;
>>> +	kref_init(&zone->refcnt);
>>> +	zone->elems = 0;
>>> +	zone->device = dev; /* our device always sits at index 0 */
>>> +	for (i = 0; i < multi->num_stripes; ++i) {
>>> +		/* bounds have already been checked */
>>> +		zone->devs[i] = multi->stripes[i].dev;
>>> +	}
>>> +	zone->ndevs = multi->num_stripes;
>>> +
>>> +	spin_lock(&fs_info->reada_lock);
>>> +	ret = radix_tree_insert(&dev->reada_zones,
>>> +				(unsigned long)zone->end >> PAGE_CACHE_SHIFT,
>>> +				zone);
>>
>> this can sleep inside a spinlock, you initialize the radix tree with
>> GFP_NOFS, which allows __GFP_WAIT.
>>
>> Options:
>> 1) use GFP_ATOMIC in radix tree init flags
>> 2) do the radix_tree_preload/radix_tree_preload_end, GFP_NOFS outside of the
>> locked section is ok but __GFP_WAIT has to be masked out (else radix
>> tree insert will not use the preloaded node)
>> 3) unmask __GFP_WAIT from radix tree init flags
>>
>> I'd go for 3, as the atomic context is not required, and is easier
>> than 2 to implement.
> 
> I like the second one, because it is the general way to fix this problem.

I can't use the second one, as I have code to insert into 3 trees inside
one spin_lock, and preload can preload one for one insertion. My
intention was to use GFP_ATOMIC, don't know how I ended up NOFS
instead. If unmasking GFP_WAIT instead is sufficient, I'd prefer that
solution, too.

> 
> BTW: I think we can use RCU to protect the radix tree on the read side.
> Arne, how do you think about?

I decided against RCU, because inside the same lock I also take a ref on
the structure.
The data structures and locking are already quite complex, so I tried to
keep it as simple as possible until profiling shows that this is a
problem.
Mainly it works with a single global (well, per btrfs) lock, but
splitting it is hard and will normally lead to needing more
lock/unlocks, so I'd prefer to only do it if it doesn't scale.

Thanks,
Arne

> 
> Thanks
> Miao
> 
>>
>>> +	spin_unlock(&fs_info->reada_lock);
>>> +
>>> +	if (ret) {
>>> +		kfree(zone);
>>> +		looped = 1;
>>> +		goto again;
>>> +	}
>>> +
>>> +	return zone;
>>> +}
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>

next prev parent reply	other threads:[~2011-05-26 11:02 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-23 12:59 [PATCH v1 0/5] btrfs: generic readeahead interface Arne Jansen
2011-05-23 12:59 ` [PATCH v1 1/5] btrfs: add READAHEAD extent state Arne Jansen
2011-05-23 12:59 ` [PATCH v1 2/5] btrfs: state information for readahead Arne Jansen
2011-05-25  5:22   ` liubo
2011-05-24  6:48     ` Arne Jansen
2011-05-23 12:59 ` [PATCH v1 3/5] btrfs: initial readahead code and prototypes Arne Jansen
2011-05-26 10:14   ` David Sterba
2011-05-26 10:47     ` Miao Xie
2011-05-26 11:02       ` Arne Jansen [this message]
2011-05-23 12:59 ` [PATCH v1 4/5] btrfs: hooks for readahead Arne Jansen
2011-05-23 12:59 ` [PATCH v1 5/5] btrfs: test ioctl " Arne Jansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4DDE3363.8020409@gmx.net \
    --to=sensille@gmx.net \
    --cc=chris.mason@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=miaox@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).