Re: [PATCH v1 3/5] btrfs: initial readahead code and prototypes

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Arne Jansen <sensille@gmx.net>
To: miaox@cn.fujitsu.com
Cc: chris.mason@oracle.com, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v1 3/5] btrfs: initial readahead code and prototypes
Date: Thu, 26 May 2011 13:02:59 +0200	[thread overview]
Message-ID: <4DDE3363.8020409@gmx.net> (raw)
In-Reply-To: <4DDE2FAB.5060906@cn.fujitsu.com>

On 26.05.2011 12:47, Miao Xie wrote:
> On thu, 26 May 2011 12:14:21 +0200, David Sterba wrote:
>> Hi,
>>
>> On Mon, May 23, 2011 at 02:59:06PM +0200, Arne Jansen wrote:
>>> +static struct reada_zone *reada_find_zone(struct btrfs_fs_info *fs_info,
>>> +					  struct btrfs_device *dev, u64 logical,
>>> +					  struct btrfs_multi_bio *multi)
>>> +{
>>> +	int ret;
>>> +	int looped = 0;
>>> +	struct reada_zone *zone;
>>> +	struct btrfs_block_group_cache *cache = NULL;
>>> +	u64 start;
>>> +	u64 end;
>>> +	int i;
>>> +
>>> +again:
>>> +	zone = NULL;
>>> +	spin_lock(&fs_info->reada_lock);
>>> +	ret = radix_tree_gang_lookup(&dev->reada_zones, (void **)&zone,
>>> +				     logical >> PAGE_CACHE_SHIFT, 1);
>>> +	if (ret == 1)
>>> +		kref_get(&zone->refcnt);
>>> +	spin_unlock(&fs_info->reada_lock);
>>> +
>>> +	if (ret == 1) {
>>> +		if (logical >= zone->start && logical < zone->end)
>>> +			return zone;
>>> +		spin_lock(&fs_info->reada_lock);
>>> +		reada_zone_put(zone);
>>> +		spin_unlock(&fs_info->reada_lock);
>>> +	}
>>> +
>>> +	if (looped)
>>> +		return NULL;
>>> +
>>> +	cache = btrfs_lookup_block_group(fs_info, logical);
>>> +	if (!cache)
>>> +		return NULL;
>>> +
>>> +	start = cache->key.objectid;
>>> +	end = start + cache->key.offset - 1;
>>> +	btrfs_put_block_group(cache);
>>> +
>>> +	zone = kzalloc(sizeof(*zone), GFP_NOFS);
>>> +	if (!zone)
>>> +		return NULL;
>>> +
>>> +	zone->start = start;
>>> +	zone->end = end;
>>> +	INIT_LIST_HEAD(&zone->list);
>>> +	spin_lock_init(&zone->lock);
>>> +	zone->locked = 0;
>>> +	kref_init(&zone->refcnt);
>>> +	zone->elems = 0;
>>> +	zone->device = dev; /* our device always sits at index 0 */
>>> +	for (i = 0; i < multi->num_stripes; ++i) {
>>> +		/* bounds have already been checked */
>>> +		zone->devs[i] = multi->stripes[i].dev;
>>> +	}
>>> +	zone->ndevs = multi->num_stripes;
>>> +
>>> +	spin_lock(&fs_info->reada_lock);
>>> +	ret = radix_tree_insert(&dev->reada_zones,
>>> +				(unsigned long)zone->end >> PAGE_CACHE_SHIFT,
>>> +				zone);
>>
>> this can sleep inside a spinlock, you initialize the radix tree with
>> GFP_NOFS, which allows __GFP_WAIT.
>>
>> Options:
>> 1) use GFP_ATOMIC in radix tree init flags
>> 2) do the radix_tree_preload/radix_tree_preload_end, GFP_NOFS outside of the
>> locked section is ok but __GFP_WAIT has to be masked out (else radix
>> tree insert will not use the preloaded node)
>> 3) unmask __GFP_WAIT from radix tree init flags
>>
>> I'd go for 3, as the atomic context is not required, and is easier
>> than 2 to implement.
> 
> I like the second one, because it is the general way to fix this problem.

I can't use the second one, as I have code to insert into 3 trees inside
one spin_lock, and preload can preload one for one insertion. My
intention was to use GFP_ATOMIC, don't know how I ended up NOFS
instead. If unmasking GFP_WAIT instead is sufficient, I'd prefer that
solution, too.

> 
> BTW: I think we can use RCU to protect the radix tree on the read side.
> Arne, how do you think about?

I decided against RCU, because inside the same lock I also take a ref on
the structure.
The data structures and locking are already quite complex, so I tried to
keep it as simple as possible until profiling shows that this is a
problem.
Mainly it works with a single global (well, per btrfs) lock, but
splitting it is hard and will normally lead to needing more
lock/unlocks, so I'd prefer to only do it if it doesn't scale.

Thanks,
Arne

> 
> Thanks
> Miao
> 
>>
>>> +	spin_unlock(&fs_info->reada_lock);
>>> +
>>> +	if (ret) {
>>> +		kfree(zone);
>>> +		looped = 1;
>>> +		goto again;
>>> +	}
>>> +
>>> +	return zone;
>>> +}
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>

next prev parent reply	other threads:[~2011-05-26 11:02 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-23 12:59 [PATCH v1 0/5] btrfs: generic readeahead interface Arne Jansen
2011-05-23 12:59 ` [PATCH v1 1/5] btrfs: add READAHEAD extent state Arne Jansen
2011-05-23 12:59 ` [PATCH v1 2/5] btrfs: state information for readahead Arne Jansen
2011-05-25  5:22   ` liubo
2011-05-24  6:48     ` Arne Jansen
2011-05-23 12:59 ` [PATCH v1 3/5] btrfs: initial readahead code and prototypes Arne Jansen
2011-05-26 10:14   ` David Sterba
2011-05-26 10:47     ` Miao Xie
2011-05-26 11:02       ` Arne Jansen [this message]
2011-05-23 12:59 ` [PATCH v1 4/5] btrfs: hooks for readahead Arne Jansen
2011-05-23 12:59 ` [PATCH v1 5/5] btrfs: test ioctl " Arne Jansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4DDE3363.8020409@gmx.net \
    --to=sensille@gmx.net \
    --cc=chris.mason@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=miaox@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.