From mboxrd@z Thu Jan 1 00:00:00 1970 From: Arne Jansen Subject: Re: [PATCH v1 3/5] btrfs: initial readahead code and prototypes Date: Thu, 26 May 2011 13:02:59 +0200 Message-ID: <4DDE3363.8020409@gmx.net> References: <320563549be689389d99d47d396290547db94982.1306154794.git.sensille@gmx.net> <20110526101421.GX12709@twin.jikos.cz> <4DDE2FAB.5060906@cn.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Cc: chris.mason@oracle.com, linux-btrfs@vger.kernel.org To: miaox@cn.fujitsu.com Return-path: In-Reply-To: <4DDE2FAB.5060906@cn.fujitsu.com> List-ID: On 26.05.2011 12:47, Miao Xie wrote: > On thu, 26 May 2011 12:14:21 +0200, David Sterba wrote: >> Hi, >> >> On Mon, May 23, 2011 at 02:59:06PM +0200, Arne Jansen wrote: >>> +static struct reada_zone *reada_find_zone(struct btrfs_fs_info *fs_info, >>> + struct btrfs_device *dev, u64 logical, >>> + struct btrfs_multi_bio *multi) >>> +{ >>> + int ret; >>> + int looped = 0; >>> + struct reada_zone *zone; >>> + struct btrfs_block_group_cache *cache = NULL; >>> + u64 start; >>> + u64 end; >>> + int i; >>> + >>> +again: >>> + zone = NULL; >>> + spin_lock(&fs_info->reada_lock); >>> + ret = radix_tree_gang_lookup(&dev->reada_zones, (void **)&zone, >>> + logical >> PAGE_CACHE_SHIFT, 1); >>> + if (ret == 1) >>> + kref_get(&zone->refcnt); >>> + spin_unlock(&fs_info->reada_lock); >>> + >>> + if (ret == 1) { >>> + if (logical >= zone->start && logical < zone->end) >>> + return zone; >>> + spin_lock(&fs_info->reada_lock); >>> + reada_zone_put(zone); >>> + spin_unlock(&fs_info->reada_lock); >>> + } >>> + >>> + if (looped) >>> + return NULL; >>> + >>> + cache = btrfs_lookup_block_group(fs_info, logical); >>> + if (!cache) >>> + return NULL; >>> + >>> + start = cache->key.objectid; >>> + end = start + cache->key.offset - 1; >>> + btrfs_put_block_group(cache); >>> + >>> + zone = kzalloc(sizeof(*zone), GFP_NOFS); >>> + if (!zone) >>> + return NULL; >>> + >>> + zone->start = start; >>> + zone->end = end; >>> + INIT_LIST_HEAD(&zone->list); >>> + spin_lock_init(&zone->lock); >>> + zone->locked = 0; >>> + kref_init(&zone->refcnt); >>> + zone->elems = 0; >>> + zone->device = dev; /* our device always sits at index 0 */ >>> + for (i = 0; i < multi->num_stripes; ++i) { >>> + /* bounds have already been checked */ >>> + zone->devs[i] = multi->stripes[i].dev; >>> + } >>> + zone->ndevs = multi->num_stripes; >>> + >>> + spin_lock(&fs_info->reada_lock); >>> + ret = radix_tree_insert(&dev->reada_zones, >>> + (unsigned long)zone->end >> PAGE_CACHE_SHIFT, >>> + zone); >> >> this can sleep inside a spinlock, you initialize the radix tree with >> GFP_NOFS, which allows __GFP_WAIT. >> >> Options: >> 1) use GFP_ATOMIC in radix tree init flags >> 2) do the radix_tree_preload/radix_tree_preload_end, GFP_NOFS outside of the >> locked section is ok but __GFP_WAIT has to be masked out (else radix >> tree insert will not use the preloaded node) >> 3) unmask __GFP_WAIT from radix tree init flags >> >> I'd go for 3, as the atomic context is not required, and is easier >> than 2 to implement. > > I like the second one, because it is the general way to fix this problem. I can't use the second one, as I have code to insert into 3 trees inside one spin_lock, and preload can preload one for one insertion. My intention was to use GFP_ATOMIC, don't know how I ended up NOFS instead. If unmasking GFP_WAIT instead is sufficient, I'd prefer that solution, too. > > BTW: I think we can use RCU to protect the radix tree on the read side. > Arne, how do you think about? I decided against RCU, because inside the same lock I also take a ref on the structure. The data structures and locking are already quite complex, so I tried to keep it as simple as possible until profiling shows that this is a problem. Mainly it works with a single global (well, per btrfs) lock, but splitting it is hard and will normally lead to needing more lock/unlocks, so I'd prefer to only do it if it doesn't scale. Thanks, Arne > > Thanks > Miao > >> >>> + spin_unlock(&fs_info->reada_lock); >>> + >>> + if (ret) { >>> + kfree(zone); >>> + looped = 1; >>> + goto again; >>> + } >>> + >>> + return zone; >>> +} >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >