reiserfs-devel.vger.kernel.org archive mirror
From: Edward Shishkin <edward.shishkin@gmail.com>
To: Ivan Shapovalov <intelfx100@gmail.com>
Cc: reiserfs-devel@vger.kernel.org
Subject: Re: [RFC] [PATCH] reiser4: discard support: initial implementation using extent lists.
Date: Mon, 05 May 2014 00:03:44 +0200	[thread overview]
Message-ID: <5366B940.50304@gmail.com> (raw)
In-Reply-To: <1399227088-13665-1-git-send-email-intelfx100@gmail.com>

That's great!
At first sight it looks OK. I'll take a look at it more carefully.


On 05/04/2014 08:11 PM, Ivan Shapovalov wrote:
> - extents are pre-merged when adding, if possible
>    (if an extent is adjacent to the last one)
> - extents are sorted and finally merged at commit time
> - allocator's check_blocks() has been adapted to return int instead of
>    asserting success
> - -E_REPEAT loop has been used to work around the fact that
>    blkdev_issue_discard() sleeps
>
> Discard functionality is now enabled unconditionally.
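The adding-time pre-merge described above can be sketched as a minimal userspace analogue. All names here are illustrative, not the patch's API; the real code operates on a locked atom's extent list:

```c
#include <assert.h>

/* Extent [start; end), mirroring the patch's half-open convention. */
struct extent {
	unsigned long long start, end;
};

/*
 * Try to absorb [start; end) into *last. Returns 0 when the ranges
 * touch or overlap (merge performed), -1 when they are disjoint --
 * the same contract as discard_set_entry_merge() in the patch.
 */
static int extent_merge(struct extent *last,
                        unsigned long long start, unsigned long long end)
{
	if (last->start <= end && start <= last->end) {
		if (start < last->start)
			last->start = start;
		if (end > last->end)
			last->end = end;
		return 0;
	}
	return -1;
}
```

Because extents are half-open, an extent ending exactly where the next one starts counts as adjacent and merges, which is what makes the "pre-merge against the last entry" optimization cheap.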

There is a kind of mount option which installs a flag in the reiser4
super-block.
Take a look at "dont_load_bitmap" in init_super.c as an example..
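The pattern being suggested is a plain bit flag in the in-memory super-block info, set at mount time and tested on the deallocation path. A userspace sketch (flag name and helpers are hypothetical; the real option table would live in init_super.c):

```c
#include <assert.h>

/* Hypothetical flag bit, in the style of reiser4's fs_flags. */
enum { SKETCH_FLAG_DISCARD = 1 << 3 };

struct sketch_super {
	unsigned long fs_flags;
};

/* What a "discard" mount-option handler would do at mount time. */
static void sketch_enable_discard(struct sketch_super *s)
{
	s->fs_flags |= SKETCH_FLAG_DISCARD;
}

/* What reiser4_dealloc_blocks() would test before logging extents. */
static int sketch_discard_enabled(const struct sketch_super *s)
{
	return (s->fs_flags & SKETCH_FLAG_DISCARD) != 0;
}
```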

> Issuing of discard requests should stop (untested) after the first failure
> encountered; a discard failure does not fail the atom commit.
>
> Kernel-reported granularity (erase unit size) is now used.
> It could make sense to allow a manual override, because sometimes (e.g. here)
> granularity is reported as 1 sector.
>
> Processing of each erase unit makes its own bitmap check. This implementation
> is suboptimal in cases when granularity is less than the block size (the
> same block is checked multiple times, once per each erase unit).
>
> And it floods the kernel log.
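The overhead described above is easy to quantify; a userspace sketch of the arithmetic (illustrative only):

```c
#include <assert.h>

/*
 * When the erase unit (discard granularity) is smaller than the block,
 * one block is covered by several erase units, and the per-unit bitmap
 * check revisits the same block once per unit. Both sizes are in
 * 512-byte sectors; granularity is assumed to divide the block evenly.
 */
static unsigned int checks_per_block(unsigned int sec_per_blk,
                                     unsigned int granularity_sec)
{
	return sec_per_blk / granularity_sec;
}
```

For a 4K block (8 sectors) and a reported granularity of 1 sector, each block is re-checked 8 times, which is also why the per-unit logging floods the kernel log.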
>
> Signed-off-by: Ivan Shapovalov <intelfx100@gmail.com>
> ---
>
> It seems not to crash the fs.
> BTW, what is reiser4 assert policy?


The more invariants you check, the better.
Make sure that debugging code doesn't modify reiser4 objects
and doesn't take resources (locks, loads, etc.)
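A minimal sketch of what a side-effect-free invariant check looks like. The ergo() helper matches the implication macro reiser4 already uses in its asserts (e.g. "intelfx-6" in this patch); the range type is illustrative:

```c
#include <assert.h>

/* "a implies b", as used by reiser4's assert() conditions. */
#define ergo(a, b) (!(a) || (b))

struct range {
	unsigned long long start, end;	/* [start; end) */
};

/*
 * An invariant checker should be a pure predicate: it may read the
 * object (note the const) but must not modify it or take locks, so
 * that compiling the asserts out cannot change behaviour.
 */
static int range_invariant(const struct range *r)
{
	/* an empty range is allowed only when both bounds are zero */
	return ergo(r->end <= r->start, r->start == 0 && r->end == 0);
}
```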

Good work! I'll purchase an SSD drive finally ;)

Thanks,
Edward.


>   Should I assert-check each and every
> pointer and every invariant condition I can invent?
>
>   fs/reiser4/Makefile                       |   1 +
>   fs/reiser4/block_alloc.c                  |  38 ++-
>   fs/reiser4/block_alloc.h                  |  14 +-
>   fs/reiser4/dformat.h                      |   2 +
>   fs/reiser4/discard.c                      | 434 ++++++++++++++++++++++++++++++
>   fs/reiser4/discard.h                      |  38 +++
>   fs/reiser4/forward.h                      |   1 +
>   fs/reiser4/plugin/space/bitmap.c          |  77 ++++--
>   fs/reiser4/plugin/space/bitmap.h          |   2 +-
>   fs/reiser4/plugin/space/space_allocator.h |   4 +-
>   fs/reiser4/txnmgr.c                       |  21 ++
>   fs/reiser4/txnmgr.h                       |   7 +
>   fs/reiser4/znode.c                        |   9 +-
>   13 files changed, 594 insertions(+), 54 deletions(-)
>   create mode 100644 fs/reiser4/discard.c
>   create mode 100644 fs/reiser4/discard.h
>
> diff --git a/fs/reiser4/Makefile b/fs/reiser4/Makefile
> index 746aa32f..f992cf9 100644
> --- a/fs/reiser4/Makefile
> +++ b/fs/reiser4/Makefile
> @@ -46,6 +46,7 @@ reiser4-y := \
>   		   status_flags.o \
>   		   init_super.o \
>   		   safe_link.o \
> +		   discard.o \
>              \
>   		   plugin/plugin.o \
>   		   plugin/plugin_set.o \
> diff --git a/fs/reiser4/block_alloc.c b/fs/reiser4/block_alloc.c
> index 6aa8418..293349f 100644
> --- a/fs/reiser4/block_alloc.c
> +++ b/fs/reiser4/block_alloc.c
> @@ -9,6 +9,7 @@ reiser4/README */
>   #include "block_alloc.h"
>   #include "tree.h"
>   #include "super.h"
> +#include "discard.h"
>   
>   #include <linux/types.h>	/* for __u??  */
>   #include <linux/fs.h>		/* for struct super_block  */
> @@ -939,26 +940,14 @@ static void used2free(reiser4_super_info_data * sbinfo, __u64 count)
>   	spin_unlock_reiser4_super(sbinfo);
>   }
>   
> -#if REISER4_DEBUG
> -
>   /* check "allocated" state of given block range */
> -static void
> +int
>   reiser4_check_blocks(const reiser4_block_nr * start,
>   		     const reiser4_block_nr * len, int desired)
>   {
> -	sa_check_blocks(start, len, desired);
> +	return sa_check_blocks(start, len, desired);
>   }
>   
> -/* check "allocated" state of given block */
> -void reiser4_check_block(const reiser4_block_nr * block, int desired)
> -{
> -	const reiser4_block_nr one = 1;
> -
> -	reiser4_check_blocks(block, &one, desired);
> -}
> -
> -#endif
> -
>   /* Blocks deallocation function may do an actual deallocation through space
>      plugin allocation or store deleted block numbers in atom's delete_set data
>      structure depend on @defer parameter. */
> @@ -981,6 +970,7 @@ reiser4_dealloc_blocks(const reiser4_block_nr * start,
>   	int ret;
>   	reiser4_context *ctx;
>   	reiser4_super_info_data *sbinfo;
> +	discard_set_entry *dsep = NULL;
>   
>   	ctx = get_current_context();
>   	sbinfo = get_super_private(ctx->super);
> @@ -995,6 +985,26 @@ reiser4_dealloc_blocks(const reiser4_block_nr * start,
>   		spin_unlock_reiser4_super(sbinfo);
>   	}
>   
> +	/* XXX: store block numbers for further discard */
> +	do {
> +		atom = get_current_atom_locked();
> +		assert("intelfx-1", atom != NULL);
> +
> +		ret = discard_set_add_extent(atom, &dsep, start, len);
> +
> +		if (ret == -ENOMEM) {
> +		    warning("intelfx-2", "could not add extent (start %llu len %llu) to discard set: %d",
> +			    (unsigned long long) *start, (unsigned long long) *len, ret);
> +		    break;
> +		}
> +
> +		/* This loop might spin at most two times */
> +	} while (ret == -E_REPEAT);
> +
> +	assert("intelfx-3", ret == 0 || ret == -ENOMEM);
> +
> +	spin_unlock_atom(atom);
> +
>   	if (flags & BA_DEFER) {
>   		blocknr_set_entry *bsep = NULL;
>   
> diff --git a/fs/reiser4/block_alloc.h b/fs/reiser4/block_alloc.h
> index 689efc1..a4e98af 100644
> --- a/fs/reiser4/block_alloc.h
> +++ b/fs/reiser4/block_alloc.h
> @@ -150,15 +150,15 @@ extern void cluster_reserved2free(int count);
>   
>   extern int reiser4_check_block_counters(const struct super_block *);
>   
> -#if REISER4_DEBUG
>   
> -extern void reiser4_check_block(const reiser4_block_nr *, int);
> +extern int reiser4_check_blocks(const reiser4_block_nr *start,
> +                                const reiser4_block_nr *len, int desired);
>   
> -#else
> -
> -#  define reiser4_check_block(beg, val)        noop
> -
> -#endif
> +static inline int reiser4_check_block(const reiser4_block_nr *start,
> +                                      int desired)
> +{
> +	return reiser4_check_blocks(start, NULL, desired);
> +}
>   
>   extern int reiser4_pre_commit_hook(void);
>   extern void reiser4_post_commit_hook(void);
> diff --git a/fs/reiser4/dformat.h b/fs/reiser4/dformat.h
> index 7943762..7316754 100644
> --- a/fs/reiser4/dformat.h
> +++ b/fs/reiser4/dformat.h
> @@ -14,6 +14,8 @@
>   #if !defined(__FS_REISER4_DFORMAT_H__)
>   #define __FS_REISER4_DFORMAT_H__
>   
> +#include "debug.h"
> +
>   #include <asm/byteorder.h>
>   #include <asm/unaligned.h>
>   #include <linux/types.h>
> diff --git a/fs/reiser4/discard.c b/fs/reiser4/discard.c
> new file mode 100644
> index 0000000..9b6a4e3
> --- /dev/null
> +++ b/fs/reiser4/discard.c
> @@ -0,0 +1,434 @@
> +/* Copyright 2001, 2002, 2003 by Hans Reiser, licensing governed by
> + * reiser4/README */
> +
> +/* TRIM/discard interoperation subsystem for reiser4. */
> +
> +/*
> + * This subsystem is responsible for populating an atom's ->discard_set and
> + * (later) converting it into a series of discard calls to the kernel.
> + *
> + * Discard is an in-kernel interface for notifying the storage
> + * hardware about blocks that are being logically freed by the filesystem.
> + * This is done via calling the blkdev_issue_discard() function. There are
> + * restrictions on block ranges: they should constitute at least one erase unit
> + * in length and be correspondingly aligned. Otherwise a discard request will
> + * be ignored.
> + *
> + * The erase unit size is kept in struct queue_limits as discard_granularity.
> + * The offset from the partition start to the first erase unit is kept in struct
> + * queue_limits as discard_alignment.
> + *
> + * At atom level, we log all blocks that happen to be deallocated at least once.
> + * Then we have to read the log, filter out any blocks that have since been
> + * allocated again and issue discards for everything still valid. This is what
> + * discard.[ch] is here for.
> + *
> + * The log is ->discard_set of struct txn_atom. Simply iterating through the
> + * logged block ranges is not enough:
> + * - if a single logged range is smaller than the erase unit, then this
> + *   particular range won't be discarded even if it is surrounded by enough
> + *   free blocks to constitute a whole erase unit;
> + * - we won't be able to merge small adjacent ranges forming a range long enough
> + *   to be discarded.
> + *
> + * MECHANISM:
> + *
> + * During the transaction deallocated extents are logged as-is to a data
> + * structure (let's call it "the discard set"). On atom commit we will generate
> + * a minimal superset of the discard set, but comprised of whole erase units.
> + *
> + * For now the discard set is a linked list.
> + *
> + * So, at commit time the following actions take place:
> + * - elements of the discard set are sorted;
> + * - the discard set is iterated, merging any adjacent extents;
> + * - each resulting extent is "covered" by erase units:
> + *   - its start is rounded down to the closest erase unit boundary;
> + *   - starting from this block, extents of erase unit length are created
> + *     until the original is fully covered
> + * - the calculated erase units are checked to be fully deallocated
> + * - remaining (valid) erase units are then passed to blkdev_issue_discard()
> + */
> +
> +#include "discard.h"
> +#include "context.h"
> +#include "debug.h"
> +#include "txnmgr.h"
> +#include "super.h"
> +
> +#include <linux/slab.h>
> +#include <linux/list_sort.h>
> +#include <linux/fs.h>
> +#include <linux/blkdev.h>
> +
> +/**
> + * Represents an extent range [@start; @end).
> + */
> +struct discard_set_entry {
> +	reiser4_block_nr start, end;
> +	struct list_head link;
> +};
> +
> +#define discard_list_entry(ptr) list_entry(ptr, discard_set_entry, link)
> +
> +static void discard_set_entry_init(discard_set_entry *entry)
> +{
> +	entry->start = 0;
> +	entry->end = 0;
> +	INIT_LIST_HEAD(&entry->link);
> +}
> +
> +static discard_set_entry *discard_set_entry_alloc(void)
> +{
> +	discard_set_entry *entry;
> +
> +	entry = (discard_set_entry *) kmalloc(sizeof(discard_set_entry),
> +	                                      reiser4_ctx_gfp_mask_get());
> +	if (entry == NULL) {
> +		return NULL;
> +	}
> +
> +	discard_set_entry_init(entry);
> +
> +	return entry;
> +}
> +
> +static void discard_set_entry_free(discard_set_entry *entry)
> +{
> +	kfree(entry);
> +}
> +
> +/**
> + * Given ranges @to and [@start; @end), if they overlap, their union
> + * is calculated and saved in @to.
> + */
> +static int discard_set_entry_merge(discard_set_entry *to,
> +                                   reiser4_block_nr start,
> +                                   reiser4_block_nr end)
> +{
> +	if (to->start <= end && start <= to->end) {
> +		reiser4_log ("discard", "Merging extents: [%llu; %llu) and [%llu; %llu)",
> +		             to->start, to->end, start, end);
> +
> +		if (start < to->start) {
> +			to->start = start;
> +		}
> +
> +		if (end > to->end) {
> +			to->end = end;
> +		}
> +
> +		return 0;
> +	}
> +
> +	return -1;
> +}
> +
> +static int discard_set_entry_merge_entry(discard_set_entry *to,
> +                                         discard_set_entry *from)
> +{
> +	return discard_set_entry_merge(to, from->start, from->end);
> +}
> +
> +/**
> + * A comparison function for list_sort().
> + *
> + * "The comparison function @cmp must return a negative value if @a
> + * should sort before @b, and a positive value if @a should sort after
> + * @b. If @a and @b are equivalent, and their original relative
> + * ordering is to be preserved, @cmp must return 0."
> + */
> +static int discard_set_entry_compare(void* priv, struct list_head *a,
> +                                     struct list_head *b)
> +{
> +	discard_set_entry *entry_a, *entry_b;
> +
> +	entry_a = discard_list_entry(a);
> +	entry_b = discard_list_entry(b);
> +
> +	/* First sort by starting block numbers... */
> +	if (entry_a->start < entry_b->start) {
> +		return -1;
> +	}
> +
> +	if (entry_a->start > entry_b->start) {
> +		return 1;
> +	}
> +
> +	/** Then by ending block numbers.
> +	 * If @a contains @b, it will be sorted before. */
> +	if (entry_a->end > entry_b->end) {
> +		return -1;
> +	}
> +
> +	if (entry_a->end < entry_b->end) {
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int __discard_extent(struct block_device *bdev, sector_t start, sector_t len)
> +{
> +	reiser4_log("discard", "DISCARDING: [%llu; %llu)",
> +	            (unsigned long long)start,
> +	            (unsigned long long)(start + len));
> +
> +	return blkdev_issue_discard(bdev, start, len, reiser4_ctx_gfp_mask_get(), 0);
> +}
> +
> +static int discard_extent(discard_set_entry *e)
> +{
> +	struct super_block *sb = reiser4_get_current_sb();
> +	struct block_device *bdev = sb->s_bdev;
> +	struct queue_limits *limits = &bdev_get_queue(bdev)->limits;
> +
> +	sector_t extent_start_sec, extent_end_sec,
> +	         unit_sec, request_start_sec = 0, request_len_sec = 0;
> +	reiser4_block_nr unit_start_blk, unit_len_blk;
> +	int ret, erase_unit_counter = 0;
> +
> +	const int sec_per_blk = sb->s_blocksize >> 9;
> +
> +	/* from blkdev_issue_discard():
> +	 * Zero-sector (unknown) and one-sector granularities are the same.  */
> +	const int granularity = max(limits->discard_granularity >> 9, 1U);
> +	const int alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
> +
> +	/* we assume block = N * sector */
> +	assert("intelfx-7", sec_per_blk > 0);
> +
> +	reiser4_log ("discard", "Extent {blk}: [%llu; %llu)",
> +	             (unsigned long long)e->start,
> +	             (unsigned long long)e->end);
> +
> +	/* convert extent to sectors */
> +	extent_start_sec = e->start * sec_per_blk;
> +	extent_end_sec = e->end * sec_per_blk;
> +
> +	reiser4_log ("discard", "Extent {sec}: [%llu; %llu)",
> +	             (unsigned long long)extent_start_sec,
> +	             (unsigned long long)extent_end_sec);
> +
> +	/* round down extent start sector to an erase unit boundary */
> +	unit_sec = extent_start_sec;
> +	if (granularity > 1) {
> +		unit_sec -= ((extent_start_sec - alignment) % granularity);
> +	}
> +
> +	/* iterate over erase units in the extent */
> +	do {
> +		/* considering erase unit:
> +		 * [unit_sec; unit_sec + granularity) */
> +
> +		reiser4_log ("discard", "Erase unit %d {sec}: [%llu; %llu)",
> +		             erase_unit_counter,
> +		             (unsigned long long)unit_sec,
> +		             (unsigned long long)(unit_sec + granularity));
> +
> +		/* calculate block range for erase unit:
> +		 * [unit_start_blk; unit_start_blk+unit_len_blk) */
> +		unit_start_blk = unit_sec;
> +		sector_div(unit_start_blk, sec_per_blk);
> +
> +		if (granularity > 1) {
> +			unit_len_blk = unit_sec + granularity - 1;
> +			sector_div(unit_len_blk, sec_per_blk);
> +			unit_len_blk += 1 - unit_start_blk;
> +		} else {
> +			unit_len_blk = 1;
> +		}
> +
> +		reiser4_log("discard", "Erase unit %d {blk}: [%llu; %llu)",
> +		            erase_unit_counter,
> +		            (unsigned long long)unit_start_blk,
> +		            (unsigned long long)(unit_start_blk + unit_len_blk));
> +
> +		if (reiser4_check_blocks(&unit_start_blk, &unit_len_blk, 0)) {
> +			/* OK. Add this unit to the accumulator.
> +			 * We accumulate discard units to call blkdev_issue_discard()
> +			 * not too frequently. */
> +
> +			reiser4_log("discard", "Erase unit %d: OK, adding to request",
> +			            erase_unit_counter);
> +
> +			if (request_len_sec > 0) {
> +				request_len_sec += granularity;
> +			} else {
> +				request_start_sec = unit_sec;
> +				request_len_sec = granularity;
> +			}
> +
> +			reiser4_log("discard", "Erase unit %d: request updated: [%llu; %llu)",
> +			            erase_unit_counter,
> +			            (unsigned long long)request_start_sec,
> +			            (unsigned long long)(request_start_sec + request_len_sec));
> +		} else {
> +			/* This unit can't be discarded. Discard what's been accumulated
> +			 * so far. */
> +			if (request_len_sec > 0) {
> +				if ((ret = __discard_extent(bdev, request_start_sec,
> +				                            request_len_sec))) {
> +					return ret;
> +				}
> +				request_len_sec = 0;
> +			}
> +		}
> +
> +		unit_sec += granularity;
> +		++erase_unit_counter;
> +	} while (unit_sec < extent_end_sec);
> +
> +	/* Discard the last accumulated request. */
> +	if (request_len_sec > 0) {
> +		if ((ret = __discard_extent(bdev, request_start_sec,
> +		                            request_len_sec))) {
> +			return ret;
> +		}
> +	}
> +
> +	reiser4_log ("discard", "Extent done");
> +
> +	return 0;
> +}
> +
> +static int __discard_list(struct list_head *list)
> +{
> +	struct list_head *pos;
> +	struct discard_set_entry *entry;
> +	int ret;
> +	int extent_count = 0;
> +
> +	list_for_each(pos, list) {
> +		entry = discard_list_entry(pos);
> +
> +		if ((ret = discard_extent(entry))) {
> +			return ret;
> +		}
> +
> +		++extent_count;
> +	}
> +
> +	reiser4_log ("discard", "Discarding: %d extents",
> +	             extent_count);
> +
> +	return 0;
> +}
> +
> +void discard_set_init(txn_atom *atom)
> +{
> +	INIT_LIST_HEAD(&atom->discard_set);
> +}
> +
> +void discard_set_destroy(txn_atom *atom)
> +{
> +	struct list_head *pos, *tmp;
> +	discard_set_entry *entry;
> +
> +	list_for_each_safe(pos, tmp, &atom->discard_set) {
> +		entry = discard_list_entry(pos);
> +		list_del_init(pos);
> +		discard_set_entry_free(entry);
> +	}
> +}
> +
> +void discard_set_merge(txn_atom *from, txn_atom *to)
> +{
> +	list_splice_tail_init(&from->discard_set, &to->discard_set);
> +}
> +
> +int discard_atom(txn_atom *atom)
> +{
> +	int ret;
> +	struct list_head discard_set, *pos, *next;
> +	struct discard_set_entry *entry, *next_entry;
> +
> +	if (list_empty(&atom->discard_set)) {
> +		spin_unlock_atom(atom);
> +		return 0;
> +	}
> +
> +	/* Step 0. Take the discard set from the atom in order to release
> +	 * atom spinlock. */
> +	discard_set = atom->discard_set;
> +	INIT_LIST_HEAD(&atom->discard_set);
> +	spin_unlock_atom(atom);
> +
> +	/* Step 1. Sort the extent list. */
> +	list_sort (NULL, &discard_set, discard_set_entry_compare);
> +
> +	/* Step 2. Join adjacent extents in the list. */
> +	pos = discard_set.next;
> +	next = pos->next;
> +	entry = discard_list_entry(pos);
> +
> +	for (;
> +	     next != &discard_set;
> +	     pos = next, next = pos->next, entry = next_entry) {
> +		next_entry = discard_list_entry(next);
> +
> +		while (!discard_set_entry_merge_entry (entry, next_entry)) {
> +			list_del_init(next);
> +			discard_set_entry_free(next_entry);
> +
> +			next = pos->next;
> +			next_entry = discard_list_entry(next);
> +		}
> +	}
> +
> +	/* Step 3. Perform actual dirty work. */
> +	if ((ret = __discard_list(&discard_set))) {
> +		return ret;
> +	}
> +
> +	/* Let's do this again for any new extents in the atom's discard set. */
> +	return -E_REPEAT;
> +}
> +
> +int discard_set_add_extent(txn_atom *atom,
> +                                  discard_set_entry **new_entry,
> +                                  const reiser4_block_nr *start,
> +                                  const reiser4_block_nr *len)
> +{
> +	if (*new_entry == NULL) {
> +		/*
> +		 * Optimization: try to merge new extent into the last one.
> +		 */
> +		if (!list_empty(&atom->discard_set)) {
> +			discard_set_entry *last_entry;
> +			last_entry = discard_list_entry(atom->discard_set.prev);
> +			if (!discard_set_entry_merge(last_entry, *start, *start + *len)) {
> +				return 0;
> +			}
> +		}
> +
> +		/*
> +		 * Otherwise, allocate a new entry and tell -E_REPEAT.
> +		 * Next time we'll take the branch below.
> +		 */
> +		spin_unlock_atom(atom);
> +		*new_entry = discard_set_entry_alloc();
> +		return (*new_entry != NULL) ? -E_REPEAT : RETERR(-ENOMEM);
> +	}
> +
> +	/*
> +	 * The entry has been allocated beforehand, fill it and link to the list.
> +	 */
> +	(*new_entry)->start = *start;
> +	(*new_entry)->end = *start + *len;
> +	list_add_tail(&(*new_entry)->link, &atom->discard_set);
> +
> +	return 0;
> +}
> +
> +
> +/* Make Linus happy.
> +   Local variables:
> +   c-indentation-style: "K&R"
> +   mode-name: "LC"
> +   c-basic-offset: 8
> +   tab-width: 8
> +   fill-column: 120
> +   scroll-step: 1
> +   End:
> +*/
> diff --git a/fs/reiser4/discard.h b/fs/reiser4/discard.h
> new file mode 100644
> index 0000000..dc3d43c
> --- /dev/null
> +++ b/fs/reiser4/discard.h
> @@ -0,0 +1,38 @@
> +/* Copyright 2001, 2002, 2003 by Hans Reiser, licensing governed by
> + * reiser4/README */
> +
> +/* TRIM/discard interoperation subsystem for reiser4. */
> +
> +#if !defined(__FS_REISER4_DISCARD_H__)
> +#define __FS_REISER4_DISCARD_H__
> +
> +#include "forward.h"
> +#include "dformat.h"
> +
> +extern int discard_atom (txn_atom *atom);
> +
> +/**
> + * Adds a "candidate" extent to the "discard set" of the @atom.
> + * The @atom should be locked.
> + */
> +extern int discard_set_add_extent(txn_atom *atom,
> +                                  discard_set_entry **new_entry,
> +                                  const reiser4_block_nr *start,
> +                                  const reiser4_block_nr *len);
> +
> +extern void discard_set_init(txn_atom *atom);
> +extern void discard_set_destroy(txn_atom *atom);
> +extern void discard_set_merge(txn_atom *from, txn_atom *to);
> +
> +/* __FS_REISER4_DISCARD_H__ */
> +#endif
> +
> +/* Make Linus happy.
> +   Local variables:
> +   c-indentation-style: "K&R"
> +   mode-name: "LC"
> +   c-basic-offset: 8
> +   tab-width: 8
> +   fill-column: 120
> +   End:
> +*/
> diff --git a/fs/reiser4/forward.h b/fs/reiser4/forward.h
> index 15dbfdc..a83db08 100644
> --- a/fs/reiser4/forward.h
> +++ b/fs/reiser4/forward.h
> @@ -38,6 +38,7 @@ typedef struct reiser4_dir_entry_desc reiser4_dir_entry_desc;
>   typedef struct reiser4_context reiser4_context;
>   typedef struct carry_level carry_level;
>   typedef struct blocknr_set_entry blocknr_set_entry;
> +typedef struct discard_set_entry discard_set_entry;
>   /* super_block->s_fs_info points to this */
>   typedef struct reiser4_super_info_data reiser4_super_info_data;
>   /* next two objects are fields of reiser4_super_info_data */
> diff --git a/fs/reiser4/plugin/space/bitmap.c b/fs/reiser4/plugin/space/bitmap.c
> index 1d0fabf..5bfa71b 100644
> --- a/fs/reiser4/plugin/space/bitmap.c
> +++ b/fs/reiser4/plugin/space/bitmap.c
> @@ -1222,29 +1222,13 @@ void reiser4_dealloc_blocks_bitmap(reiser4_space_allocator * allocator,
>   	release_and_unlock_bnode(bnode);
>   }
>   
> -/* plugin->u.space_allocator.check_blocks(). */
> -void reiser4_check_blocks_bitmap(const reiser4_block_nr * start,
> -				 const reiser4_block_nr * len, int desired)
> +static int check_blocks_one_bitmap(bmap_nr_t bmap, bmap_off_t start_offset,
> +                                    bmap_off_t end_offset, int desired)
>   {
> -#if REISER4_DEBUG
>   	struct super_block *super = reiser4_get_current_sb();
> -
> -	bmap_nr_t bmap;
> -	bmap_off_t start_offset;
> -	bmap_off_t end_offset;
> -
> -	struct bitmap_node *bnode;
> +	struct bitmap_node *bnode = get_bnode(super, bmap);
>   	int ret;
>   
> -	assert("zam-622", len != NULL);
> -	check_block_range(start, len);
> -	parse_blocknr(start, &bmap, &start_offset);
> -
> -	end_offset = start_offset + *len;
> -	assert("nikita-2214", end_offset <= bmap_bit_count(super->s_blocksize));
> -
> -	bnode = get_bnode(super, bmap);
> -
>   	assert("nikita-2215", bnode != NULL);
>   
>   	ret = load_and_lock_bnode(bnode);
> @@ -1253,19 +1237,60 @@ void reiser4_check_blocks_bitmap(const reiser4_block_nr * start,
>   	assert("nikita-2216", jnode_is_loaded(bnode->wjnode));
>   
>   	if (desired) {
> -		assert("zam-623",
> -		       reiser4_find_next_zero_bit(bnode_working_data(bnode),
> +		ret = reiser4_find_next_zero_bit(bnode_working_data(bnode),
>   						  end_offset, start_offset)
> -		       >= end_offset);
> +		      >= end_offset;
>   	} else {
> -		assert("zam-624",
> -		       reiser4_find_next_set_bit(bnode_working_data(bnode),
> +		ret = reiser4_find_next_set_bit(bnode_working_data(bnode),
>   						 end_offset, start_offset)
> -		       >= end_offset);
> +		      >= end_offset;
>   	}
>   
>   	release_and_unlock_bnode(bnode);
> -#endif
> +
> +	return ret;
> +}
> +
> +/* plugin->u.space_allocator.check_blocks(). */
> +int reiser4_check_blocks_bitmap(const reiser4_block_nr * start,
> +				 const reiser4_block_nr * len, int desired)
> +{
> +	struct super_block *super = reiser4_get_current_sb();
> +
> +	reiser4_block_nr end;
> +	bmap_nr_t bmap, end_bmap;
> +	bmap_off_t offset;
> +	bmap_off_t end_offset;
> +	const bmap_off_t max_offset = bmap_bit_count(super->s_blocksize);
> +
> +	if (len != NULL) {
> +		check_block_range(start, len);
> +		end = *start + *len - 1;
> +	} else {
> +		/* end is used as temporary len here */
> +		check_block_range(start, &(end = 1));
> +		end = *start;
> +	}
> +
> +	parse_blocknr(start, &bmap, &offset);
> +
> +	if (end == *start) {
> +		end_bmap = bmap;
> +		end_offset = offset;
> +	} else {
> +		parse_blocknr(&end, &end_bmap, &end_offset);
> +	}
> +	++end_offset;
> +
> +	assert("intelfx-4", end_bmap >= bmap);
> +	assert("intelfx-5", ergo(end_bmap == bmap, end_offset > offset));
> +
> +	for (; bmap < end_bmap; bmap++, offset = 0) {
> +		if (!check_blocks_one_bitmap(bmap, offset, max_offset, desired)) {
> +			return 0;
> +		}
> +	}
> +	return check_blocks_one_bitmap(bmap, offset, end_offset, desired);
>   }
>   
>   /* conditional insertion of @node into atom's overwrite set  if it was not there */
> diff --git a/fs/reiser4/plugin/space/bitmap.h b/fs/reiser4/plugin/space/bitmap.h
> index be867f1..4590498 100644
> --- a/fs/reiser4/plugin/space/bitmap.h
> +++ b/fs/reiser4/plugin/space/bitmap.h
> @@ -19,7 +19,7 @@ extern int reiser4_alloc_blocks_bitmap(reiser4_space_allocator *,
>   				       reiser4_blocknr_hint *, int needed,
>   				       reiser4_block_nr * start,
>   				       reiser4_block_nr * len);
> -extern void reiser4_check_blocks_bitmap(const reiser4_block_nr *,
> +extern int reiser4_check_blocks_bitmap(const reiser4_block_nr *,
>   					const reiser4_block_nr *, int);
>   extern void reiser4_dealloc_blocks_bitmap(reiser4_space_allocator *,
>   					  reiser4_block_nr,
> diff --git a/fs/reiser4/plugin/space/space_allocator.h b/fs/reiser4/plugin/space/space_allocator.h
> index 5bfa9a3..71bfd11 100644
> --- a/fs/reiser4/plugin/space/space_allocator.h
> +++ b/fs/reiser4/plugin/space/space_allocator.h
> @@ -29,9 +29,9 @@ static inline void sa_dealloc_blocks (reiser4_space_allocator * al, reiser4_bloc
>   	reiser4_dealloc_blocks_##allocator (al, start, len);								\
>   }															\
>   															\
> -static inline void sa_check_blocks (const reiser4_block_nr * start, const reiser4_block_nr * end, int desired) 		\
> +static inline int sa_check_blocks (const reiser4_block_nr * start, const reiser4_block_nr * end, int desired) 		\
>   {															\
> -	reiser4_check_blocks_##allocator (start, end, desired);							        \
> +	return reiser4_check_blocks_##allocator (start, end, desired);							        \
>   }															\
>   															\
>   static inline void sa_pre_commit_hook (void)										\
> diff --git a/fs/reiser4/txnmgr.c b/fs/reiser4/txnmgr.c
> index 8b305a4..fe7ea04 100644
> --- a/fs/reiser4/txnmgr.c
> +++ b/fs/reiser4/txnmgr.c
> @@ -233,6 +233,7 @@ year old --- define all technical terms used.
>   #include "vfs_ops.h"
>   #include "inode.h"
>   #include "flush.h"
> +#include "discard.h"
>   
>   #include <asm/atomic.h>
>   #include <linux/types.h>
> @@ -407,6 +408,8 @@ static void atom_init(txn_atom * atom)
>   	blocknr_set_init(&atom->delete_set);
>   	blocknr_set_init(&atom->wandered_map);
>   
> +	discard_set_init(atom);
> +
>   	init_atom_fq_parts(atom);
>   }
>   
> @@ -801,6 +804,8 @@ static void atom_free(txn_atom * atom)
>   	blocknr_set_destroy(&atom->delete_set);
>   	blocknr_set_destroy(&atom->wandered_map);
>   
> +	discard_set_destroy(atom);
> +
>   	assert("jmacd-16", atom_isclean(atom));
>   
>   	spin_unlock_atom(atom);
> @@ -1086,6 +1091,19 @@ static int commit_current_atom(long *nr_submitted, txn_atom ** atom)
>   	if (ret < 0)
>   		reiser4_panic("zam-597", "write log failed (%ld)\n", ret);
>   
> +	/* XXX: perform discards. Can this be moved out of the txnmgr commit mutex?
> +	   Should this be placed before or after the reiser4_invalidate_list() calls? */
> +
> +	do {
> +		spin_lock_atom(*atom);
> +		ret = discard_atom(*atom);
> +	} while (ret == -E_REPEAT);
> +
> +	if (ret) {
> +		warning("intelfx-8", "discard atom failed (%ld)", ret);
> +		ret = 0; /* the discard is optional, don't fail the commit */
> +	}
> +
>   	/* The atom->ovrwr_nodes list is processed under commit mutex held
>   	   because of bitmap nodes which are captured by special way in
>   	   reiser4_pre_commit_hook_bitmap(), that way does not include
> @@ -3021,6 +3039,9 @@ static void capture_fuse_into(txn_atom * small, txn_atom * large)
>   	blocknr_set_merge(&small->delete_set, &large->delete_set);
>   	blocknr_set_merge(&small->wandered_map, &large->wandered_map);
>   
> +	/* Merge discard candidate sets. */
> +	discard_set_merge(small, large);
> +
>   	/* Merge allocated/deleted file counts */
>   	large->nr_objects_deleted += small->nr_objects_deleted;
>   	large->nr_objects_created += small->nr_objects_created;
> diff --git a/fs/reiser4/txnmgr.h b/fs/reiser4/txnmgr.h
> index 034a3fe..86fbcac 100644
> --- a/fs/reiser4/txnmgr.h
> +++ b/fs/reiser4/txnmgr.h
> @@ -252,6 +252,13 @@ struct txn_atom {
>   	/* The atom's wandered_block mapping. */
>   	struct list_head wandered_map;
>   
> +	/* XXX: The atom's discard candidate set. It collects the extents of blocks
> +	   that were deallocated at least once during the transaction.
> +	   At commit time, this list is traversed to sort extents, merge them, calculate
> +	   the minimal set of erase units encompassing given extents. Then actually free
> +	   (deallocated) erase units are submitted to discard. */
> +	struct list_head discard_set;
> +
>   	/* The transaction's list of dirty captured nodes--per level.  Index
>   	   by (level). dirty_nodes[0] is for znode-above-root */
>   	struct list_head dirty_nodes[REAL_MAX_ZTREE_HEIGHT + 1];
> diff --git a/fs/reiser4/znode.c b/fs/reiser4/znode.c
> index 4ff9714..08eab3d 100644
> --- a/fs/reiser4/znode.c
> +++ b/fs/reiser4/znode.c
> @@ -534,10 +534,11 @@ znode *zget(reiser4_tree * tree,
>   
>   		write_unlock_tree(tree);
>   	}
> -#if REISER4_DEBUG
> -	if (!reiser4_blocknr_is_fake(blocknr) && *blocknr != 0)
> -		reiser4_check_block(blocknr, 1);
> -#endif
> +
> +	assert("intelfx-6",
> +	       ergo(!reiser4_blocknr_is_fake(blocknr) && *blocknr != 0,
> +	            reiser4_check_block(blocknr, 1)));
> +
>   	/* Check for invalid tree level, return -EIO */
>   	if (unlikely(znode_get_level(result) != level)) {
>   		warning("jmacd-504",
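Taken together, the commit-time steps from the patch description (sort the extent list, merge adjacent extents, then cover with erase units and discard) can be sketched in userspace C. All names here are illustrative, not the patch's API:

```c
#include <assert.h>
#include <stdlib.h>

struct ext {
	unsigned long long start, end;	/* [start; end) */
};

static int ext_cmp(const void *a, const void *b)
{
	const struct ext *x = a, *y = b;

	if (x->start != y->start)
		return x->start < y->start ? -1 : 1;
	/* longer extent first, so a containing extent sorts before */
	if (x->end != y->end)
		return x->end > y->end ? -1 : 1;
	return 0;
}

/*
 * Sort, then merge touching/overlapping neighbours in place.
 * Returns the resulting extent count.
 */
static size_t ext_sort_merge(struct ext *v, size_t n)
{
	size_t i, out = 0;

	if (n == 0)
		return 0;
	qsort(v, n, sizeof(*v), ext_cmp);
	for (i = 1; i < n; i++) {
		if (v[i].start <= v[out].end) {	/* touch or overlap */
			if (v[i].end > v[out].end)
				v[out].end = v[i].end;
		} else {
			v[++out] = v[i];
		}
	}
	return out + 1;
}
```

The merged extents would then each be rounded down to an erase-unit boundary and checked against the working bitmap before being handed to blkdev_issue_discard(), as discard_extent() in the patch does.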


2014-05-04 18:11 [RFC] [PATCH] reiser4: discard support: initial implementation using extent lists Ivan Shapovalov
2014-05-04 22:03 ` Edward Shishkin [this message]
