* [PATCH 0/6] nilfs2: implement tracking of live blocks
@ 2014-03-16 10:47 Andreas Rohner
From: Andreas Rohner @ 2014-03-16 10:47 UTC
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner
Hi,
This patch set implements the tracking of live blocks in segments. This
information is crucial in implementing better GC policies, because
now the policies can make informed decisions about which segments have
the biggest number of reclaimable blocks.
The difficulty in tracking live blocks is that any block can belong to
any number of snapshots, and snapshots can be created and deleted at
any time. A block belongs to a snapshot if the snapshot's checkpoint
number lies between de_start and de_end of the block
(de_start <= cno < de_end). So if a new snapshot is created, all the
reclaimable blocks belonging to it are no longer reclaimable, and the
live block counter of the corresponding segment must be incremented.
Conversely, if a snapshot is removed, all the blocks that were protected
only by it become reclaimable again and the counter must be decremented.
But if a block belongs to two or more snapshots, the counter must be
incremented only once, for the first snapshot, and decremented only
once, for the last.
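
As a rough illustration of the counting rule above (not code from the patch set), the membership test and the "once for the first, once for the last" rule can be sketched in userspace C. The struct and function names are hypothetical; the half-open range check mirrors the `ss >= start && ss < end` test used in patch 3, and the rule only applies to blocks that are otherwise reclaimable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the lifetime stored in a DAT entry. */
struct blk_life {
	uint64_t de_start;	/* first checkpoint that contains the block */
	uint64_t de_end;	/* first checkpoint that no longer contains it */
};

/* A block belongs to snapshot cno if de_start <= cno < de_end. */
static bool blk_in_snapshot(const struct blk_life *b, uint64_t cno)
{
	return cno >= b->de_start && cno < b->de_end;
}

/* Number of snapshots in ss[0..n) that protect the block. */
static size_t blk_protectors(const struct blk_life *b,
			     const uint64_t *ss, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (blk_in_snapshot(b, ss[i]))
			count++;
	return count;
}

/*
 * Change to apply to the live block counter of the segment when the
 * number of protecting snapshots goes from 'before' to 'after':
 * +1 only for the first protecting snapshot, -1 only for the last.
 */
static int live_counter_delta(size_t before, size_t after)
{
	if (before == 0 && after > 0)
		return 1;
	if (before > 0 && after == 0)
		return -1;
	return 0;
}
```

Counting all protecting snapshots like this would require the full snapshot list for every block, which is what the de_rsv approach described next avoids.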
To achieve this I used the de_rsv field of nilfs_dat_entry to store one
of the snapshot numbers. Every time a snapshot is created or removed,
the whole DAT file is scanned and de_rsv is updated if the snapshot
number is between de_start and de_end. But one block can belong to an
arbitrary number of snapshots. Here I use the fact that the snapshot
list is organized as a sorted linked list. So by knowing the previous
and the next snapshot number, it is possible to reliably determine
whether a block is reclaimable or belongs to another snapshot.
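
The neighbour check described above can be sketched as a small standalone function. This is a simplified model of the decision made when a snapshot is removed, not the kernel code itself: the function name is hypothetical, and 0 is assumed to mean "no such neighbour" or "no remaining snapshot".

```c
#include <stdint.h>

/*
 * Hypothetical helper: the snapshot being removed lies in [start, end).
 * 'prev' and 'next' are its neighbours in the sorted snapshot list, or 0
 * if there is none. Returns a snapshot number that still protects the
 * block, or 0 if the block becomes reclaimable.
 */
static uint64_t remaining_protector(uint64_t start, uint64_t end,
				    uint64_t prev, uint64_t next)
{
	if (prev && prev >= start && prev < end)
		return prev;
	if (next && next >= start && next < end)
		return next;
	return 0;
}
```

Checking only the two neighbours is enough because the list is sorted and the range is contiguous: if any other snapshot lies in [start, end), then the nearest neighbour on that side of the removed snapshot lies in the range as well.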
It is of course unacceptable to rewrite every DAT entry just to create
one snapshot, so only the entries of reclaimable blocks are updated.
This leads to certain situations where the counters are not accurate.
The userspace GC should be capable of compensating for and correcting
the inaccurate values.
Another problem is the protection period in the userspace GC. The kernel
doesn't know anything about the userspace protection period, and it is
therefore not reflected in the number of live blocks in a segment. For
example, if the GC policy chooses a segment that seems to have a lot of
reclaimable blocks, it could turn out that all of those blocks are
still protected by the protection period.
To overcome this problem I added an additional field, su_lastdec, to
the segment usage information. Whenever the number of live blocks in a
segment is adjusted, su_lastdec is set to the current timestamp. If the
number of live blocks was adjusted within the protection period, then
the userspace GC policy can recognize it and choose a different segment.
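
A userspace policy could use the two timestamps roughly as follows. This is only a sketch under stated assumptions: the struct, the function name and the exact comparison against the start of the protection period are hypothetical and are not taken from nilfs-utils.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the per-segment information seen by the GC. */
struct seg_times {
	uint64_t lastmod;	/* last time the segment itself was written */
	uint64_t lastdec;	/* last time the live block count was adjusted */
};

/*
 * Returns true if the segment is worth considering for cleaning.
 * 'prottime' is the start of the protection period: anything that
 * happened at or after it is assumed to be still protected.
 */
static bool seg_is_candidate(const struct seg_times *st, uint64_t prottime)
{
	if (st->lastmod >= prottime)
		return false;	/* the segment itself is too young */
	if (st->lastdec >= prottime)
		return false;	/* blocks became reclaimable too recently */
	return true;
}
```

With only lastmod available, the second case would pass the check and the policy would discover the protected blocks only after selecting the segment.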
Compatibility Issues:
1. su_nblocks is reused to represent the number of live blocks, so old
   nilfs-utils would break the file system.
2. The vd_pad field of nilfs_vdesc was not initialized to 0, so old
   nilfs-utils could send arbitrary flags to the kernel.
Benchmark Results:
The benchmark replays NFS-Traces to simulate a real file system load.
The file system is filled up to 20% capacity and then the NFS-Traces are
replayed. In parallel every 5 minutes random checkpoints are turned into
snapshots. After 15 minutes the snapshot is turned back into a
checkpoint.
Greedy-Policy-Runtime: 6221.712s
Cost-Benefit-Policy-Runtime: 6874.840s
Timestamp-Policy-Runtime: 13179.626s
Best regards,
Andreas Rohner
---
Andreas Rohner (6):
nilfs2: add helper function to go through all entries of meta data
file
nilfs2: add new timestamp to seg usage and function to change
su_nblocks
nilfs2: scan dat entries at snapshot creation/deletion time
nilfs2: add ioctl() to clean snapshot flags from dat entries
nilfs2: add counting of live blocks for blocks that are overwritten
nilfs2: add counting of live blocks for deleted files
fs/nilfs2/alloc.c | 121 +++++++++++++++++++++++++
fs/nilfs2/alloc.h | 6 ++
fs/nilfs2/bmap.c | 8 +-
fs/nilfs2/bmap.h | 2 +-
fs/nilfs2/btree.c | 3 +-
fs/nilfs2/cpfile.c | 7 ++
fs/nilfs2/dat.c | 225 +++++++++++++++++++++++++++++++++++++++++++++-
fs/nilfs2/dat.h | 32 ++++++-
fs/nilfs2/direct.c | 3 +-
fs/nilfs2/inode.c | 2 +
fs/nilfs2/ioctl.c | 109 +++++++++++++++++++++-
fs/nilfs2/mdt.c | 5 +-
fs/nilfs2/page.h | 6 +-
fs/nilfs2/segbuf.c | 25 ++++++
fs/nilfs2/segbuf.h | 4 +
fs/nilfs2/segment.c | 69 ++++++++++++--
fs/nilfs2/sufile.c | 86 +++++++++++++++++-
fs/nilfs2/sufile.h | 18 ++++
include/linux/nilfs2_fs.h | 65 +++++++++++++-
19 files changed, 772 insertions(+), 24 deletions(-)
--
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* [PATCH 1/6] nilfs2: add helper function to go through all entries of meta data file
From: Andreas Rohner @ 2014-03-16 10:47 UTC
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner
This patch introduces the nilfs_palloc_scan_entries() function,
which takes an inode of one of nilfs' meta data files and iterates
through all of its entries. For each entry the callback function
pointer that is given as a parameter is called. The data parameter
is passed to the callback function, so that it may receive
parameters and return results.
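
The callback-plus-data-pointer pattern described above can be illustrated with a much simpler userspace scan over a bitmap. This is not the kernel function: the names are hypothetical, the bitmap layout (least significant bit first within each byte) is an assumption, and the real code additionally walks group descriptors and entry blocks.

```c
#include <limits.h>

/* Hypothetical callback type: entry number plus an opaque data pointer. */
typedef void (*entry_fn)(unsigned long entry_nr, void *data);

/*
 * Calls dofunc for every set bit in 'bitmap' (nbits bits, least
 * significant bit first within each byte) and passes 'data' through.
 */
static void scan_entries(const unsigned char *bitmap, unsigned long nbits,
			 entry_fn dofunc, void *data)
{
	unsigned long pos;

	for (pos = 0; pos < nbits; pos++)
		if (bitmap[pos / CHAR_BIT] & (1u << (pos % CHAR_BIT)))
			dofunc(pos, data);
}

/* Example result structure filled in by the callback. */
struct scan_result {
	unsigned long count;	/* number of allocated entries seen */
	unsigned long last;	/* entry number of the last one seen */
};

/* Example callback: uses 'data' to return results to the caller. */
static void count_entry(unsigned long entry_nr, void *data)
{
	struct scan_result *res = data;

	res->count++;
	res->last = entry_nr;
}
```

A caller would declare a `struct scan_result`, pass its address as `data`, and read the fields after the scan returns.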
Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
---
fs/nilfs2/alloc.c | 121 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nilfs2/alloc.h | 6 +++
2 files changed, 127 insertions(+)
diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c
index 741fd02..0edd85a 100644
--- a/fs/nilfs2/alloc.c
+++ b/fs/nilfs2/alloc.c
@@ -545,6 +545,127 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
}
/**
+ * nilfs_palloc_scan_entries - scan through every entry and execute dofunc
+ * @inode: inode of metadata file using this allocator
+ * @dofunc: function executed for every entry
+ * @data: data pointer passed to dofunc
+ *
+ * Description: nilfs_palloc_scan_entries() walks through every allocated entry
+ * of a metadata file and executes dofunc on it. It passes a data pointer to
+ * dofunc, which can be used as an input parameter or for returning of results.
+ *
+ * Return Value: On success, 0 is returned. On error, a
+ * negative error code is returned.
+ */
+int nilfs_palloc_scan_entries(struct inode *inode,
+ void (*dofunc)(struct inode *,
+ struct nilfs_palloc_req *,
+ void *),
+ void *data)
+{
+ struct buffer_head *desc_bh, *bitmap_bh;
+ struct nilfs_palloc_group_desc *desc;
+ struct nilfs_palloc_req req;
+ unsigned char *bitmap;
+ void *desc_kaddr, *bitmap_kaddr;
+ unsigned long group, maxgroup, ngroups;
+ unsigned long n, m, entries_per_group, groups_per_desc_block;
+ unsigned long i, j, pos;
+ unsigned long blkoff, prev_blkoff;
+ int ret;
+
+ ngroups = nilfs_palloc_groups_count(inode);
+ maxgroup = ngroups - 1;
+ entries_per_group = nilfs_palloc_entries_per_group(inode);
+ groups_per_desc_block = nilfs_palloc_groups_per_desc_block(inode);
+
+ for (group = 0; group < ngroups;) {
+ ret = nilfs_palloc_get_desc_block(inode, group, 0, &desc_bh);
+ if (ret == -ENOENT)
+ return 0;
+ else if (ret < 0)
+ return ret;
+ req.pr_desc_bh = desc_bh;
+ desc_kaddr = kmap(desc_bh->b_page);
+ desc = nilfs_palloc_block_get_group_desc(inode, group,
+ desc_bh, desc_kaddr);
+ n = nilfs_palloc_rest_groups_in_desc_block(inode, group,
+ maxgroup);
+
+ for (i = 0; i < n; i++, desc++, group++) {
+ m = entries_per_group -
+ nilfs_palloc_group_desc_nfrees(inode,
+ group, desc);
+ if (!m)
+ continue;
+
+ ret = nilfs_palloc_get_bitmap_block(
+ inode, group, 0, &bitmap_bh);
+ if (ret == -ENOENT) {
+ ret = 0;
+ goto out_desc;
+ } else if (ret < 0)
+ goto out_desc;
+
+ req.pr_bitmap_bh = bitmap_bh;
+ bitmap_kaddr = kmap(bitmap_bh->b_page);
+ bitmap = bitmap_kaddr + bh_offset(bitmap_bh);
+ /* entry blkoff is always bigger than 0 */
+ blkoff = 0;
+ pos = 0;
+
+ for (j = 0; j < m; ++j, ++pos) {
+ pos = nilfs_find_next_bit(bitmap,
+ entries_per_group, pos);
+
+ if (pos >= entries_per_group)
+ break;
+
+ /* found an entry */
+ req.pr_entry_nr =
+ entries_per_group * group + pos;
+
+ prev_blkoff = blkoff;
+ blkoff = nilfs_palloc_entry_blkoff(inode,
+ req.pr_entry_nr);
+
+ if (blkoff != prev_blkoff) {
+ if (prev_blkoff)
+ brelse(req.pr_entry_bh);
+
+ ret = nilfs_palloc_get_entry_block(
+ inode, req.pr_entry_nr,
+ 0, &req.pr_entry_bh);
+ if (ret < 0)
+ goto out_entry;
+ }
+
+ dofunc(inode, &req, data);
+ }
+
+ if (blkoff)
+ brelse(req.pr_entry_bh);
+ kunmap(bitmap_bh->b_page);
+ brelse(bitmap_bh);
+ }
+
+ kunmap(desc_bh->b_page);
+ brelse(desc_bh);
+ }
+
+ return 0;
+
+out_entry:
+ kunmap(bitmap_bh->b_page);
+ brelse(bitmap_bh);
+
+out_desc:
+ kunmap(desc_bh->b_page);
+ brelse(desc_bh);
+ return ret;
+}
+
+/**
* nilfs_palloc_commit_alloc_entry - finish allocation of a persistent object
* @inode: inode of metadata file using this allocator
* @req: nilfs_palloc_req structure exchanged for the allocation
diff --git a/fs/nilfs2/alloc.h b/fs/nilfs2/alloc.h
index 4bd6451..0592035 100644
--- a/fs/nilfs2/alloc.h
+++ b/fs/nilfs2/alloc.h
@@ -77,6 +77,7 @@ int nilfs_palloc_freev(struct inode *, __u64 *, size_t);
#define nilfs_set_bit_atomic ext2_set_bit_atomic
#define nilfs_clear_bit_atomic ext2_clear_bit_atomic
#define nilfs_find_next_zero_bit find_next_zero_bit_le
+#define nilfs_find_next_bit find_next_bit_le
/**
* struct nilfs_bh_assoc - block offset and buffer head association
@@ -106,5 +107,10 @@ void nilfs_palloc_setup_cache(struct inode *inode,
struct nilfs_palloc_cache *cache);
void nilfs_palloc_clear_cache(struct inode *inode);
void nilfs_palloc_destroy_cache(struct inode *inode);
+int nilfs_palloc_scan_entries(struct inode *,
+ void (*dofunc)(struct inode *,
+ struct nilfs_palloc_req *,
+ void *),
+ void *);
#endif /* _NILFS_ALLOC_H */
--
1.9.0
* [PATCH 2/6] nilfs2: add new timestamp to seg usage and function to change su_nblocks
From: Andreas Rohner @ 2014-03-16 10:47 UTC
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner
This patch adds an additional timestamp to the segment usage
information that indicates the last time the usage information was
changed. So su_lastmod indicates the last time the segment itself was
modified and su_lastdec indicates the last time the usage information
itself was changed.
This is important information for the GC, because it needs to avoid
selecting segments that were created (su_lastmod) outside of the
protection period, but whose blocks became reclaimable (su_nblocks was
decremented) within the protection period. Without that information the
GC policy has to assume that there are reclaimable blocks, only to find
out that they are still protected by the protection period.
This patch also introduces nilfs_sufile_add_segment_usage(), which can
be used to increment or decrement the value of su_nblocks of a specific
segment.
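
The arithmetic of such an update can be sketched in isolation. The following userspace function is a simplified model, not the kernel implementation: its name is hypothetical, and it only shows a signed adjustment that is clamped to the range from 0 to the number of blocks per segment.

```c
#include <stdint.h>

/*
 * Adds the signed 'value' to 'nblocks' and clamps the result to
 * [0, blocks_per_segment]. The early returns also keep the addition
 * below from overflowing for extreme values.
 */
static uint32_t add_clamped(uint32_t nblocks, int64_t value,
			    uint32_t blocks_per_segment)
{
	int64_t result;

	if (value <= -(int64_t)nblocks)
		return 0;			/* would drop to 0 or below */
	if (value >= (int64_t)blocks_per_segment)
		return blocks_per_segment;	/* would exceed the maximum */

	result = (int64_t)nblocks + value;
	if (result > (int64_t)blocks_per_segment)
		result = blocks_per_segment;
	return (uint32_t)result;
}
```

Clamping instead of failing fits the design described in the cover letter, where the counters are allowed to be temporarily inaccurate and are corrected later by the userspace GC.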
Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
---
fs/nilfs2/sufile.c | 86 +++++++++++++++++++++++++++++++++++++++++++++--
fs/nilfs2/sufile.h | 18 ++++++++++
include/linux/nilfs2_fs.h | 7 ++++
3 files changed, 109 insertions(+), 2 deletions(-)
diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
index 2a869c3..0886938 100644
--- a/fs/nilfs2/sufile.c
+++ b/fs/nilfs2/sufile.c
@@ -453,6 +453,8 @@ void nilfs_sufile_do_scrap(struct inode *sufile, __u64 segnum,
su->su_lastmod = cpu_to_le64(0);
su->su_nblocks = cpu_to_le32(0);
su->su_flags = cpu_to_le32(1UL << NILFS_SEGMENT_USAGE_DIRTY);
+ if (nilfs_sufile_lastdec_supported(sufile))
+ su->su_lastdec = cpu_to_le64(0);
kunmap_atomic(kaddr);
nilfs_sufile_mod_counter(header_bh, clean ? (u64)-1 : 0, dirty ? 0 : 1);
@@ -482,7 +484,7 @@ void nilfs_sufile_do_free(struct inode *sufile, __u64 segnum,
WARN_ON(!nilfs_segment_usage_dirty(su));
sudirty = nilfs_segment_usage_dirty(su);
- nilfs_segment_usage_set_clean(su);
+ nilfs_sufile_segment_usage_set_clean(sufile, su);
kunmap_atomic(kaddr);
mark_buffer_dirty(su_bh);
@@ -549,6 +551,75 @@ int nilfs_sufile_set_segment_usage(struct inode *sufile, __u64 segnum,
}
/**
+ * nilfs_sufile_add_segment_usage - decrement usage of a segment
+ * @sufile: inode of segment usage file
+ * @segnum: segment number
+ * @value: value to add to su_nblocks
+ * @dectime: current time
+ *
+ * Description: nilfs_sufile_add_segment_usage() adds a signed value to the
+ * su_nblocks field of the segment usage information of @segnum. It clamps
+ * the result to the range from 0 to the maximum number of blocks per
+ * segment.
+ *
+ * Return Value: On success, 0 is returned. On error, one of the following
+ * negative error codes is returned.
+ *
+ * %-ENOMEM - Insufficient memory available.
+ *
+ * %-EIO - I/O error
+ *
+ * %-ENOENT - the specified block does not exist (hole block)
+ */
+int nilfs_sufile_add_segment_usage(struct inode *sufile, __u64 segnum,
+ __s64 value, time_t dectime)
+{
+ struct the_nilfs *nilfs = sufile->i_sb->s_fs_info;
+ struct buffer_head *bh;
+ struct nilfs_segment_usage *su;
+ void *kaddr;
+ int ret;
+
+ if (value == 0)
+ return 0;
+
+ down_write(&NILFS_MDT(sufile)->mi_sem);
+
+ ret = nilfs_sufile_get_segment_usage_block(sufile, segnum, 0, &bh);
+ if (ret < 0)
+ goto out_sem;
+
+ kaddr = kmap_atomic(bh->b_page);
+ su = nilfs_sufile_block_get_segment_usage(sufile, segnum, bh, kaddr);
+ WARN_ON(nilfs_segment_usage_error(su));
+
+ value += le32_to_cpu(su->su_nblocks);
+ if (value < 0)
+ value = 0;
+ if (value > nilfs->ns_blocks_per_segment)
+ value = nilfs->ns_blocks_per_segment;
+
+ if (value == le32_to_cpu(su->su_nblocks)) {
+ kunmap_atomic(kaddr);
+ goto out_brelse;
+ }
+
+ su->su_nblocks = cpu_to_le32(value);
+ if (dectime && nilfs_sufile_lastdec_supported(sufile))
+ su->su_lastdec = cpu_to_le64(dectime);
+ kunmap_atomic(kaddr);
+
+ mark_buffer_dirty(bh);
+ nilfs_mdt_mark_dirty(sufile);
+
+out_brelse:
+ brelse(bh);
+out_sem:
+ up_write(&NILFS_MDT(sufile)->mi_sem);
+ return ret;
+}
+
+/**
* nilfs_sufile_get_stat - get segment usage statistics
* @sufile: inode of segment usage file
* @stat: pointer to a structure of segment usage statistics
@@ -698,7 +769,8 @@ static int nilfs_sufile_truncate_range(struct inode *sufile,
nc = 0;
for (su = su2, j = 0; j < n; j++, su = (void *)su + susz) {
if (nilfs_segment_usage_error(su)) {
- nilfs_segment_usage_set_clean(su);
+ nilfs_sufile_segment_usage_set_clean(sufile,
+ su);
nc++;
}
}
@@ -858,6 +930,13 @@ ssize_t nilfs_sufile_get_suinfo(struct inode *sufile, __u64 segnum, void *buf,
if (nilfs_segment_is_active(nilfs, segnum + j))
si->sui_flags |=
(1UL << NILFS_SEGMENT_USAGE_ACTIVE);
+ if (sisz >= sizeof(struct nilfs_suinfo)) {
+ if (susz >= sizeof(struct nilfs_segment_usage))
+ si->sui_lastdec =
+ le64_to_cpu(su->su_lastdec);
+ else
+ si->sui_lastdec = 0;
+ }
}
kunmap_atomic(kaddr);
brelse(su_bh);
@@ -935,6 +1014,9 @@ ssize_t nilfs_sufile_set_suinfo(struct inode *sufile, void *buf,
if (nilfs_suinfo_update_lastmod(sup))
su->su_lastmod = cpu_to_le64(sup->sup_sui.sui_lastmod);
+ if (nilfs_suinfo_update_lastdec(sup))
+ su->su_lastdec = cpu_to_le64(sup->sup_sui.sui_lastdec);
+
if (nilfs_suinfo_update_nblocks(sup))
su->su_nblocks = cpu_to_le32(sup->sup_sui.sui_nblocks);
diff --git a/fs/nilfs2/sufile.h b/fs/nilfs2/sufile.h
index b8afd72..e5455d2 100644
--- a/fs/nilfs2/sufile.h
+++ b/fs/nilfs2/sufile.h
@@ -28,6 +28,23 @@
#include <linux/nilfs2_fs.h>
#include "mdt.h"
+static inline int
+nilfs_sufile_lastdec_supported(const struct inode *sufile)
+{
+ return NILFS_MDT(sufile)->mi_entry_size ==
+ sizeof(struct nilfs_segment_usage);
+}
+
+static inline void
+nilfs_sufile_segment_usage_set_clean(const struct inode *sufile,
+ struct nilfs_segment_usage *su)
+{
+ su->su_lastmod = cpu_to_le64(0);
+ su->su_nblocks = cpu_to_le32(0);
+ su->su_flags = cpu_to_le32(0);
+ if (nilfs_sufile_lastdec_supported(sufile))
+ su->su_lastdec = cpu_to_le64(0);
+}
static inline unsigned long nilfs_sufile_get_nsegments(struct inode *sufile)
{
@@ -41,6 +58,7 @@ int nilfs_sufile_alloc(struct inode *, __u64 *);
int nilfs_sufile_mark_dirty(struct inode *sufile, __u64 segnum);
int nilfs_sufile_set_segment_usage(struct inode *sufile, __u64 segnum,
unsigned long nblocks, time_t modtime);
+int nilfs_sufile_add_segment_usage(struct inode *, __u64, __s64, time_t);
int nilfs_sufile_get_stat(struct inode *, struct nilfs_sustat *);
ssize_t nilfs_sufile_get_suinfo(struct inode *, __u64, void *, unsigned,
size_t);
diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
index ff3fea3..ca269ad 100644
--- a/include/linux/nilfs2_fs.h
+++ b/include/linux/nilfs2_fs.h
@@ -614,11 +614,13 @@ struct nilfs_cpfile_header {
* @su_lastmod: last modified timestamp
* @su_nblocks: number of blocks in segment
* @su_flags: flags
+ * @su_lastdec: last decrement of su_nblocks timestamp
*/
struct nilfs_segment_usage {
__le64 su_lastmod;
__le32 su_nblocks;
__le32 su_flags;
+ __le64 su_lastdec;
};
#define NILFS_MIN_SEGMENT_USAGE_SIZE 16
@@ -663,6 +665,7 @@ nilfs_segment_usage_set_clean(struct nilfs_segment_usage *su)
su->su_lastmod = cpu_to_le64(0);
su->su_nblocks = cpu_to_le32(0);
su->su_flags = cpu_to_le32(0);
+ su->su_lastdec = cpu_to_le64(0);
}
static inline int
@@ -694,11 +697,13 @@ struct nilfs_sufile_header {
* @sui_lastmod: timestamp of last modification
* @sui_nblocks: number of written blocks in segment
* @sui_flags: segment usage flags
+ * @sui_lastdec: last decrement of sui_nblocks timestamp
*/
struct nilfs_suinfo {
__u64 sui_lastmod;
__u32 sui_nblocks;
__u32 sui_flags;
+ __u64 sui_lastdec;
};
#define NILFS_SUINFO_FNS(flag, name) \
@@ -736,6 +741,7 @@ enum {
NILFS_SUINFO_UPDATE_LASTMOD,
NILFS_SUINFO_UPDATE_NBLOCKS,
NILFS_SUINFO_UPDATE_FLAGS,
+ NILFS_SUINFO_UPDATE_LASTDEC,
__NR_NILFS_SUINFO_UPDATE_FIELDS,
};
@@ -759,6 +765,7 @@ nilfs_suinfo_update_##name(const struct nilfs_suinfo_update *sup) \
NILFS_SUINFO_UPDATE_FNS(LASTMOD, lastmod)
NILFS_SUINFO_UPDATE_FNS(NBLOCKS, nblocks)
NILFS_SUINFO_UPDATE_FNS(FLAGS, flags)
+NILFS_SUINFO_UPDATE_FNS(LASTDEC, lastdec)
enum {
NILFS_CHECKPOINT,
--
1.9.0
* [PATCH 3/6] nilfs2: scan dat entries at snapshot creation/deletion time
From: Andreas Rohner @ 2014-03-16 10:47 UTC
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner
To accurately count the number of live blocks in a segment, it is
important to take snapshots into account, because snapshots can protect
reclaimable blocks from being cleaned.
This patch uses the previously reserved de_rsv field of the
nilfs_dat_entry struct to store one of the snapshots the corresponding
block belongs to. One block can belong to many snapshots, but because
the snapshots are stored in a sorted linked list, it is easy to check if
a block belongs to any other snapshot given the previous and the next
snapshot. For example if the current snapshot (in de_ss) is being
removed and neither the previous nor the next snapshot is in the range
of de_start to de_end, then it is guaranteed that the block doesn't
belong to any other snapshot and is reclaimable. On the other hand, if,
let's say, the previous snapshot is in the range of de_start to de_end,
we simply set de_ss to the previous snapshot and the block is not
reclaimable.
To implement this every DAT entry is scanned at snapshot
creation/deletion time and updated if needed. To avoid too many update
operations only potentially reclaimable blocks are ever updated. For
example if there are some deleted files and the checkpoint to which
these files belong is turned into a snapshot, then su_nblocks is
incremented for these blocks, which reverses the decrement that happened
when the files were deleted. If after some time this snapshot is
deleted, su_nblocks is decremented again to reverse the increment at
creation time.
Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
---
fs/nilfs2/cpfile.c | 7 ++++
fs/nilfs2/dat.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++
fs/nilfs2/dat.h | 26 ++++++++++++++
include/linux/nilfs2_fs.h | 4 +--
4 files changed, 121 insertions(+), 2 deletions(-)
diff --git a/fs/nilfs2/cpfile.c b/fs/nilfs2/cpfile.c
index 0d58075..29952f5 100644
--- a/fs/nilfs2/cpfile.c
+++ b/fs/nilfs2/cpfile.c
@@ -28,6 +28,7 @@
#include <linux/nilfs2_fs.h>
#include "mdt.h"
#include "cpfile.h"
+#include "sufile.h"
static inline unsigned long
@@ -584,6 +585,7 @@ static int nilfs_cpfile_set_snapshot(struct inode *cpfile, __u64 cno)
struct nilfs_cpfile_header *header;
struct nilfs_checkpoint *cp;
struct nilfs_snapshot_list *list;
+ struct the_nilfs *nilfs = cpfile->i_sb->s_fs_info;
__u64 curr, prev;
unsigned long curr_blkoff, prev_blkoff;
void *kaddr;
@@ -681,6 +683,8 @@ static int nilfs_cpfile_set_snapshot(struct inode *cpfile, __u64 cno)
mark_buffer_dirty(header_bh);
nilfs_mdt_mark_dirty(cpfile);
+ nilfs_dat_scan_inc_ss(nilfs->ns_dat, cno);
+
brelse(prev_bh);
out_curr:
@@ -703,6 +707,7 @@ static int nilfs_cpfile_clear_snapshot(struct inode *cpfile, __u64 cno)
struct nilfs_cpfile_header *header;
struct nilfs_checkpoint *cp;
struct nilfs_snapshot_list *list;
+ struct the_nilfs *nilfs = cpfile->i_sb->s_fs_info;
__u64 next, prev;
void *kaddr;
int ret;
@@ -784,6 +789,8 @@ static int nilfs_cpfile_clear_snapshot(struct inode *cpfile, __u64 cno)
mark_buffer_dirty(header_bh);
nilfs_mdt_mark_dirty(cpfile);
+ nilfs_dat_scan_dec_ss(nilfs->ns_dat, cno, prev, next);
+
brelse(prev_bh);
out_next:
diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
index 0d5fada..89a4a5f 100644
--- a/fs/nilfs2/dat.c
+++ b/fs/nilfs2/dat.c
@@ -28,6 +28,7 @@
#include "mdt.h"
#include "alloc.h"
#include "dat.h"
+#include "sufile.h"
#define NILFS_CNO_MIN ((__u64)1)
@@ -97,6 +98,7 @@ void nilfs_dat_commit_alloc(struct inode *dat, struct nilfs_palloc_req *req)
entry->de_start = cpu_to_le64(NILFS_CNO_MIN);
entry->de_end = cpu_to_le64(NILFS_CNO_MAX);
entry->de_blocknr = cpu_to_le64(0);
+ entry->de_ss = cpu_to_le64(0);
kunmap_atomic(kaddr);
nilfs_palloc_commit_alloc_entry(dat, req);
@@ -121,6 +123,7 @@ static void nilfs_dat_commit_free(struct inode *dat,
entry->de_start = cpu_to_le64(NILFS_CNO_MIN);
entry->de_end = cpu_to_le64(NILFS_CNO_MIN);
entry->de_blocknr = cpu_to_le64(0);
+ entry->de_ss = cpu_to_le64(0);
kunmap_atomic(kaddr);
nilfs_dat_commit_entry(dat, req);
@@ -201,6 +204,7 @@ void nilfs_dat_commit_end(struct inode *dat, struct nilfs_palloc_req *req,
WARN_ON(start > end);
}
entry->de_end = cpu_to_le64(end);
+ entry->de_ss = cpu_to_le64(NILFS_CNO_MAX);
blocknr = le64_to_cpu(entry->de_blocknr);
kunmap_atomic(kaddr);
@@ -365,6 +369,8 @@ int nilfs_dat_move(struct inode *dat, __u64 vblocknr, sector_t blocknr)
}
WARN_ON(blocknr == 0);
entry->de_blocknr = cpu_to_le64(blocknr);
+ if (entry->de_ss == cpu_to_le64(NILFS_CNO_MAX))
+ entry->de_ss = cpu_to_le64(0);
kunmap_atomic(kaddr);
mark_buffer_dirty(entry_bh);
@@ -430,6 +436,86 @@ int nilfs_dat_translate(struct inode *dat, __u64 vblocknr, sector_t *blocknrp)
return ret;
}
+void nilfs_dat_do_scan_dec(struct inode *dat, struct nilfs_palloc_req *req,
+ void *data)
+{
+ struct nilfs_dat_entry *entry;
+ __u64 start, end, prev_ss;
+ __u64 *ssp = data, ss = ssp[0], prev = ssp[1], next = ssp[2];
+ sector_t blocknr;
+ void *kaddr;
+ struct the_nilfs *nilfs;
+
+ kaddr = kmap_atomic(req->pr_entry_bh->b_page);
+ entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr,
+ req->pr_entry_bh, kaddr);
+ start = le64_to_cpu(entry->de_start);
+ end = le64_to_cpu(entry->de_end);
+ blocknr = le64_to_cpu(entry->de_blocknr);
+ prev_ss = le64_to_cpu(entry->de_ss);
+
+ if (blocknr != 0 && end != NILFS_CNO_MAX && ss >= start && ss < end) {
+ if (prev_ss == ss || prev_ss == NILFS_CNO_MAX) {
+ if (prev && prev >= start && prev < end)
+ entry->de_ss = cpu_to_le64(prev);
+ else if (next && next >= start && next < end)
+ entry->de_ss = cpu_to_le64(next);
+ else
+ entry->de_ss = cpu_to_le64(0);
+
+ if (prev_ss != NILFS_CNO_MAX)
+ prev_ss = le64_to_cpu(entry->de_ss);
+ kunmap_atomic(kaddr);
+ mark_buffer_dirty(req->pr_entry_bh);
+ nilfs_mdt_mark_dirty(dat);
+ } else
+ kunmap_atomic(kaddr);
+
+ if (prev_ss == 0) {
+ nilfs = dat->i_sb->s_fs_info;
+ nilfs_sufile_add_segment_usage(nilfs->ns_sufile,
+ nilfs_get_segnum_of_block(nilfs, blocknr),
+ -1, 0);
+ }
+ } else
+ kunmap_atomic(kaddr);
+}
+
+void nilfs_dat_do_scan_inc(struct inode *dat, struct nilfs_palloc_req *req,
+ void *data)
+{
+ struct nilfs_dat_entry *entry;
+ __u64 start, end, prev_ss;
+ __u64 *ssp = data, ss = *ssp;
+ sector_t blocknr;
+ void *kaddr;
+ struct the_nilfs *nilfs;
+
+ kaddr = kmap_atomic(req->pr_entry_bh->b_page);
+ entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr,
+ req->pr_entry_bh, kaddr);
+ start = le64_to_cpu(entry->de_start);
+ end = le64_to_cpu(entry->de_end);
+ blocknr = le64_to_cpu(entry->de_blocknr);
+ prev_ss = le64_to_cpu(entry->de_ss);
+
+ if (blocknr != 0 && end != NILFS_CNO_MAX && ss >= start && ss < end &&
+ (prev_ss == 0 || prev_ss == NILFS_CNO_MAX)) {
+
+ entry->de_ss = cpu_to_le64(ss);
+
+ kunmap_atomic(kaddr);
+ mark_buffer_dirty(req->pr_entry_bh);
+ nilfs_mdt_mark_dirty(dat);
+
+ nilfs = dat->i_sb->s_fs_info;
+ nilfs_sufile_add_segment_usage(nilfs->ns_sufile,
+ nilfs_get_segnum_of_block(nilfs, blocknr),
+ 1, 0);
+ } else
+ kunmap_atomic(kaddr);
+}
+
ssize_t nilfs_dat_get_vinfo(struct inode *dat, void *buf, unsigned visz,
size_t nvi)
{
diff --git a/fs/nilfs2/dat.h b/fs/nilfs2/dat.h
index cbd8e97..92a187e 100644
--- a/fs/nilfs2/dat.h
+++ b/fs/nilfs2/dat.h
@@ -55,5 +55,31 @@ ssize_t nilfs_dat_get_vinfo(struct inode *, void *, unsigned, size_t);
int nilfs_dat_read(struct super_block *sb, size_t entry_size,
struct nilfs_inode *raw_inode, struct inode **inodep);
+void nilfs_dat_do_scan_dec(struct inode *, struct nilfs_palloc_req *, void *);
+void nilfs_dat_do_scan_inc(struct inode *, struct nilfs_palloc_req *, void *);
+
+/**
+ * nilfs_dat_scan_dec_ss - scan all dat entries for a checkpoint dec suinfo
+ * @dat: inode of dat file
+ * @cno: snapshot number
+ * @prev: previous snapshot number
+ * @next: next snapshot number
+ */
+static inline int nilfs_dat_scan_dec_ss(struct inode *dat, __u64 cno,
+ __u64 prev, __u64 next)
+{
+ __u64 data[3] = { cno, prev, next };
+ return nilfs_palloc_scan_entries(dat, nilfs_dat_do_scan_dec, data);
+}
+
+/**
+ * nilfs_dat_scan_inc_ss - scan all dat entries for a checkpoint inc suinfo
+ * @dat: inode of dat file
+ * @cno: snapshot number
+ */
+static inline int nilfs_dat_scan_inc_ss(struct inode *dat, __u64 cno)
+{
+ return nilfs_palloc_scan_entries(dat, nilfs_dat_do_scan_inc, &cno);
+}
#endif /* _NILFS_DAT_H */
diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
index ca269ad..ba9ebe02 100644
--- a/include/linux/nilfs2_fs.h
+++ b/include/linux/nilfs2_fs.h
@@ -475,13 +475,13 @@ struct nilfs_palloc_group_desc {
* @de_blocknr: block number
* @de_start: start checkpoint number
* @de_end: end checkpoint number
- * @de_rsv: reserved for future use
+ * @de_ss: one of the snapshots the block belongs to
*/
struct nilfs_dat_entry {
__le64 de_blocknr;
__le64 de_start;
__le64 de_end;
- __le64 de_rsv;
+ __le64 de_ss;
};
#define NILFS_MIN_DAT_ENTRY_SIZE 32
--
1.9.0
* [PATCH 4/6] nilfs2: add ioctl() to clean snapshot flags from dat entries
From: Andreas Rohner @ 2014-03-16 10:47 UTC
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner
This patch introduces new flags for nilfs_vdesc to indicate the reason a
block is alive. So if the block would be reclaimable, but must be
treated as if it were alive, because it is part of a snapshot, then the
snapshot flag is set.
Additionally a new ioctl() is added, which enables the userspace GC to
perform a cleanup operation after setting the number of blocks with
NILFS_IOCTL_SET_SUINFO. It sets DAT entries with de_ss values of
NILFS_CNO_MAX to 0. NILFS_CNO_MAX indicates that the corresponding
block belongs to some snapshot, but was already decremented by a
previous deletion operation. If the segment usage info is changed with
NILFS_IOCTL_SET_SUINFO and the number of blocks is updated, then these
blocks would never be decremented, and there are scenarios where the
corresponding segments would starve (never be cleaned). To prevent
that, the de_ss values of these entries must be reset to 0.
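
The per-entry part of that cleanup amounts to resetting one marker value. A minimal userspace sketch, assuming that NILFS_CNO_MAX is the all-ones 64-bit value (the `CNO_MAX` macro and the function name below are stand-ins, not kernel identifiers):

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for NILFS_CNO_MAX; the all-ones value is an assumption. */
#define CNO_MAX UINT64_MAX

/*
 * Resets the "already decremented" marker of one entry. Returns true
 * if the entry was changed and would have to be written back.
 */
static bool clean_snapshot_marker(uint64_t *de_ss)
{
	if (*de_ss != CNO_MAX)
		return false;	/* 0 or a real snapshot number: leave as is */
	*de_ss = 0;
	return true;
}
```

Entries that hold 0 or a real snapshot number are left untouched, so only entries carrying the marker need to be marked dirty.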
Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
---
fs/nilfs2/dat.c | 63 ++++++++++++++++++++++++++++
fs/nilfs2/dat.h | 1 +
fs/nilfs2/ioctl.c | 103 +++++++++++++++++++++++++++++++++++++++++++++-
include/linux/nilfs2_fs.h | 52 ++++++++++++++++++++++-
4 files changed, 216 insertions(+), 3 deletions(-)
diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
index 89a4a5f..7adb15d 100644
--- a/fs/nilfs2/dat.c
+++ b/fs/nilfs2/dat.c
@@ -382,6 +382,69 @@ int nilfs_dat_move(struct inode *dat, __u64 vblocknr, sector_t blocknr)
}
/**
+ * nilfs_dat_clean_snapshot_flag - check flags used by snapshots
+ * @dat: DAT file inode
+ * @vblocknr: virtual block number
+ *
+ * Description: nilfs_dat_clean_snapshot_flag() changes the flags from
+ * NILFS_CNO_MAX to 0 if necessary, so that segment usage is accurately
+ * counted. NILFS_CNO_MAX indicates that the corresponding block belongs
+ * to some snapshot, but was already decremented. If the segment usage info
+ * is changed with NILFS_IOCTL_SET_SUINFO and the number of blocks is updated,
+ * then these blocks would never be decremented and there are scenarios where
+ * the corresponding segments would starve (never be cleaned).
+ *
+ * Return Value: On success, 0 is returned. On error, one of the following
+ * negative error codes is returned.
+ *
+ * %-EIO - I/O error.
+ *
+ * %-ENOMEM - Insufficient amount of memory available.
+ */
+int nilfs_dat_clean_snapshot_flag(struct inode *dat, __u64 vblocknr)
+{
+ struct buffer_head *entry_bh;
+ struct nilfs_dat_entry *entry;
+ void *kaddr;
+ int ret;
+
+ ret = nilfs_palloc_get_entry_block(dat, vblocknr, 0, &entry_bh);
+ if (ret < 0)
+ return ret;
+
+ /*
+ * The given disk block number (blocknr) is not yet written to
+ * the device at this point.
+ *
+ * To prevent nilfs_dat_translate() from returning the
+ * uncommitted block number, this makes a copy of the entry
+ * buffer and redirects nilfs_dat_translate() to the copy.
+ */
+ if (!buffer_nilfs_redirected(entry_bh)) {
+ ret = nilfs_mdt_freeze_buffer(dat, entry_bh);
+ if (ret) {
+ brelse(entry_bh);
+ return ret;
+ }
+ }
+
+ kaddr = kmap_atomic(entry_bh->b_page);
+ entry = nilfs_palloc_block_get_entry(dat, vblocknr, entry_bh, kaddr);
+ if (entry->de_ss == cpu_to_le64(NILFS_CNO_MAX)) {
+ entry->de_ss = cpu_to_le64(0);
+ kunmap_atomic(kaddr);
+ mark_buffer_dirty(entry_bh);
+ nilfs_mdt_mark_dirty(dat);
+ } else {
+ kunmap_atomic(kaddr);
+ }
+
+ brelse(entry_bh);
+
+ return 0;
+}
+
+/**
* nilfs_dat_translate - translate a virtual block number to a block number
* @dat: DAT file inode
* @vblocknr: virtual block number
diff --git a/fs/nilfs2/dat.h b/fs/nilfs2/dat.h
index 92a187e..a528024 100644
--- a/fs/nilfs2/dat.h
+++ b/fs/nilfs2/dat.h
@@ -51,6 +51,7 @@ void nilfs_dat_abort_update(struct inode *, struct nilfs_palloc_req *,
int nilfs_dat_mark_dirty(struct inode *, __u64);
int nilfs_dat_freev(struct inode *, __u64 *, size_t);
int nilfs_dat_move(struct inode *, __u64, sector_t);
+int nilfs_dat_clean_snapshot_flag(struct inode *, __u64);
ssize_t nilfs_dat_get_vinfo(struct inode *, void *, unsigned, size_t);
int nilfs_dat_read(struct super_block *sb, size_t entry_size,
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 422fb54..0b62bf4 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -578,7 +578,7 @@ static int nilfs_ioctl_move_inode_block(struct inode *inode,
struct buffer_head *bh;
int ret;
- if (vdesc->vd_flags == 0)
+ if (nilfs_vdesc_data(vdesc))
ret = nilfs_gccache_submit_read_data(
inode, vdesc->vd_offset, vdesc->vd_blocknr,
vdesc->vd_vblocknr, &bh);
@@ -662,6 +662,14 @@ static int nilfs_ioctl_move_blocks(struct super_block *sb,
}
do {
+ /*
+ * old user space tools do not initialize vd_flags2,
+ * so reset it if it contains invalid flags
+ */
+ if (vdesc->vd_flags2 &
+ (~0UL << __NR_NILFS_VDESC_FIELDS))
+ vdesc->vd_flags2 = 0;
+
ret = nilfs_ioctl_move_inode_block(inode, vdesc,
&buffers);
if (unlikely(ret < 0)) {
@@ -984,6 +992,96 @@ out:
}
/**
+ * nilfs_ioctl_clean_snapshot_flags - clean dat entries with invalid de_ss
+ * @inode: inode object
+ * @filp: file object
+ * @cmd: ioctl's request code
+ * @argp: pointer on argument from userspace
+ *
+ * Description: nilfs_ioctl_clean_snapshot_flags() sets DAT entries with de_ss
+ * values of NILFS_CNO_MAX to 0. NILFS_CNO_MAX indicates that the
+ * corresponding block belongs to some snapshot, but was already decremented.
+ * If the segment usage info is changed with NILFS_IOCTL_SET_SUINFO and the
+ * number of blocks is updated, then these blocks would never be decremented
+ * and there are scenarios where the corresponding segments would starve (never
+ * be cleaned).
+ *
+ * Return Value: On success, 0 is returned or error code, otherwise.
+ */
+static int nilfs_ioctl_clean_snapshot_flags(struct inode *inode,
+ struct file *filp,
+ unsigned int cmd,
+ void __user *argp)
+{
+ struct the_nilfs *nilfs = inode->i_sb->s_fs_info;
+ struct nilfs_transaction_info ti;
+ struct nilfs_argv argv;
+ struct nilfs_vdesc *vdesc;
+ size_t len, i;
+ void __user *base;
+ void *kbuf;
+ int ret;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ ret = mnt_want_write_file(filp);
+ if (ret)
+ return ret;
+
+ ret = -EFAULT;
+ if (copy_from_user(&argv, argp, sizeof(struct nilfs_argv)))
+ goto out;
+
+ ret = -EINVAL;
+ if (argv.v_size != sizeof(struct nilfs_vdesc))
+ goto out;
+ if (argv.v_nmembs > UINT_MAX / sizeof(struct nilfs_vdesc))
+ goto out;
+
+ len = argv.v_size * argv.v_nmembs;
+ if (!len) {
+ ret = 0;
+ goto out;
+ }
+
+ base = (void __user *)(unsigned long)argv.v_base;
+ kbuf = vmalloc(len);
+ if (!kbuf) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ if (copy_from_user(kbuf, base, len)) {
+ ret = -EFAULT;
+ goto out_free;
+ }
+
+ ret = nilfs_transaction_begin(inode->i_sb, &ti, 0);
+ if (unlikely(ret))
+ goto out_free;
+
+ for (i = 0, vdesc = kbuf; i < argv.v_nmembs; ++i, ++vdesc) {
+ if (nilfs_vdesc_snapshot(vdesc)) {
+ ret = nilfs_dat_clean_snapshot_flag(nilfs->ns_dat,
+ vdesc->vd_vblocknr);
+ if (ret) {
+ nilfs_transaction_abort(inode->i_sb);
+ goto out_free;
+ }
+ }
+ }
+
+ nilfs_transaction_commit(inode->i_sb);
+
+out_free:
+ vfree(kbuf);
+out:
+ mnt_drop_write_file(filp);
+ return ret;
+}
+
+/**
* nilfs_ioctl_sync - make a checkpoint
* @inode: inode object
* @filp: file object
@@ -1332,6 +1430,8 @@ long nilfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return nilfs_ioctl_get_bdescs(inode, filp, cmd, argp);
case NILFS_IOCTL_CLEAN_SEGMENTS:
return nilfs_ioctl_clean_segments(inode, filp, cmd, argp);
+ case NILFS_IOCTL_CLEAN_SNAPSHOT_FLAGS:
+ return nilfs_ioctl_clean_snapshot_flags(inode, filp, cmd, argp);
case NILFS_IOCTL_SYNC:
return nilfs_ioctl_sync(inode, filp, cmd, argp);
case NILFS_IOCTL_RESIZE:
@@ -1368,6 +1468,7 @@ long nilfs_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
case NILFS_IOCTL_GET_VINFO:
case NILFS_IOCTL_GET_BDESCS:
case NILFS_IOCTL_CLEAN_SEGMENTS:
+ case NILFS_IOCTL_CLEAN_SNAPSHOT_FLAGS:
case NILFS_IOCTL_SYNC:
case NILFS_IOCTL_RESIZE:
case NILFS_IOCTL_SET_ALLOC_RANGE:
diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
index ba9ebe02..30ddc86 100644
--- a/include/linux/nilfs2_fs.h
+++ b/include/linux/nilfs2_fs.h
@@ -863,7 +863,7 @@ struct nilfs_vinfo {
* @vd_blocknr: disk block number
* @vd_offset: logical block offset inside a file
* @vd_flags: flags (data or node block)
- * @vd_pad: padding
+ * @vd_flags2: additional flags
*/
struct nilfs_vdesc {
__u64 vd_ino;
@@ -873,9 +873,55 @@ struct nilfs_vdesc {
__u64 vd_blocknr;
__u64 vd_offset;
__u32 vd_flags;
- __u32 vd_pad;
+ /* vd_flags2 needed because of backwards compatibility */
+ __u32 vd_flags2;
};
+/* vdesc flags */
+enum {
+ NILFS_VDESC_DATA,
+ NILFS_VDESC_NODE,
+ /* ... */
+};
+enum {
+ NILFS_VDESC_SNAPSHOT,
+ __NR_NILFS_VDESC_FIELDS,
+ /* ... */
+};
+
+#define NILFS_VDESC_FNS(flag, name) \
+static inline void \
+nilfs_vdesc_set_##name(struct nilfs_vdesc *vdesc) \
+{ \
+ vdesc->vd_flags = NILFS_VDESC_##flag; \
+} \
+static inline int \
+nilfs_vdesc_##name(const struct nilfs_vdesc *vdesc) \
+{ \
+ return vdesc->vd_flags == NILFS_VDESC_##flag; \
+}
+
+#define NILFS_VDESC_FNS2(flag, name) \
+static inline void \
+nilfs_vdesc_set_##name(struct nilfs_vdesc *vdesc) \
+{ \
+ vdesc->vd_flags2 |= (1UL << NILFS_VDESC_##flag); \
+} \
+static inline void \
+nilfs_vdesc_clear_##name(struct nilfs_vdesc *vdesc) \
+{ \
+ vdesc->vd_flags2 &= ~(1UL << NILFS_VDESC_##flag); \
+} \
+static inline int \
+nilfs_vdesc_##name(const struct nilfs_vdesc *vdesc) \
+{ \
+ return !!(vdesc->vd_flags2 & (1UL << NILFS_VDESC_##flag)); \
+}
+
+NILFS_VDESC_FNS(DATA, data)
+NILFS_VDESC_FNS(NODE, node)
+NILFS_VDESC_FNS2(SNAPSHOT, snapshot)
+
/**
* struct nilfs_bdesc - descriptor of disk block number
* @bd_ino: inode number
@@ -922,5 +968,7 @@ struct nilfs_bdesc {
_IOW(NILFS_IOCTL_IDENT, 0x8C, __u64[2])
#define NILFS_IOCTL_SET_SUINFO \
_IOW(NILFS_IOCTL_IDENT, 0x8D, struct nilfs_argv)
+#define NILFS_IOCTL_CLEAN_SNAPSHOT_FLAGS \
+ _IOW(NILFS_IOCTL_IDENT, 0x8F, struct nilfs_argv)
#endif /* _LINUX_NILFS_FS_H */
--
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 5/6] nilfs2: add counting of live blocks for blocks that are overwritten
[not found] ` <cover.1394966728.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
` (3 preceding siblings ...)
2014-03-16 10:47 ` [PATCH 4/6] nilfs2: add ioctl() to clean snapshot flags from dat entries Andreas Rohner
@ 2014-03-16 10:47 ` Andreas Rohner
[not found] ` <25dd8a8bb6943ffa3e0663848363135585a48109.1394966729.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-03-16 10:47 ` [PATCH 6/6] nilfs2: add counting of live blocks for deleted files Andreas Rohner
` (2 subsequent siblings)
7 siblings, 1 reply; 34+ messages in thread
From: Andreas Rohner @ 2014-03-16 10:47 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner
Currently, after a block is written to disk, its buffer_head is never
mapped to that location on disk. By simply calling map_bh() after
writing the block, the origin of an overwritten block can be determined
and the corresponding segment can be calculated with
nilfs_get_segnum_of_block(). Since the block is now at a new location,
the old one is reclaimable. Therefore the number of live blocks in the
segment usage information of the segment that held the previous
location of the block needs to be decremented.
This approach also works for the DAT file and other metadata files.
nilfs_node blocks have to be treated differently. GC blocks also have to
be treated separately, because they contain the virtual block number in
the b_blocknr field and their old location is about to be cleaned
anyway. So it is not necessary to decrement the live block counters for
GC blocks.
This patch does not count deleted blocks when a whole file is deleted.
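The accounting described above can be sketched in userspace as follows.
This is a simplified model under stated assumptions, not the kernel
code: the segment geometry constants, the live_blocks array and
account_overwrite() are hypothetical, and the segment number is assumed
to be the block number divided by the number of blocks per segment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry, chosen only for this sketch. */
#define SKETCH_NSEGMENTS		4
#define SKETCH_BLOCKS_PER_SEGMENT	8

/* Live block counter per segment (stand-in for the SUFILE entries). */
static uint32_t live_blocks[SKETCH_NSEGMENTS];

/* Assumed mapping from a disk block number to its segment number. */
static uint64_t segnum_of_block(uint64_t blocknr)
{
	return blocknr / SKETCH_BLOCKS_PER_SEGMENT;
}

/*
 * Account for a block that is rewritten at a new location.
 * old_blocknr is the previous disk location (0 means never mapped).
 * GC blocks are skipped, because their old segment is about to be
 * cleaned anyway. Returns 1 if a counter was decremented, 0 otherwise.
 */
static int account_overwrite(uint64_t old_blocknr, int is_gc_block)
{
	uint64_t segnum;

	if (is_gc_block || old_blocknr == 0)
		return 0;

	segnum = segnum_of_block(old_blocknr);
	if (segnum >= SKETCH_NSEGMENTS || live_blocks[segnum] == 0)
		return 0;

	live_blocks[segnum]--;
	return 1;
}
```

For example, with 8 blocks per segment, an overwritten block whose old
location was block 9 decrements the counter of segment 1.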
Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
---
fs/nilfs2/dat.c | 58 +++++++++++++++++++++++++++++++++++++++
fs/nilfs2/dat.h | 1 +
fs/nilfs2/inode.c | 2 ++
fs/nilfs2/ioctl.c | 6 +++++
fs/nilfs2/page.h | 6 ++++-
fs/nilfs2/segbuf.c | 25 +++++++++++++++++
fs/nilfs2/segbuf.h | 4 +++
fs/nilfs2/segment.c | 69 +++++++++++++++++++++++++++++++++++++++++++----
include/linux/nilfs2_fs.h | 2 ++
9 files changed, 167 insertions(+), 6 deletions(-)
diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
index 7adb15d..e7b19c40 100644
--- a/fs/nilfs2/dat.c
+++ b/fs/nilfs2/dat.c
@@ -445,6 +445,64 @@ int nilfs_dat_clean_snapshot_flag(struct inode *dat, __u64 vblocknr)
}
/**
+ * nilfs_dat_is_live - checks if the virtual block number is alive
+ * @dat: DAT file inode
+ * @vblocknr: virtual block number
+ *
+ * Description: nilfs_dat_is_live() looks up the DAT entry for @vblocknr and
+ * determines if the corresponding block is alive or not. This check ignores
+ * snapshots and protection periods.
+ *
+ * Return Value: 1 if vblocknr is alive and 0 otherwise. On error, one
+ * of the following negative error codes is returned.
+ *
+ * %-EIO - I/O error.
+ *
+ * %-ENOMEM - Insufficient amount of memory available.
+ *
+ * %-ENOENT - A block number associated with @vblocknr does not exist.
+ */
+int nilfs_dat_is_live(struct inode *dat, __u64 vblocknr)
+{
+ struct buffer_head *entry_bh, *bh;
+ struct nilfs_dat_entry *entry;
+ sector_t blocknr;
+ void *kaddr;
+ int ret;
+
+ ret = nilfs_palloc_get_entry_block(dat, vblocknr, 0, &entry_bh);
+ if (ret < 0)
+ return ret;
+
+ if (!nilfs_doing_gc() && buffer_nilfs_redirected(entry_bh)) {
+ bh = nilfs_mdt_get_frozen_buffer(dat, entry_bh);
+ if (bh) {
+ WARN_ON(!buffer_uptodate(bh));
+ brelse(entry_bh);
+ entry_bh = bh;
+ }
+ }
+
+ kaddr = kmap_atomic(entry_bh->b_page);
+ entry = nilfs_palloc_block_get_entry(dat, vblocknr, entry_bh, kaddr);
+ blocknr = le64_to_cpu(entry->de_blocknr);
+ if (blocknr == 0) {
+ ret = -ENOENT;
+ goto out;
+ }
+
+
+ if (entry->de_end == cpu_to_le64(NILFS_CNO_MAX))
+ ret = 1;
+ else
+ ret = 0;
+out:
+ kunmap_atomic(kaddr);
+ brelse(entry_bh);
+ return ret;
+}
+
+/**
* nilfs_dat_translate - translate a virtual block number to a block number
* @dat: DAT file inode
* @vblocknr: virtual block number
diff --git a/fs/nilfs2/dat.h b/fs/nilfs2/dat.h
index a528024..51d44c0 100644
--- a/fs/nilfs2/dat.h
+++ b/fs/nilfs2/dat.h
@@ -31,6 +31,7 @@
struct nilfs_palloc_req;
int nilfs_dat_translate(struct inode *, __u64, sector_t *);
+int nilfs_dat_is_live(struct inode *, __u64);
int nilfs_dat_prepare_alloc(struct inode *, struct nilfs_palloc_req *);
void nilfs_dat_commit_alloc(struct inode *, struct nilfs_palloc_req *);
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index b9c5726..c32b896 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -86,6 +86,8 @@ int nilfs_get_block(struct inode *inode, sector_t blkoff,
int err = 0, ret;
unsigned maxblocks = bh_result->b_size >> inode->i_blkbits;
+ bh_result->b_blocknr = 0;
+
down_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
ret = nilfs_bmap_lookup_contig(ii->i_bmap, blkoff, &blknum, maxblocks);
up_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 0b62bf4..3603394 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -612,6 +612,12 @@ static int nilfs_ioctl_move_inode_block(struct inode *inode,
brelse(bh);
return -EEXIST;
}
+
+ if (nilfs_vdesc_snapshot(vdesc))
+ set_buffer_nilfs_snapshot(bh);
+ if (nilfs_vdesc_protection_period(vdesc))
+ set_buffer_nilfs_protection_period(bh);
+
list_add_tail(&bh->b_assoc_buffers, buffers);
return 0;
}
diff --git a/fs/nilfs2/page.h b/fs/nilfs2/page.h
index ef30c5c..8c34a31 100644
--- a/fs/nilfs2/page.h
+++ b/fs/nilfs2/page.h
@@ -36,13 +36,17 @@ enum {
BH_NILFS_Volatile,
BH_NILFS_Checked,
BH_NILFS_Redirected,
+ BH_NILFS_Snapshot,
+ BH_NILFS_Protection_Period,
};
BUFFER_FNS(NILFS_Node, nilfs_node) /* nilfs node buffers */
BUFFER_FNS(NILFS_Volatile, nilfs_volatile)
BUFFER_FNS(NILFS_Checked, nilfs_checked) /* buffer is verified */
BUFFER_FNS(NILFS_Redirected, nilfs_redirected) /* redirected to a copy */
-
+BUFFER_FNS(NILFS_Snapshot, nilfs_snapshot) /* belongs to a snapshot */
+BUFFER_FNS(NILFS_Protection_Period, nilfs_protection_period) /* protected by
+ protection period */
int __nilfs_clear_page_dirty(struct page *);
diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
index dc3a9efd..c72fc37 100644
--- a/fs/nilfs2/segbuf.c
+++ b/fs/nilfs2/segbuf.c
@@ -28,6 +28,7 @@
#include <linux/slab.h>
#include "page.h"
#include "segbuf.h"
+#include "sufile.h"
struct nilfs_write_info {
@@ -57,6 +58,8 @@ struct nilfs_segment_buffer *nilfs_segbuf_new(struct super_block *sb)
INIT_LIST_HEAD(&segbuf->sb_segsum_buffers);
INIT_LIST_HEAD(&segbuf->sb_payload_buffers);
segbuf->sb_super_root = NULL;
+ segbuf->sb_su_blocks = 0;
+ segbuf->sb_su_blocks_cancel = 0;
init_completion(&segbuf->sb_bio_event);
atomic_set(&segbuf->sb_err, 0);
@@ -82,6 +85,25 @@ void nilfs_segbuf_map(struct nilfs_segment_buffer *segbuf, __u64 segnum,
segbuf->sb_fseg_end - segbuf->sb_pseg_start + 1;
}
+int nilfs_segbuf_set_sui(struct nilfs_segment_buffer *segbuf,
+ struct the_nilfs *nilfs)
+{
+ struct nilfs_suinfo si;
+ ssize_t err;
+
+ err = nilfs_sufile_get_suinfo(nilfs->ns_sufile, segbuf->sb_segnum, &si,
+ sizeof(si), 1);
+ if (err != 1)
+ return -1;
+
+ if (si.sui_nblocks == 0)
+ si.sui_nblocks = segbuf->sb_pseg_start - segbuf->sb_fseg_start;
+
+ segbuf->sb_su_blocks = si.sui_nblocks;
+ segbuf->sb_su_blocks_cancel = si.sui_nblocks;
+ return 0;
+}
+
/**
* nilfs_segbuf_map_cont - map a new log behind a given log
* @segbuf: new segment buffer
@@ -450,6 +472,9 @@ static int nilfs_segbuf_submit_bh(struct nilfs_segment_buffer *segbuf,
len = bio_add_page(wi->bio, bh->b_page, bh->b_size, bh_offset(bh));
if (len == bh->b_size) {
+ lock_buffer(bh);
+ map_bh(bh, segbuf->sb_super, wi->blocknr + wi->end);
+ unlock_buffer(bh);
wi->end++;
return 0;
}
diff --git a/fs/nilfs2/segbuf.h b/fs/nilfs2/segbuf.h
index b04f08c..482bbad 100644
--- a/fs/nilfs2/segbuf.h
+++ b/fs/nilfs2/segbuf.h
@@ -83,6 +83,8 @@ struct nilfs_segment_buffer {
sector_t sb_fseg_start, sb_fseg_end;
sector_t sb_pseg_start;
unsigned sb_rest_blocks;
+ __u32 sb_su_blocks_cancel;
+ __s64 sb_su_blocks;
/* Buffers */
struct list_head sb_segsum_buffers;
@@ -122,6 +124,8 @@ void nilfs_segbuf_map(struct nilfs_segment_buffer *, __u64, unsigned long,
struct the_nilfs *);
void nilfs_segbuf_map_cont(struct nilfs_segment_buffer *segbuf,
struct nilfs_segment_buffer *prev);
+int nilfs_segbuf_set_sui(struct nilfs_segment_buffer *segbuf,
+ struct the_nilfs *nilfs);
void nilfs_segbuf_set_next_segnum(struct nilfs_segment_buffer *, __u64,
struct the_nilfs *);
int nilfs_segbuf_reset(struct nilfs_segment_buffer *, unsigned, time_t, __u64);
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index a1a1916..5d98a1c 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1257,6 +1257,10 @@ static int nilfs_segctor_begin_construction(struct nilfs_sc_info *sci,
}
nilfs_segbuf_set_next_segnum(segbuf, nextnum, nilfs);
+ err = nilfs_segbuf_set_sui(segbuf, nilfs);
+ if (err)
+ goto failed;
+
BUG_ON(!list_empty(&sci->sc_segbufs));
list_add_tail(&segbuf->sb_list, &sci->sc_segbufs);
sci->sc_segbuf_nblocks = segbuf->sb_rest_blocks;
@@ -1306,6 +1310,10 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
segbuf->sb_sum.seg_seq = prev->sb_sum.seg_seq + 1;
nilfs_segbuf_set_next_segnum(segbuf, nextnextnum, nilfs);
+ err = nilfs_segbuf_set_sui(segbuf, nilfs);
+ if (err)
+ goto failed;
+
list_add_tail(&segbuf->sb_list, &list);
prev = segbuf;
}
@@ -1368,8 +1376,7 @@ static void nilfs_segctor_update_segusage(struct nilfs_sc_info *sci,
int ret;
list_for_each_entry(segbuf, &sci->sc_segbufs, sb_list) {
- live_blocks = segbuf->sb_sum.nblocks +
- (segbuf->sb_pseg_start - segbuf->sb_fseg_start);
+ live_blocks = segbuf->sb_sum.nfileblk + segbuf->sb_su_blocks;
ret = nilfs_sufile_set_segment_usage(sufile, segbuf->sb_segnum,
live_blocks,
sci->sc_seg_ctime);
@@ -1383,9 +1390,9 @@ static void nilfs_cancel_segusage(struct list_head *logs, struct inode *sufile)
int ret;
segbuf = NILFS_FIRST_SEGBUF(logs);
+
ret = nilfs_sufile_set_segment_usage(sufile, segbuf->sb_segnum,
- segbuf->sb_pseg_start -
- segbuf->sb_fseg_start, 0);
+ segbuf->sb_su_blocks_cancel, 0);
WARN_ON(ret); /* always succeed because the segusage is dirty */
list_for_each_entry_continue(segbuf, logs, sb_list) {
@@ -1477,7 +1484,9 @@ nilfs_segctor_update_payload_blocknr(struct nilfs_sc_info *sci,
struct nilfs_segment_buffer *segbuf,
int mode)
{
+ struct the_nilfs *nilfs = sci->sc_super->s_fs_info;
struct inode *inode = NULL;
+ struct nilfs_inode_info *ii;
sector_t blocknr;
unsigned long nfinfo = segbuf->sb_sum.nfinfo;
unsigned long nblocks = 0, ndatablk = 0;
@@ -1487,7 +1496,9 @@ nilfs_segctor_update_payload_blocknr(struct nilfs_sc_info *sci,
union nilfs_binfo binfo;
struct buffer_head *bh, *bh_org;
ino_t ino = 0;
- int err = 0;
+ int gc_inode = 0, err = 0;
+ __u64 segnum, prev_segnum = 0, dectime = 0, maxdectime = 0;
+ __u32 blkcount = 0;
if (!nfinfo)
goto out;
@@ -1508,6 +1519,17 @@ nilfs_segctor_update_payload_blocknr(struct nilfs_sc_info *sci,
inode = bh->b_page->mapping->host;
+ ii = NILFS_I(inode);
+ gc_inode = test_bit(NILFS_I_GCINODE, &ii->i_state);
+ dectime = sci->sc_seg_ctime;
+ /* no update of lastdec necessary */
+ if (ino == NILFS_DAT_INO || ino == NILFS_SUFILE_INO ||
+ ino == NILFS_CPFILE_INO)
+ dectime = 0;
+
+ if (dectime > maxdectime)
+ maxdectime = dectime;
+
if (mode == SC_LSEG_DSYNC)
sc_op = &nilfs_sc_dsync_ops;
else if (ino == NILFS_DAT_INO)
@@ -1515,6 +1537,39 @@ nilfs_segctor_update_payload_blocknr(struct nilfs_sc_info *sci,
else /* file blocks */
sc_op = &nilfs_sc_file_ops;
}
+
+ segnum = nilfs_get_segnum_of_block(nilfs, bh->b_blocknr);
+ if (!gc_inode && bh->b_blocknr > 0 &&
+ (ino == NILFS_DAT_INO || !buffer_nilfs_node(bh)) &&
+ segnum < nilfs->ns_nsegments) {
+
+ if (segnum != prev_segnum) {
+ if (blkcount) {
+ nilfs_sufile_add_segment_usage(
+ nilfs->ns_sufile,
+ prev_segnum,
+ -((__s64)blkcount),
+ maxdectime);
+ }
+ prev_segnum = segnum;
+ blkcount = 0;
+ maxdectime = dectime;
+ }
+
+
+ if (segnum == segbuf->sb_segnum)
+ segbuf->sb_su_blocks--;
+ else
+ ++blkcount;
+ } else if (gc_inode && bh->b_blocknr > 0) {
+ /* check again if gc blocks are alive */
+ if (!buffer_nilfs_snapshot(bh) &&
+ (buffer_nilfs_protection_period(bh) ||
+ !nilfs_dat_is_live(nilfs->ns_dat,
+ bh->b_blocknr)))
+ segbuf->sb_su_blocks--;
+ }
+
bh_org = bh;
get_bh(bh_org);
err = nilfs_bmap_assign(NILFS_I(inode)->i_bmap, &bh, blocknr,
@@ -1538,6 +1593,10 @@ nilfs_segctor_update_payload_blocknr(struct nilfs_sc_info *sci,
} else if (ndatablk > 0)
ndatablk--;
}
+
+ if (blkcount)
+ nilfs_sufile_add_segment_usage(nilfs->ns_sufile, prev_segnum,
+ -((__s64)blkcount), maxdectime);
out:
return 0;
diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
index 30ddc86..e05793a 100644
--- a/include/linux/nilfs2_fs.h
+++ b/include/linux/nilfs2_fs.h
@@ -885,6 +885,7 @@ enum {
};
enum {
NILFS_VDESC_SNAPSHOT,
+ NILFS_VDESC_PROTECTION_PERIOD,
__NR_NILFS_VDESC_FIELDS,
/* ... */
};
@@ -921,6 +922,7 @@ nilfs_vdesc_##name(const struct nilfs_vdesc *vdesc) \
NILFS_VDESC_FNS(DATA, data)
NILFS_VDESC_FNS(NODE, node)
NILFS_VDESC_FNS2(SNAPSHOT, snapshot)
+NILFS_VDESC_FNS2(PROTECTION_PERIOD, protection_period)
/**
* struct nilfs_bdesc - descriptor of disk block number
--
1.9.0
* [PATCH 6/6] nilfs2: add counting of live blocks for deleted files
[not found] ` <cover.1394966728.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
` (4 preceding siblings ...)
2014-03-16 10:47 ` [PATCH 5/6] nilfs2: add counting of live blocks for blocks that are overwritten Andreas Rohner
@ 2014-03-16 10:47 ` Andreas Rohner
2014-03-16 10:49 ` [PATCH 1/4] nilfs-utils: remove reliance on sui_nblocks to read segment Andreas Rohner
2014-03-16 11:01 ` [PATCH 0/6] nilfs2: implement tracking of live blocks Andreas Rohner
7 siblings, 0 replies; 34+ messages in thread
From: Andreas Rohner @ 2014-03-16 10:47 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner
If a file is deleted, the entries of its blocks in the DAT-File need to
be updated. Every time a file is deleted, nilfs_dat_commit_end() is
called for every block to set de_end to the current checkpoint number,
which makes it the perfect hook for logic that counts live blocks. If
the file is deleted, its blocks are reclaimable and the number of live
blocks for the corresponding segments must be decremented. This patch
adds code to nilfs_dat_commit_end() that decrements the number of live
blocks under certain conditions.
One condition is that the block must not belong to the SUFILE, because
that would lead to a deadlock. When nilfs_dat_commit_end() is called,
the bmap's b_sem is already held, but nilfs_sufile_add_segment_usage()
has to take that same lock for the SUFILE to decrement the number of
live blocks. Secondly, the blocks must only be counted if
nilfs_dat_commit_end() is called from a file deletion operation,
because overwritten blocks are already counted elsewhere.
With the above changes the code does not pass the lock dependency
checks, because all the locks have the same class and the order in which
the locks are taken differs. Usually it is:
1. down_write(&NILFS_MDT(sufile)->mi_sem);
2. down_write(&bmap->b_sem);
Now it can also be reversed, which leads to failed checks:
1. down_write(&bmap->b_sem); /* lock of a file other than SUFILE */
2. down_write(&NILFS_MDT(sufile)->mi_sem);
But this is safe as long as the first lock, down_write(&bmap->b_sem),
does not belong to the SUFILE. So the warnings can be resolved by adding
an extra lock class for the SUFILE, and the code is safe because the
SUFILE is excluded from being counted.
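To illustrate why a separate lock class resolves the warning, here is a
toy model of class-based lock order checking. It is only a sketch and
not how lockdep is implemented; the class names and order_record() are
hypothetical. The checker remembers which class was taken while another
was held and reports a violation when the reverse order shows up.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lock classes for this sketch. */
enum {
	CLS_SUFILE_MI_SEM,	/* mi_sem of the SUFILE */
	CLS_BMAP_MDT,		/* b_sem class shared by metadata files */
	CLS_BMAP_SUFILE,	/* separate b_sem class for the SUFILE */
	NR_CLS
};

/* seen[a][b] != 0 means class b was taken while class a was held. */
static unsigned char seen[NR_CLS][NR_CLS];

static void order_reset(void)
{
	memset(seen, 0, sizeof(seen));
}

/*
 * Record that class 'next' is taken while 'held' is held.
 * Returns 1 if the order is consistent with everything seen so far,
 * 0 if the reverse order was recorded before.
 */
static int order_record(int held, int next)
{
	assert(held >= 0 && held < NR_CLS);
	assert(next >= 0 && next < NR_CLS);

	seen[held][next] = 1;
	return !seen[next][held];
}
```

If the SUFILE's b_sem shares CLS_BMAP_MDT with other metadata files,
recording mi_sem then b_sem, followed by b_sem (of another file) then
mi_sem, is flagged. With CLS_BMAP_SUFILE for the SUFILE's own b_sem,
the two orders involve different classes and no violation is reported.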
Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
---
fs/nilfs2/bmap.c | 8 +++++++-
fs/nilfs2/bmap.h | 2 +-
fs/nilfs2/btree.c | 3 ++-
fs/nilfs2/dat.c | 18 ++++++++++++++----
fs/nilfs2/dat.h | 4 ++--
fs/nilfs2/direct.c | 3 ++-
fs/nilfs2/mdt.c | 5 ++++-
7 files changed, 32 insertions(+), 11 deletions(-)
diff --git a/fs/nilfs2/bmap.c b/fs/nilfs2/bmap.c
index aadbd0b..ecd62ba 100644
--- a/fs/nilfs2/bmap.c
+++ b/fs/nilfs2/bmap.c
@@ -467,6 +467,7 @@ __u64 nilfs_bmap_find_target_in_group(const struct nilfs_bmap *bmap)
static struct lock_class_key nilfs_bmap_dat_lock_key;
static struct lock_class_key nilfs_bmap_mdt_lock_key;
+static struct lock_class_key nilfs_bmap_sufile_lock_key;
/**
* nilfs_bmap_read - read a bmap from an inode
@@ -498,12 +499,17 @@ int nilfs_bmap_read(struct nilfs_bmap *bmap, struct nilfs_inode *raw_inode)
lockdep_set_class(&bmap->b_sem, &nilfs_bmap_dat_lock_key);
break;
case NILFS_CPFILE_INO:
- case NILFS_SUFILE_INO:
bmap->b_ptr_type = NILFS_BMAP_PTR_VS;
bmap->b_last_allocated_key = 0;
bmap->b_last_allocated_ptr = NILFS_BMAP_INVALID_PTR;
lockdep_set_class(&bmap->b_sem, &nilfs_bmap_mdt_lock_key);
break;
+ case NILFS_SUFILE_INO:
+ bmap->b_ptr_type = NILFS_BMAP_PTR_VS;
+ bmap->b_last_allocated_key = 0;
+ bmap->b_last_allocated_ptr = NILFS_BMAP_INVALID_PTR;
+ lockdep_set_class(&bmap->b_sem, &nilfs_bmap_sufile_lock_key);
+ break;
case NILFS_IFILE_INO:
lockdep_set_class(&bmap->b_sem, &nilfs_bmap_mdt_lock_key);
/* Fall through */
diff --git a/fs/nilfs2/bmap.h b/fs/nilfs2/bmap.h
index b89e680..f09009c 100644
--- a/fs/nilfs2/bmap.h
+++ b/fs/nilfs2/bmap.h
@@ -223,7 +223,7 @@ static inline void nilfs_bmap_commit_end_ptr(struct nilfs_bmap *bmap,
{
if (dat)
nilfs_dat_commit_end(dat, &req->bpr_req,
- bmap->b_ptr_type == NILFS_BMAP_PTR_VS);
+ bmap->b_ptr_type == NILFS_BMAP_PTR_VS, 1);
}
static inline void nilfs_bmap_abort_end_ptr(struct nilfs_bmap *bmap,
diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c
index b2e3ff3..7365cb4 100644
--- a/fs/nilfs2/btree.c
+++ b/fs/nilfs2/btree.c
@@ -1851,7 +1851,8 @@ static void nilfs_btree_commit_update_v(struct nilfs_bmap *btree,
nilfs_dat_commit_update(dat, &path[level].bp_oldreq.bpr_req,
&path[level].bp_newreq.bpr_req,
- btree->b_ptr_type == NILFS_BMAP_PTR_VS);
+ btree->b_ptr_type == NILFS_BMAP_PTR_VS,
+ buffer_nilfs_node(path[level].bp_bh));
if (buffer_nilfs_node(path[level].bp_bh)) {
nilfs_btnode_commit_change_key(
diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
index e7b19c40..f465cbf 100644
--- a/fs/nilfs2/dat.c
+++ b/fs/nilfs2/dat.c
@@ -188,12 +188,13 @@ int nilfs_dat_prepare_end(struct inode *dat, struct nilfs_palloc_req *req)
}
void nilfs_dat_commit_end(struct inode *dat, struct nilfs_palloc_req *req,
- int dead)
+ int dead, int count_blocks)
{
struct nilfs_dat_entry *entry;
__u64 start, end;
sector_t blocknr;
void *kaddr;
+ struct the_nilfs *nilfs;
kaddr = kmap_atomic(req->pr_entry_bh->b_page);
entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr,
@@ -210,8 +211,16 @@ void nilfs_dat_commit_end(struct inode *dat, struct nilfs_palloc_req *req,
if (blocknr == 0)
nilfs_dat_commit_free(dat, req);
- else
+ else {
nilfs_dat_commit_entry(dat, req);
+
+ if (!dead && count_blocks) {
+ nilfs = dat->i_sb->s_fs_info;
+ nilfs_sufile_add_segment_usage(nilfs->ns_sufile,
+ nilfs_get_segnum_of_block(nilfs, blocknr), -1,
+ nilfs->ns_ctime);
+ }
+ }
}
void nilfs_dat_abort_end(struct inode *dat, struct nilfs_palloc_req *req)
@@ -250,9 +259,10 @@ int nilfs_dat_prepare_update(struct inode *dat,
void nilfs_dat_commit_update(struct inode *dat,
struct nilfs_palloc_req *oldreq,
- struct nilfs_palloc_req *newreq, int dead)
+ struct nilfs_palloc_req *newreq,
+ int dead, int count_blocks)
{
- nilfs_dat_commit_end(dat, oldreq, dead);
+ nilfs_dat_commit_end(dat, oldreq, dead, count_blocks);
nilfs_dat_commit_alloc(dat, newreq);
}
diff --git a/fs/nilfs2/dat.h b/fs/nilfs2/dat.h
index 51d44c0..3dc8de8 100644
--- a/fs/nilfs2/dat.h
+++ b/fs/nilfs2/dat.h
@@ -40,12 +40,12 @@ int nilfs_dat_prepare_start(struct inode *, struct nilfs_palloc_req *);
void nilfs_dat_commit_start(struct inode *, struct nilfs_palloc_req *,
sector_t);
int nilfs_dat_prepare_end(struct inode *, struct nilfs_palloc_req *);
-void nilfs_dat_commit_end(struct inode *, struct nilfs_palloc_req *, int);
+void nilfs_dat_commit_end(struct inode *, struct nilfs_palloc_req *, int, int);
void nilfs_dat_abort_end(struct inode *, struct nilfs_palloc_req *);
int nilfs_dat_prepare_update(struct inode *, struct nilfs_palloc_req *,
struct nilfs_palloc_req *);
void nilfs_dat_commit_update(struct inode *, struct nilfs_palloc_req *,
- struct nilfs_palloc_req *, int);
+ struct nilfs_palloc_req *, int, int);
void nilfs_dat_abort_update(struct inode *, struct nilfs_palloc_req *,
struct nilfs_palloc_req *);
diff --git a/fs/nilfs2/direct.c b/fs/nilfs2/direct.c
index 82f4865..c432484 100644
--- a/fs/nilfs2/direct.c
+++ b/fs/nilfs2/direct.c
@@ -272,7 +272,8 @@ static int nilfs_direct_propagate(struct nilfs_bmap *bmap,
if (ret < 0)
return ret;
nilfs_dat_commit_update(dat, &oldreq, &newreq,
- bmap->b_ptr_type == NILFS_BMAP_PTR_VS);
+ bmap->b_ptr_type == NILFS_BMAP_PTR_VS,
+ 0);
set_buffer_nilfs_volatile(bh);
nilfs_direct_set_ptr(bmap, key, newreq.pr_entry_nr);
} else
diff --git a/fs/nilfs2/mdt.c b/fs/nilfs2/mdt.c
index c4dcd1d..1aa3cc5 100644
--- a/fs/nilfs2/mdt.c
+++ b/fs/nilfs2/mdt.c
@@ -414,7 +414,7 @@ static const struct address_space_operations def_mdt_aops = {
static const struct inode_operations def_mdt_iops;
static const struct file_operations def_mdt_fops;
-
+static struct lock_class_key nilfs_mdt_mi_sufile_lock_key;
int nilfs_mdt_init(struct inode *inode, gfp_t gfp_mask, size_t objsz)
{
@@ -427,6 +427,9 @@ int nilfs_mdt_init(struct inode *inode, gfp_t gfp_mask, size_t objsz)
init_rwsem(&mi->mi_sem);
inode->i_private = mi;
+ if (inode->i_ino == NILFS_SUFILE_INO)
+ lockdep_set_class(&mi->mi_sem, &nilfs_mdt_mi_sufile_lock_key);
+
inode->i_mode = S_IFREG;
mapping_set_gfp_mask(inode->i_mapping, gfp_mask);
inode->i_mapping->backing_dev_info = inode->i_sb->s_bdi;
--
1.9.0
* [PATCH 1/4] nilfs-utils: remove reliance on sui_nblocks to read segment
[not found] ` <cover.1394966728.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
` (5 preceding siblings ...)
2014-03-16 10:47 ` [PATCH 6/6] nilfs2: add counting of live blocks for deleted files Andreas Rohner
@ 2014-03-16 10:49 ` Andreas Rohner
[not found] ` <36b7f57861b69c7fdb9d9e54a21df6f5c7f21061.1394966935.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-03-16 11:01 ` [PATCH 0/6] nilfs2: implement tracking of live blocks Andreas Rohner
7 siblings, 1 reply; 34+ messages in thread
From: Andreas Rohner @ 2014-03-16 10:49 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner
Since sui_nblocks is reused to represent the number of live blocks in a
segment, it can no longer be used to mark the end of the segment.
Instead, the sequence number of the partial segments is checked. All
partial segments of a segment should have the same sequence number. The
usual CRC checks should be enough to reliably determine the end of a
segment.
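The end-of-segment detection described above can be sketched as
follows. This is a simplified model, not the nilfs-utils code: the
struct pseg_sketch type, the placeholder magic value and
count_valid_psegs() are hypothetical, and the CRC check is left out.
The walk stops at the first partial segment whose magic number is wrong
or whose sequence number differs from that of the first one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary placeholder for this sketch, not the real magic value. */
#define SKETCH_SEGSUM_MAGIC 0x1234u

/* Hypothetical, reduced view of a partial segment summary. */
struct pseg_sketch {
	uint32_t magic;
	uint64_t seq;
};

/*
 * Count the leading partial segments that belong to the same log
 * sequence as the first one. The first mismatch marks the end of the
 * valid part of the segment.
 */
static size_t count_valid_psegs(const struct pseg_sketch *psegs, size_t n)
{
	size_t i;
	uint64_t first_seq;

	if (n == 0)
		return 0;

	first_seq = psegs[0].seq;
	for (i = 0; i < n; i++) {
		if (psegs[i].magic != SKETCH_SEGSUM_MAGIC ||
		    psegs[i].seq != first_seq)
			break;
	}
	return i;
}
```

A stale partial segment left over from an earlier use of the segment
carries an older sequence number, so it terminates the walk even though
its magic number is intact.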
Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
---
include/nilfs.h | 1 +
lib/gc.c | 5 ++---
lib/nilfs.c | 4 +++-
3 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/include/nilfs.h b/include/nilfs.h
index fab8ff2..05cfe3b 100644
--- a/include/nilfs.h
+++ b/include/nilfs.h
@@ -185,6 +185,7 @@ struct nilfs_psegment {
size_t p_maxblocks;
size_t p_blksize;
__u32 p_seed;
+ __u64 p_seq;
};
/**
diff --git a/lib/gc.c b/lib/gc.c
index 453acf2..a165a5c 100644
--- a/lib/gc.c
+++ b/lib/gc.c
@@ -273,9 +273,8 @@ static ssize_t nilfs_acc_blocks(struct nilfs *nilfs,
return -1;
continue;
}
- ret = nilfs_acc_blocks_segment(
- nilfs, segnums[i], segment, si.sui_nblocks,
- vdescv, bdescv);
+ ret = nilfs_acc_blocks_segment(nilfs, segnums[i], segment,
+ nilfs_get_blocks_per_segment(nilfs), vdescv, bdescv);
if (nilfs_put_segment(nilfs, segment) < 0 || ret < 0)
return -1;
i++;
diff --git a/lib/nilfs.c b/lib/nilfs.c
index 65bf7d5..e8f5c96 100644
--- a/lib/nilfs.c
+++ b/lib/nilfs.c
@@ -900,7 +900,8 @@ static int nilfs_psegment_is_valid(const struct nilfs_psegment *pseg)
{
int offset;
- if (le32_to_cpu(pseg->p_segsum->ss_magic) != NILFS_SEGSUM_MAGIC)
+ if (le32_to_cpu(pseg->p_segsum->ss_magic) != NILFS_SEGSUM_MAGIC ||
+ le64_to_cpu(pseg->p_segsum->ss_seq) != pseg->p_seq)
return 0;
offset = sizeof(pseg->p_segsum->ss_datasum) +
@@ -928,6 +929,7 @@ void nilfs_psegment_init(struct nilfs_psegment *pseg, __u64 segnum,
pseg->p_seed = le32_to_cpu(nilfs->n_sb->s_crc_seed);
pseg->p_segsum = seg + blkoff * pseg->p_blksize;
+ pseg->p_seq = le64_to_cpu(pseg->p_segsum->ss_seq);
pseg->p_blocknr = pseg->p_segblocknr;
}
--
1.9.0
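
A minimal sketch of the end-of-segment check described in this patch,
assuming a simplified in-memory view of the segment summaries. The type
struct sketch_pseg, the function sketch_count_valid_psegs() and the
constant SKETCH_SEGSUM_MAGIC are hypothetical stand-ins and are not part
of nilfs-utils; the CRC checks are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder for NILFS_SEGSUM_MAGIC; the exact value is irrelevant here. */
#define SKETCH_SEGSUM_MAGIC 0x1eaffa11u

/* Hypothetical, reduced view of one partial segment summary. */
struct sketch_pseg {
	uint32_t magic;	/* models ss_magic */
	uint64_t seq;	/* models ss_seq */
};

/*
 * Return the number of leading partial segments that belong to the
 * segment. The sequence number of the first summary is remembered
 * (like p_seq in nilfs_psegment_init()) and the scan stops at the
 * first summary whose magic or sequence number does not match.
 */
static size_t sketch_count_valid_psegs(const struct sketch_pseg *psegs,
				       size_t n)
{
	uint64_t seq;
	size_t i;

	if (n == 0)
		return 0;

	seq = psegs[0].seq;
	for (i = 0; i < n; i++) {
		if (psegs[i].magic != SKETCH_SEGSUM_MAGIC ||
		    psegs[i].seq != seq)
			break;
	}
	return i;
}
```

With this scheme, stale summaries left over from an earlier use of the
same segment are rejected because their sequence number differs, so no
block count from sui_nblocks is needed to find the end.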
* [PATCH 2/4] nilfs-utils: add cost-benefit and greedy policies
[not found] ` <36b7f57861b69c7fdb9d9e54a21df6f5c7f21061.1394966935.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-03-16 10:49 ` Andreas Rohner
[not found] ` <cc43be2e6bba5367fd2982dc0df5255b884bdace.1394966935.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-03-16 10:49 ` [PATCH 3/4] nilfs-utils: add support for nilfs_clean_snapshot_flags() Andreas Rohner
2014-03-16 10:49 ` [PATCH 4/4] nilfs-utils: add extra flags to nilfs_vdesc and update sui_nblocks Andreas Rohner
2 siblings, 1 reply; 34+ messages in thread
From: Andreas Rohner @ 2014-03-16 10:49 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner
This patch implements the cost-benefit and greedy GC policies. These are
well known policies for log-structured file systems [1].
* Greedy:
Select the segments with the most free space.
* Cost-Benefit:
Perform a cost-benefit analysis, whereby the free space gained is
weighed against the cost of collecting the segment.
Since cost-benefit in particular needs more information than is available
in nilfs_suinfo, a few extra parameters were added to the policy
callback function prototype. The policy threshold was removed, since it
served no real purpose. The flag p_comparison was added to indicate how
the importance values should be interpreted. For the timestamp policy,
for example, smaller values mean older timestamps, which is better.
For greedy and cost-benefit, on the other hand, higher values are better.
nilfs_cleanerd_select_segments() was updated accordingly.
[1] Mendel Rosenblum and John K. Ousterhout. The design and
implementation of a log-structured file system. ACM Trans. Comput.
Syst., 10(1):26–52, February 1992.
Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
---
include/nilfs2_fs.h | 9 ++++-
sbin/cleanerd/cldconfig.c | 100 +++++++++++++++++++++++++++++++++++++++++++---
sbin/cleanerd/cldconfig.h | 18 +++++----
sbin/cleanerd/cleanerd.c | 56 ++++++++++++++++----------
4 files changed, 149 insertions(+), 34 deletions(-)
diff --git a/include/nilfs2_fs.h b/include/nilfs2_fs.h
index a16ad4c..967c2af 100644
--- a/include/nilfs2_fs.h
+++ b/include/nilfs2_fs.h
@@ -483,7 +483,7 @@ struct nilfs_dat_entry {
__le64 de_blocknr;
__le64 de_start;
__le64 de_end;
- __le64 de_rsv;
+ __le64 de_ss;
};
/**
@@ -612,11 +612,13 @@ struct nilfs_cpfile_header {
* @su_lastmod: last modified timestamp
* @su_nblocks: number of blocks in segment
* @su_flags: flags
+ * @su_lastdec: last decrement of su_nblocks timestamp
*/
struct nilfs_segment_usage {
__le64 su_lastmod;
__le32 su_nblocks;
__le32 su_flags;
+ __le64 su_lastdec;
};
/* segment usage flag */
@@ -659,6 +661,7 @@ nilfs_segment_usage_set_clean(struct nilfs_segment_usage *su)
su->su_lastmod = cpu_to_le64(0);
su->su_nblocks = cpu_to_le32(0);
su->su_flags = cpu_to_le32(0);
+ su->su_lastdec = cpu_to_le64(0);
}
static inline int
@@ -690,11 +693,13 @@ struct nilfs_sufile_header {
* @sui_lastmod: timestamp of last modification
* @sui_nblocks: number of written blocks in segment
* @sui_flags: segment usage flags
+ * @sui_lastdec: last decrement of sui_nblocks timestamp
*/
struct nilfs_suinfo {
__u64 sui_lastmod;
__u32 sui_nblocks;
__u32 sui_flags;
+ __u64 sui_lastdec;
};
#define NILFS_SUINFO_FNS(flag, name) \
@@ -732,6 +737,7 @@ enum {
NILFS_SUINFO_UPDATE_LASTMOD,
NILFS_SUINFO_UPDATE_NBLOCKS,
NILFS_SUINFO_UPDATE_FLAGS,
+ NILFS_SUINFO_UPDATE_LASTDEC,
__NR_NILFS_SUINFO_UPDATE_FIELDS,
};
@@ -755,6 +761,7 @@ nilfs_suinfo_update_##name(const struct nilfs_suinfo_update *sup) \
NILFS_SUINFO_UPDATE_FNS(LASTMOD, lastmod)
NILFS_SUINFO_UPDATE_FNS(NBLOCKS, nblocks)
NILFS_SUINFO_UPDATE_FNS(FLAGS, flags)
+NILFS_SUINFO_UPDATE_FNS(LASTDEC, lastdec)
enum {
NILFS_CHECKPOINT,
diff --git a/sbin/cleanerd/cldconfig.c b/sbin/cleanerd/cldconfig.c
index c8b197b..ade974a 100644
--- a/sbin/cleanerd/cldconfig.c
+++ b/sbin/cleanerd/cldconfig.c
@@ -380,7 +380,10 @@ nilfs_cldconfig_handle_clean_check_interval(struct nilfs_cldconfig *config,
}
static unsigned long long
-nilfs_cldconfig_selection_policy_timestamp(const struct nilfs_suinfo *si)
+nilfs_cldconfig_selection_policy_timestamp(struct nilfs *nilfs,
+ const struct nilfs_sustat *sustat,
+ const struct nilfs_suinfo *si,
+ __u64 prottime)
{
return si->sui_lastmod;
}
@@ -391,14 +394,101 @@ nilfs_cldconfig_handle_selection_policy_timestamp(struct nilfs_cldconfig *config
{
config->cf_selection_policy.p_importance =
NILFS_CLDCONFIG_SELECTION_POLICY_IMPORTANCE;
- config->cf_selection_policy.p_threshold =
- NILFS_CLDCONFIG_SELECTION_POLICY_THRESHOLD;
+ config->cf_selection_policy.p_comparison =
+ NILFS_CLDCONFIG_SELECTION_POLICY_SMALLER_IS_BETTER;
+ return 0;
+}
+
+static unsigned long long
+nilfs_cldconfig_selection_policy_greedy(struct nilfs *nilfs,
+ const struct nilfs_sustat *sustat,
+ const struct nilfs_suinfo *si,
+ __u64 prottime)
+{
+ __u32 value, max_blocks = nilfs_get_blocks_per_segment(nilfs);
+
+ if (max_blocks < si->sui_nblocks)
+ return 0;
+
+ value = max_blocks - si->sui_nblocks;
+
+ /*
+ * the value of sui_nblocks is probably not accurate
+ * because blocks inside the protection period are not
+ * considered to be dead
+ */
+ if (si->sui_lastdec >= prottime)
+ value >>= 4;
+
+ return value;
+}
+
+static int
+nilfs_cldconfig_handle_selection_policy_greedy(struct nilfs_cldconfig *config,
+ char **tokens, size_t ntoks)
+{
+ config->cf_selection_policy.p_importance =
+ nilfs_cldconfig_selection_policy_greedy;
+ config->cf_selection_policy.p_comparison =
+ NILFS_CLDCONFIG_SELECTION_POLICY_BIGGER_IS_BETTER;
+ return 0;
+}
+
+static unsigned long long
+nilfs_cldconfig_selection_policy_cost_benefit(struct nilfs *nilfs,
+ const struct nilfs_sustat *sustat,
+ const struct nilfs_suinfo *si,
+ __u64 prottime)
+{
+ __u32 free_blocks, cleaning_cost;
+ unsigned long long value, age;
+
+ free_blocks = nilfs_get_blocks_per_segment(nilfs) - si->sui_nblocks;
+ /* read the whole segment + write the live blocks */
+ cleaning_cost = 2 * si->sui_nblocks;
+ /*
+ * multiply by 1000 to convert age to milliseconds
+ * (higher precision for division)
+ */
+ age = (sustat->ss_nongc_ctime - si->sui_lastmod) * 1000;
+
+ if (sustat->ss_nongc_ctime < si->sui_lastmod)
+ return 0;
+
+ if (cleaning_cost == 0)
+ cleaning_cost = 1;
+
+
+ value = (age * free_blocks) / cleaning_cost;
+
+ /*
+ * the value of sui_nblocks is probably not accurate
+ * because blocks inside the protection period are not
+ * considered to be dead
+ */
+ if (si->sui_lastdec >= prottime)
+ value >>= 4;
+
+ return value;
+}
+
+static int
+nilfs_cldconfig_handle_selection_policy_cost_benefit(
+ struct nilfs_cldconfig *config,
+ char **tokens, size_t ntoks)
+{
+ config->cf_selection_policy.p_importance =
+ nilfs_cldconfig_selection_policy_cost_benefit;
+ config->cf_selection_policy.p_comparison =
+ NILFS_CLDCONFIG_SELECTION_POLICY_BIGGER_IS_BETTER;
return 0;
}
static const struct nilfs_cldconfig_polhandle
nilfs_cldconfig_polhandle_table[] = {
{"timestamp", nilfs_cldconfig_handle_selection_policy_timestamp},
+ {"greedy", nilfs_cldconfig_handle_selection_policy_greedy},
+ {"cost-benefit", nilfs_cldconfig_handle_selection_policy_cost_benefit},
};
#define NILFS_CLDCONFIG_NPOLHANDLES \
@@ -688,8 +778,8 @@ static void nilfs_cldconfig_set_default(struct nilfs_cldconfig *config,
config->cf_selection_policy.p_importance =
NILFS_CLDCONFIG_SELECTION_POLICY_IMPORTANCE;
- config->cf_selection_policy.p_threshold =
- NILFS_CLDCONFIG_SELECTION_POLICY_THRESHOLD;
+ config->cf_selection_policy.p_comparison =
+ NILFS_CLDCONFIG_SELECTION_POLICY_SMALLER_IS_BETTER;
config->cf_protection_period.tv_sec = NILFS_CLDCONFIG_PROTECTION_PERIOD;
config->cf_protection_period.tv_usec = 0;
diff --git a/sbin/cleanerd/cldconfig.h b/sbin/cleanerd/cldconfig.h
index 0a598d5..95d2fde 100644
--- a/sbin/cleanerd/cldconfig.h
+++ b/sbin/cleanerd/cldconfig.h
@@ -30,16 +30,21 @@
#include <sys/time.h>
#include <syslog.h>
+struct nilfs;
+struct nilfs_sustat;
struct nilfs_suinfo;
/**
* struct nilfs_selection_policy -
- * @p_importance:
- * @p_threshold:
+ * @p_importance: function to calculate the importance for the policy
+ * @p_comparison: flag that indicates how to sort the importance
*/
struct nilfs_selection_policy {
- unsigned long long (*p_importance)(const struct nilfs_suinfo *);
- unsigned long long p_threshold;
+ unsigned long long (*p_importance)(struct nilfs *nilfs,
+ const struct nilfs_sustat *sustat,
+ const struct nilfs_suinfo *,
+ __u64 prottime);
+ int p_comparison;
};
/**
@@ -113,7 +118,8 @@ struct nilfs_cldconfig {
#define NILFS_CLDCONFIG_SELECTION_POLICY_IMPORTANCE \
nilfs_cldconfig_selection_policy_timestamp
-#define NILFS_CLDCONFIG_SELECTION_POLICY_THRESHOLD 0
+#define NILFS_CLDCONFIG_SELECTION_POLICY_BIGGER_IS_BETTER 0
+#define NILFS_CLDCONFIG_SELECTION_POLICY_SMALLER_IS_BETTER 1
#define NILFS_CLDCONFIG_PROTECTION_PERIOD 3600
#define NILFS_CLDCONFIG_MIN_CLEAN_SEGMENTS 10
#define NILFS_CLDCONFIG_MIN_CLEAN_SEGMENTS_UNIT NILFS_SIZE_UNIT_PERCENT
@@ -135,8 +141,6 @@ struct nilfs_cldconfig {
#define NILFS_CLDCONFIG_NSEGMENTS_PER_CLEAN_MAX 32
-struct nilfs;
-
int nilfs_cldconfig_read(struct nilfs_cldconfig *config, const char *path,
struct nilfs *nilfs);
diff --git a/sbin/cleanerd/cleanerd.c b/sbin/cleanerd/cleanerd.c
index 17de87b..8df3a07 100644
--- a/sbin/cleanerd/cleanerd.c
+++ b/sbin/cleanerd/cleanerd.c
@@ -417,7 +417,7 @@ static void nilfs_cleanerd_destroy(struct nilfs_cleanerd *cleanerd)
free(cleanerd);
}
-static int nilfs_comp_segimp(const void *elem1, const void *elem2)
+static int nilfs_comp_segimp_asc(const void *elem1, const void *elem2)
{
const struct nilfs_segimp *segimp1 = elem1, *segimp2 = elem2;
@@ -429,6 +429,18 @@ static int nilfs_comp_segimp(const void *elem1, const void *elem2)
return (segimp1->si_segnum < segimp2->si_segnum) ? -1 : 1;
}
+static int nilfs_comp_segimp_desc(const void *elem1, const void *elem2)
+{
+ const struct nilfs_segimp *segimp1 = elem1, *segimp2 = elem2;
+
+ if (segimp1->si_importance > segimp2->si_importance)
+ return -1;
+ else if (segimp1->si_importance < segimp2->si_importance)
+ return 1;
+
+ return (segimp1->si_segnum < segimp2->si_segnum) ? -1 : 1;
+}
+
static int nilfs_cleanerd_automatic_suspend(struct nilfs_cleanerd *cleanerd)
{
return cleanerd->config.cf_min_clean_segments > 0;
@@ -579,7 +591,7 @@ nilfs_cleanerd_select_segments(struct nilfs_cleanerd *cleanerd,
__u64 segnum;
size_t count, nsegs;
ssize_t nssegs, n;
- unsigned long long imp, thr;
+ unsigned long long imp;
int i;
nsegs = nilfs_cleanerd_ncleansegs(cleanerd);
@@ -600,11 +612,8 @@ nilfs_cleanerd_select_segments(struct nilfs_cleanerd *cleanerd,
prottime = tv2.tv_sec;
oldest = tv.tv_sec;
- /* The segments that have larger importance than thr are not
- * selected. */
- thr = (config->cf_selection_policy.p_threshold != 0) ?
- config->cf_selection_policy.p_threshold :
- sustat->ss_nongc_ctime;
+ /* sui_lastdec may not be set by nilfs_get_suinfo */
+ memset(si, 0, sizeof(si));
for (segnum = 0; segnum < sustat->ss_nsegs; segnum += n) {
count = min_t(__u64, sustat->ss_nsegs - segnum,
@@ -615,22 +624,23 @@ nilfs_cleanerd_select_segments(struct nilfs_cleanerd *cleanerd,
goto out;
}
for (i = 0; i < n; i++) {
- if (!nilfs_suinfo_reclaimable(&si[i]))
+ if (!nilfs_suinfo_reclaimable(&si[i]) ||
+ si[i].sui_lastmod >= sustat->ss_nongc_ctime)
continue;
- imp = config->cf_selection_policy.p_importance(&si[i]);
- if (imp < thr) {
- if (si[i].sui_lastmod < oldest)
- oldest = si[i].sui_lastmod;
- if (si[i].sui_lastmod < prottime) {
- sm = nilfs_vector_get_new_element(smv);
- if (sm == NULL) {
- nssegs = -1;
- goto out;
- }
- sm->si_segnum = segnum + i;
- sm->si_importance = imp;
+ imp = config->cf_selection_policy.p_importance(nilfs,
+ sustat, &si[i], prottime);
+
+ if (si[i].sui_lastmod < oldest)
+ oldest = si[i].sui_lastmod;
+ if (si[i].sui_lastmod < prottime) {
+ sm = nilfs_vector_get_new_element(smv);
+ if (sm == NULL) {
+ nssegs = -1;
+ goto out;
}
+ sm->si_segnum = segnum + i;
+ sm->si_importance = imp;
}
}
if (n == 0) {
@@ -642,7 +652,11 @@ nilfs_cleanerd_select_segments(struct nilfs_cleanerd *cleanerd,
break;
}
}
- nilfs_vector_sort(smv, nilfs_comp_segimp);
+ if (config->cf_selection_policy.p_comparison ==
+ NILFS_CLDCONFIG_SELECTION_POLICY_SMALLER_IS_BETTER)
+ nilfs_vector_sort(smv, nilfs_comp_segimp_asc);
+ else
+ nilfs_vector_sort(smv, nilfs_comp_segimp_desc);
nssegs = (nilfs_vector_get_size(smv) < nsegs) ?
nilfs_vector_get_size(smv) : nsegs;
--
1.9.0
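
A minimal sketch of how the two importance values in this patch are
computed, written against plain integers instead of struct nilfs_suinfo
and struct nilfs_sustat. The function names sketch_greedy() and
sketch_cost_benefit() and their parameter lists are hypothetical. As an
assumption of this sketch, the range checks run before the subtractions.

```c
#include <stdint.h>

/*
 * Greedy: importance is the number of free (non-live) blocks.
 * If the live counter was decremented inside the protection period,
 * the value is scaled down by 16, because those blocks cannot be
 * reclaimed yet.
 */
static uint64_t sketch_greedy(uint32_t blocks_per_segment,
			      uint32_t live_blocks,
			      uint64_t lastdec, uint64_t prottime)
{
	uint64_t value;

	if (live_blocks > blocks_per_segment)
		return 0;

	value = blocks_per_segment - live_blocks;
	if (lastdec >= prottime)
		value >>= 4;
	return value;
}

/*
 * Cost-benefit: (age * free blocks) / cleaning cost, where the cost is
 * reading the segment plus writing back the live blocks, modelled as
 * 2 * live_blocks as in the patch. Timestamps are in seconds; the age
 * is scaled to milliseconds for more precision in the division.
 * No overflow handling is attempted for extreme inputs.
 */
static uint64_t sketch_cost_benefit(uint32_t blocks_per_segment,
				    uint32_t live_blocks,
				    uint64_t now, uint64_t lastmod,
				    uint64_t lastdec, uint64_t prottime)
{
	uint64_t age_ms, free_blocks, cost, value;

	if (now < lastmod || live_blocks > blocks_per_segment)
		return 0;

	age_ms = (now - lastmod) * 1000;
	free_blocks = blocks_per_segment - live_blocks;
	cost = 2 * (uint64_t)live_blocks;
	if (cost == 0)
		cost = 1;

	value = (age_ms * free_blocks) / cost;
	if (lastdec >= prottime)
		value >>= 4;
	return value;
}
```

For both functions a larger result marks a better cleaning candidate,
which is why the patch sorts them in descending order, while the
timestamp policy keeps the ascending order.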
* [PATCH 3/4] nilfs-utils: add support for nilfs_clean_snapshot_flags()
[not found] ` <36b7f57861b69c7fdb9d9e54a21df6f5c7f21061.1394966935.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-03-16 10:49 ` [PATCH 2/4] nilfs-utils: add cost-benefit and greedy policies Andreas Rohner
@ 2014-03-16 10:49 ` Andreas Rohner
2014-03-16 10:49 ` [PATCH 4/4] nilfs-utils: add extra flags to nilfs_vdesc and update sui_nblocks Andreas Rohner
2 siblings, 0 replies; 34+ messages in thread
From: Andreas Rohner @ 2014-03-16 10:49 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner
This ioctl enables the userspace GC to perform a cleanup operation after
setting the number of blocks with NILFS_IOCTL_SET_SUINFO. It sets DAT
entries with de_ss values of NILFS_CNO_MAX to 0. NILFS_CNO_MAX
indicates that the corresponding block belongs to some snapshot, but
was already decremented by a previous deletion operation. If the segment
usage info is changed with NILFS_IOCTL_SET_SUINFO and the number of
blocks is updated, then these blocks would never be decremented again,
and there are scenarios where the corresponding segments would starve
(never be cleaned). To prevent that, the value of de_ss must be set to 0,
so that it can be decremented again, should the snapshot be deleted in
the future.
Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
---
include/nilfs.h | 2 ++
include/nilfs2_fs.h | 2 ++
lib/gc.c | 6 +++++-
lib/nilfs.c | 23 +++++++++++++++++++++++
4 files changed, 32 insertions(+), 1 deletion(-)
diff --git a/include/nilfs.h b/include/nilfs.h
index 05cfe3b..bd134be 100644
--- a/include/nilfs.h
+++ b/include/nilfs.h
@@ -313,6 +313,8 @@ ssize_t nilfs_get_bdescs(const struct nilfs *, struct nilfs_bdesc *, size_t);
int nilfs_clean_segments(struct nilfs *, struct nilfs_vdesc *, size_t,
struct nilfs_period *, size_t, __u64 *, size_t,
struct nilfs_bdesc *, size_t, __u64 *, size_t);
+int nilfs_clean_snapshot_flags(struct nilfs *nilfs,
+ struct nilfs_vdesc *vdescs, size_t nvdescs);
int nilfs_sync(const struct nilfs *, nilfs_cno_t *);
int nilfs_resize(struct nilfs *nilfs, off_t size);
int nilfs_set_alloc_range(struct nilfs *nilfs, off_t start, off_t end);
diff --git a/include/nilfs2_fs.h b/include/nilfs2_fs.h
index 967c2af..cb02739 100644
--- a/include/nilfs2_fs.h
+++ b/include/nilfs2_fs.h
@@ -918,5 +918,7 @@ struct nilfs_bdesc {
_IOW(NILFS_IOCTL_IDENT, 0x8C, __u64[2])
#define NILFS_IOCTL_SET_SUINFO \
_IOW(NILFS_IOCTL_IDENT, 0x8D, struct nilfs_argv)
+#define NILFS_IOCTL_CLEAN_SNAPSHOT_FLAGS \
+ _IOW(NILFS_IOCTL_IDENT, 0x8F, struct nilfs_argv)
#endif /* _LINUX_NILFS_FS_H */
diff --git a/lib/gc.c b/lib/gc.c
index a165a5c..2338174 100644
--- a/lib/gc.c
+++ b/lib/gc.c
@@ -762,8 +762,12 @@ int nilfs_xreclaim_segment(struct nilfs *nilfs,
ret = nilfs_set_suinfo(nilfs, nilfs_vector_get_data(supv), n);
- if (ret == 0)
+ if (ret == 0) {
+ ret = nilfs_clean_snapshot_flags(nilfs,
+ nilfs_vector_get_data(vdescv),
+ nilfs_vector_get_size(vdescv));
goto out_lock;
+ }
if (ret < 0 && errno != ENOTTY) {
nilfs_gc_logger(LOG_ERR, "cannot set suinfo: %s",
diff --git a/lib/nilfs.c b/lib/nilfs.c
index e8f5c96..b909a23 100644
--- a/lib/nilfs.c
+++ b/lib/nilfs.c
@@ -743,6 +743,29 @@ int nilfs_clean_segments(struct nilfs *nilfs,
}
/**
+ * nilfs_clean_snapshot_flags - cleanup snapshot flags after set_suinfo
+ * @nilfs: nilfs object
+ * @vdescs: array of nilfs_vdesc structs to specify live blocks
+ * @nvdescs: size of @vdescs array (number of items)
+ */
+int nilfs_clean_snapshot_flags(struct nilfs *nilfs,
+ struct nilfs_vdesc *vdescs, size_t nvdescs)
+{
+ struct nilfs_argv argv;
+
+ if (nilfs->n_iocfd < 0) {
+ errno = EBADF;
+ return -1;
+ }
+
+ memset(&argv, 0, sizeof(struct nilfs_argv));
+ argv.v_base = (unsigned long)vdescs;
+ argv.v_nmembs = nvdescs;
+ argv.v_size = sizeof(struct nilfs_vdesc);
+ return ioctl(nilfs->n_iocfd, NILFS_IOCTL_CLEAN_SNAPSHOT_FLAGS, &argv);
+}
+
+/**
* nilfs_sync - sync a NILFS file system
* @nilfs: nilfs object
* @cnop: buffer to store the latest checkpoint number in
--
1.9.0
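
The kernel side of this ioctl is not part of this message. The following
is only a model of the effect described in the commit message, assuming
a flat array of simplified DAT entries. struct sketch_dat_entry,
SKETCH_CNO_MAX and sketch_clean_snapshot_flags() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for NILFS_CNO_MAX (all bits set). */
#define SKETCH_CNO_MAX UINT64_MAX

/* Hypothetical, CPU-endian copy of the relevant DAT entry fields. */
struct sketch_dat_entry {
	uint64_t de_start;
	uint64_t de_end;
	uint64_t de_ss;
};

/*
 * Reset every entry marked with SKETCH_CNO_MAX ("belongs to a snapshot,
 * already decremented") back to 0, so that a later snapshot deletion
 * can decrement the segment's live block counter again.
 * Returns the number of entries that were reset.
 */
static size_t sketch_clean_snapshot_flags(struct sketch_dat_entry *entries,
					  size_t n)
{
	size_t i, cleaned = 0;

	for (i = 0; i < n; i++) {
		if (entries[i].de_ss == SKETCH_CNO_MAX) {
			entries[i].de_ss = 0;
			cleaned++;
		}
	}
	return cleaned;
}
```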
* [PATCH 4/4] nilfs-utils: add extra flags to nilfs_vdesc and update sui_nblocks
[not found] ` <36b7f57861b69c7fdb9d9e54a21df6f5c7f21061.1394966935.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-03-16 10:49 ` [PATCH 2/4] nilfs-utils: add cost-benefit and greedy policies Andreas Rohner
2014-03-16 10:49 ` [PATCH 3/4] nilfs-utils: add support for nilfs_clean_snapshot_flags() Andreas Rohner
@ 2014-03-16 10:49 ` Andreas Rohner
2 siblings, 0 replies; 34+ messages in thread
From: Andreas Rohner @ 2014-03-16 10:49 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner
This patch adds extra flags to nilfs_vdesc that indicate the reason why
a particular block is considered alive. If it is alive because of a
snapshot, the snapshot flag is set; if it is alive because of the
protection period, the protection period flag is set.
This information is useful to determine the number of live blocks in a
segment. A block that is part of a snapshot is counted as alive; a block
that is alive only because of the protection period is counted as
reclaimable. These flags are used both in userspace and by the kernel.
Additionally this patch adds code that calculates the correct number of
live blocks per segment if nilfs_set_suinfo() is used.
Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
---
include/nilfs.h | 6 +++++
include/nilfs2_fs.h | 52 +++++++++++++++++++++++++++++++++++++--
lib/gc.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++------
3 files changed, 119 insertions(+), 10 deletions(-)
diff --git a/include/nilfs.h b/include/nilfs.h
index bd134be..3585f6b 100644
--- a/include/nilfs.h
+++ b/include/nilfs.h
@@ -329,4 +329,10 @@ static inline __u32 nilfs_get_blocks_per_segment(const struct nilfs *nilfs)
return le32_to_cpu(nilfs->n_sb->s_blocks_per_segment);
}
+static inline __u64
+nilfs_get_segnum_of_block(const struct nilfs *nilfs, sector_t blocknr)
+{
+ return blocknr / nilfs_get_blocks_per_segment(nilfs);
+}
+
#endif /* NILFS_H */
diff --git a/include/nilfs2_fs.h b/include/nilfs2_fs.h
index cb02739..7a060b3 100644
--- a/include/nilfs2_fs.h
+++ b/include/nilfs2_fs.h
@@ -859,7 +859,7 @@ struct nilfs_vinfo {
* @vd_blocknr: disk block number
* @vd_offset: logical block offset inside a file
* @vd_flags: flags (data or node block)
- * @vd_pad: padding
+ * @vd_flags2: additional flags
*/
struct nilfs_vdesc {
__u64 vd_ino;
@@ -869,9 +869,57 @@ struct nilfs_vdesc {
__u64 vd_blocknr;
__u64 vd_offset;
__u32 vd_flags;
- __u32 vd_pad;
+ /* vd_flags2 needed because of backwards compatibility */
+ __u32 vd_flags2;
};
+/* vdesc flags */
+enum {
+ NILFS_VDESC_DATA,
+ NILFS_VDESC_NODE,
+ /* ... */
+};
+enum {
+ NILFS_VDESC_SNAPSHOT,
+ NILFS_VDESC_PROTECTION_PERIOD,
+ __NR_NILFS_VDESC_FIELDS,
+ /* ... */
+};
+
+#define NILFS_VDESC_FNS(flag, name) \
+static inline void \
+nilfs_vdesc_set_##name(struct nilfs_vdesc *vdesc) \
+{ \
+ vdesc->vd_flags = NILFS_VDESC_##flag; \
+} \
+static inline int \
+nilfs_vdesc_##name(const struct nilfs_vdesc *vdesc) \
+{ \
+ return vdesc->vd_flags == NILFS_VDESC_##flag; \
+}
+
+#define NILFS_VDESC_FNS2(flag, name) \
+static inline void \
+nilfs_vdesc_set_##name(struct nilfs_vdesc *vdesc) \
+{ \
+ vdesc->vd_flags2 |= (1UL << NILFS_VDESC_##flag); \
+} \
+static inline void \
+nilfs_vdesc_clear_##name(struct nilfs_vdesc *vdesc) \
+{ \
+ vdesc->vd_flags2 &= ~(1UL << NILFS_VDESC_##flag); \
+} \
+static inline int \
+nilfs_vdesc_##name(const struct nilfs_vdesc *vdesc) \
+{ \
+ return !!(vdesc->vd_flags2 & (1UL << NILFS_VDESC_##flag)); \
+}
+
+NILFS_VDESC_FNS(DATA, data)
+NILFS_VDESC_FNS(NODE, node)
+NILFS_VDESC_FNS2(SNAPSHOT, snapshot)
+NILFS_VDESC_FNS2(PROTECTION_PERIOD, protection_period)
+
/**
* struct nilfs_bdesc - descriptor of disk block number
* @bd_ino: inode number
diff --git a/lib/gc.c b/lib/gc.c
index 2338174..2df15f7 100644
--- a/lib/gc.c
+++ b/lib/gc.c
@@ -128,6 +128,7 @@ static int nilfs_acc_blocks_file(struct nilfs_file *file,
return -1;
bdesc->bd_ino = ino;
bdesc->bd_oblocknr = blk.b_blocknr;
+ bdesc->bd_pad = 0;
if (nilfs_block_is_data(&blk)) {
bdesc->bd_offset =
le64_to_cpu(*(__le64 *)blk.b_binfo);
@@ -148,17 +149,19 @@ static int nilfs_acc_blocks_file(struct nilfs_file *file,
vdesc->vd_ino = ino;
vdesc->vd_cno = cno;
vdesc->vd_blocknr = blk.b_blocknr;
+ vdesc->vd_flags = 0;
+ vdesc->vd_flags2 = 0;
if (nilfs_block_is_data(&blk)) {
binfo = blk.b_binfo;
vdesc->vd_vblocknr =
le64_to_cpu(binfo->bi_v.bi_vblocknr);
vdesc->vd_offset =
le64_to_cpu(binfo->bi_v.bi_blkoff);
- vdesc->vd_flags = 0; /* data */
+ nilfs_vdesc_set_data(vdesc);
} else {
vdesc->vd_vblocknr =
le64_to_cpu(*(__le64 *)blk.b_binfo);
- vdesc->vd_flags = 1; /* node */
+ nilfs_vdesc_set_node(vdesc);
}
}
}
@@ -391,7 +394,7 @@ static ssize_t nilfs_get_snapshot(struct nilfs *nilfs, nilfs_cno_t **ssp)
* @n: size of @ss array
* @last_hit: the last snapshot number hit
*/
-static int nilfs_vdesc_is_live(const struct nilfs_vdesc *vdesc,
+static int nilfs_vdesc_is_live(struct nilfs_vdesc *vdesc,
nilfs_cno_t protect, const nilfs_cno_t *ss,
size_t n, nilfs_cno_t *last_hit)
{
@@ -407,18 +410,22 @@ static int nilfs_vdesc_is_live(const struct nilfs_vdesc *vdesc,
return vdesc->vd_period.p_end == NILFS_CNO_MAX;
}
- if (vdesc->vd_period.p_end == NILFS_CNO_MAX ||
- vdesc->vd_period.p_end > protect)
+ if (vdesc->vd_period.p_end == NILFS_CNO_MAX)
return 1;
+ if (vdesc->vd_period.p_end > protect)
+ nilfs_vdesc_set_protection_period(vdesc);
+
if (n == 0 || vdesc->vd_period.p_start > ss[n - 1] ||
vdesc->vd_period.p_end <= ss[0])
- return 0;
+ return nilfs_vdesc_protection_period(vdesc);
/* Try the last hit snapshot number */
if (*last_hit >= vdesc->vd_period.p_start &&
- *last_hit < vdesc->vd_period.p_end)
+ *last_hit < vdesc->vd_period.p_end) {
+ nilfs_vdesc_set_snapshot(vdesc);
return 1;
+ }
low = 0;
high = n - 1;
@@ -434,10 +441,11 @@ static int nilfs_vdesc_is_live(const struct nilfs_vdesc *vdesc,
} else {
/* ss[index] is in the range [p_start, p_end) */
*last_hit = ss[index];
+ nilfs_vdesc_set_snapshot(vdesc);
return 1;
}
}
- return 0;
+ return nilfs_vdesc_protection_period(vdesc);
}
/**
@@ -602,6 +610,47 @@ static int nilfs_toss_bdescs(struct nilfs_vector *bdescv)
}
/**
+ * nilfs_count_live_blocks - returns the number of blocks in segnum
+ * @nilfs: nilfs object
+ * @segnum: segment number
+ * @bdescv: vector object storing (descriptors of) disk block numbers
+ * @vdescv: vector object storing (descriptors of) virtual block numbers
+ */
+static size_t nilfs_count_live_blocks(const struct nilfs *nilfs,
+ __u64 segnum,
+ struct nilfs_vector *vdescv,
+ struct nilfs_vector *bdescv)
+{
+ struct nilfs_vdesc *vdesc;
+ struct nilfs_bdesc *bdesc;
+ int i;
+ size_t res = 0;
+
+ for (i = 0; i < nilfs_vector_get_size(bdescv); i++) {
+ bdesc = nilfs_vector_get_element(bdescv, i);
+ assert(bdesc != NULL);
+
+ if (nilfs_get_segnum_of_block(nilfs, bdesc->bd_blocknr) ==
+ segnum && nilfs_bdesc_is_live(bdesc)) {
+ ++res;
+ }
+ }
+
+ for (i = 0; i < nilfs_vector_get_size(vdescv); i++) {
+ vdesc = nilfs_vector_get_element(vdescv, i);
+ assert(vdesc != NULL);
+
+ if (nilfs_get_segnum_of_block(nilfs, vdesc->vd_blocknr) ==
+ segnum && (nilfs_vdesc_snapshot(vdesc) ||
+ !nilfs_vdesc_protection_period(vdesc))) {
+ ++res;
+ }
+ }
+
+ return res;
+}
+
+/**
* nilfs_xreclaim_segment - reclaim segments (enhanced API)
* @nilfs: nilfs object
* @segnums: array of segment numbers storing selected segments
@@ -757,7 +806,13 @@ int nilfs_xreclaim_segment(struct nilfs *nilfs,
sup->sup_segnum = segnums[i];
sup->sup_flags = 0;
nilfs_suinfo_update_set_lastmod(sup);
+ nilfs_suinfo_update_set_nblocks(sup);
+
sup->sup_sui.sui_lastmod = tv.tv_sec;
+ sup->sup_sui.sui_nblocks =
+ nilfs_count_live_blocks(nilfs,
+ segnums[i], vdescv, bdescv);
+
}
ret = nilfs_set_suinfo(nilfs, nilfs_vector_get_data(supv), n);
--
1.9.0
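
A minimal sketch of the counting rule for virtual blocks used in this
patch, assuming a reduced descriptor that carries only the block number
and the second flag word. struct sketch_vdesc, the SKETCH_* constants
and both functions are hypothetical and only mirror the condition in
nilfs_count_live_blocks(); the bdesc part is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit positions, modelled on the new vd_flags2 enum. */
enum {
	SKETCH_VDESC_SNAPSHOT,
	SKETCH_VDESC_PROTECTION_PERIOD,
};

/* Hypothetical, reduced form of struct nilfs_vdesc. */
struct sketch_vdesc {
	uint64_t blocknr;	/* disk block number */
	uint32_t flags2;	/* snapshot / protection period bits */
};

/*
 * A block counts as live if a snapshot holds it, or if it is not
 * merely kept alive by the protection period.
 */
static int sketch_counts_as_live(const struct sketch_vdesc *v)
{
	int snapshot = !!(v->flags2 & (1u << SKETCH_VDESC_SNAPSHOT));
	int protected_only =
		!!(v->flags2 & (1u << SKETCH_VDESC_PROTECTION_PERIOD));

	return snapshot || !protected_only;
}

/* Count the live blocks of segment @segnum among @n descriptors. */
static size_t sketch_count_live(const struct sketch_vdesc *v, size_t n,
				uint64_t segnum,
				uint32_t blocks_per_segment)
{
	size_t i, res = 0;

	for (i = 0; i < n; i++) {
		if (v[i].blocknr / blocks_per_segment == segnum &&
		    sketch_counts_as_live(&v[i]))
			res++;
	}
	return res;
}
```

The result is what would be stored in sui_nblocks for the segment, so a
block that survives only because of the protection period does not make
the segment look less reclaimable than it is.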
* Re: [PATCH 0/6] nilfs2: implement tracking of live blocks
[not found] ` <cover.1394966728.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
` (6 preceding siblings ...)
2014-03-16 10:49 ` [PATCH 1/4] nilfs-utils: remove reliance on sui_nblocks to read segment Andreas Rohner
@ 2014-03-16 11:01 ` Andreas Rohner
[not found] ` <532584A2.8000004-hi6Y0CQ0nG0@public.gmane.org>
7 siblings, 1 reply; 34+ messages in thread
From: Andreas Rohner @ 2014-03-16 11:01 UTC (permalink / raw)
To: linux-nilfs
On 2014-03-16 11:47, Andreas Rohner wrote:
> Hi,
>
> This patch set implements the tracking of live blocks in segments. This
> information is crucial in implementing better GC policies, because
> now the policies can make informed decisions about which segments have
> the biggest number of reclaimable blocks.
IMPORTANT:
I forgot to mention that the patches are based on linux-next/master,
because they rely on previous patches that aren't in master yet.
br,
Andreas Rohner
* Re: [PATCH 0/6] nilfs2: implement tracking of live blocks
[not found] ` <3EC9549C-84A7-49B5-9BE1-34A7337BFFDC-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2014-03-16 11:36 ` Andreas Rohner
0 siblings, 0 replies; 34+ messages in thread
From: Andreas Rohner @ 2014-03-16 11:36 UTC (permalink / raw)
To: Vyacheslav Dubeyko; +Cc: linux-nilfs
On 2014-03-16 13:34, Vyacheslav Dubeyko wrote:
>
> On Mar 16, 2014, at 2:01 PM, Andreas Rohner wrote:
>
>> On 2014-03-16 11:47, Andreas Rohner wrote:
>>> Hi,
>>>
>>> This patch set implements the tracking of live blocks in segments. This
>>> information is crucial in implementing better GC policies, because
>>> now the policies can make informed decisions about which segments have
>>> the biggest number of reclaimable blocks.
>>
>> IMPORTANT:
>> I forgot to mention, that the patches are based on linux-next/master,
>> because they rely on previous patches that aren't in master yet.
>>
>
> As far as I can see, some guys mention about it via [PATCH -next 0/6], for example. :)
Thanks! I didn't know that. I will keep it in mind for next time.
br,
Andreas Rohner
> With the best regards,
> Vyacheslav Dubeyko.
>
>
* Re: [PATCH 2/6] nilfs2: add new timestamp to seg usage and function to change su_nblocks
[not found] ` <2FD47FE0-3468-4EF4-AAAE-4A636C640C44-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2014-03-16 12:24 ` Andreas Rohner
[not found] ` <53259801.5080409-hi6Y0CQ0nG0@public.gmane.org>
0 siblings, 1 reply; 34+ messages in thread
From: Andreas Rohner @ 2014-03-16 12:24 UTC (permalink / raw)
To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On 2014-03-16 14:00, Vyacheslav Dubeyko wrote:
>
> On Mar 16, 2014, at 1:47 PM, Andreas Rohner wrote:
>
>> This patch adds an additional timestamp to the segment usage
>> information that indicates the last time the usage information was
>> changed. So su_lastmod indicates the last time the segment itself was
>> modified and su_lastdec indicates the last time the usage information
>> itself was changed.
>>
>
> What will we have if user changes time?
> What sequence will we have after such "malicious" action?
> Did you test such situation?
The timestamp is just a hint for the userspace GC. If the hint is wrong
the result would be that the GC is less efficient for a while. After a
while it would go back to normal. You have the same problem with the
already existing su_lastmod timestamp.
>> This is important information for the GC, because it needs to avoid
>> selecting segments for cleaning that are created (su_lastmod) outside of
>> the protection period, but the blocks got reclaimable (su_nblocks is
>> decremented) within the protection period. Without that information the
>> GC policy has to assume, that there are reclaimable blocks, only to find
>> out, that they are protected by the protection period.
>>
>> This patch also introduces nilfs_sufile_add_segment_usage(), which can
>> be used to increment or decrement the value of su_nblocks of a specific
>> segment.
>>
>> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
>> ---
>> fs/nilfs2/sufile.c | 86 +++++++++++++++++++++++++++++++++++++++++++++--
>> fs/nilfs2/sufile.h | 18 ++++++++++
>> include/linux/nilfs2_fs.h | 7 ++++
>> 3 files changed, 109 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
>> index 2a869c3..0886938 100644
>> --- a/fs/nilfs2/sufile.c
>> +++ b/fs/nilfs2/sufile.c
>> @@ -453,6 +453,8 @@ void nilfs_sufile_do_scrap(struct inode *sufile, __u64 segnum,
>> su->su_lastmod = cpu_to_le64(0);
>> su->su_nblocks = cpu_to_le32(0);
>> su->su_flags = cpu_to_le32(1UL << NILFS_SEGMENT_USAGE_DIRTY);
>> + if (nilfs_sufile_lastdec_supported(sufile))
>> + su->su_lastdec = cpu_to_le64(0);
>> kunmap_atomic(kaddr);
>>
>> nilfs_sufile_mod_counter(header_bh, clean ? (u64)-1 : 0, dirty ? 0 : 1);
>> @@ -482,7 +484,7 @@ void nilfs_sufile_do_free(struct inode *sufile, __u64 segnum,
>> WARN_ON(!nilfs_segment_usage_dirty(su));
>>
>> sudirty = nilfs_segment_usage_dirty(su);
>> - nilfs_segment_usage_set_clean(su);
>> + nilfs_sufile_segment_usage_set_clean(sufile, su);
>> kunmap_atomic(kaddr);
>> mark_buffer_dirty(su_bh);
>>
>> @@ -549,6 +551,75 @@ int nilfs_sufile_set_segment_usage(struct inode *sufile, __u64 segnum,
>> }
>>
>> /**
>> + * nilfs_sufile_add_segment_usage - decrement usage of a segment
>
> I feel cultural dissonance about this name. Add or decrement? :)
> Decrement and add are different operations for me.
Yes, that description is wrong. Thanks for pointing that out. The long
description below is correct, though. By adding a signed value, one can
both decrement and increment.
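A minimal sketch of that idea, outside the kernel and under stated
assumptions: clamp_add_nblocks() is a hypothetical helper, not a function
from the patch, and it assumes the delta is small enough that the 64-bit
sum cannot overflow. It also uses the "else if" form suggested further
down in this review.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): add a signed delta to a
 * live-block counter and clamp the result to the range [0, max_blocks].
 * A negative delta decrements, a positive delta increments.
 */
static uint32_t clamp_add_nblocks(uint32_t nblocks, int64_t delta,
				  uint32_t max_blocks)
{
	/* Widen first so that a negative result is representable. */
	int64_t value = (int64_t)nblocks + delta;

	if (value < 0)
		value = 0;
	else if (value > (int64_t)max_blocks)
		value = max_blocks;

	return (uint32_t)value;
}
```

For example, with 2048 blocks per segment, a counter of 10 and a delta of
-30 saturates at 0 instead of wrapping around.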
>> + * @sufile: inode of segment usage file
>> + * @segnum: segment number
>> + * @value: value to add to su_nblocks
>> + * @dectime: current time
>> + *
>> + * Description: nilfs_sufile_add_segment_usage() adds a signed value to the
>> + * su_nblocks field of the segment usage information of @segnum. It ensures
>> + * that the result is bigger than 0 and smaller or equal to the maximum number
>> + * of blocks per segment
>> + *
>> + * Return Value: On success, 0 is returned. On error, one of the following
>> + * negative error codes is returned.
>> + *
>> + * %-ENOMEM - Insufficient memory available.
>> + *
>> + * %-EIO - I/O error
>> + *
>> + * %-ENOENT - the specified block does not exist (hole block)
>> + */
>> +int nilfs_sufile_add_segment_usage(struct inode *sufile, __u64 segnum,
>> + __s64 value, time_t dectime)
>> +{
>> + struct the_nilfs *nilfs = sufile->i_sb->s_fs_info;
>> + struct buffer_head *bh;
>> + struct nilfs_segment_usage *su;
>> + void *kaddr;
>> + int ret;
>> +
>> + if (value == 0)
>> + return 0;
>> +
>> + down_write(&NILFS_MDT(sufile)->mi_sem);
>> +
>> + ret = nilfs_sufile_get_segment_usage_block(sufile, segnum, 0, &bh);
>> + if (ret < 0)
>
> Maybe it needs to use unlikely() here.
Yes good idea.
>> + goto out_sem;
>> +
>> + kaddr = kmap_atomic(bh->b_page);
>> + su = nilfs_sufile_block_get_segment_usage(sufile, segnum, bh, kaddr);
>> + WARN_ON(nilfs_segment_usage_error(su));
>> +
>> + value += le32_to_cpu(su->su_nblocks);
>
> Decrement. Really? :)
>
>> + if (value < 0)
>> + value = 0;
>> + if (value > nilfs->ns_blocks_per_segment)
>
> maybe "else if" here?
Yes.
>> + value = nilfs->ns_blocks_per_segment;
>> +
>> + if (value == le32_to_cpu(su->su_nblocks)) {
>> + kunmap_atomic(kaddr);
>> + goto out_brelse;
>> + }
>> +
>> + su->su_nblocks = cpu_to_le32(value);
>> + if (dectime && nilfs_sufile_lastdec_supported(sufile))
>> + su->su_lastdec = cpu_to_le64(dectime);
>> + kunmap_atomic(kaddr);
>> +
>> + mark_buffer_dirty(bh);
>> + nilfs_mdt_mark_dirty(sufile);
>> +
>> +out_brelse:
>> + brelse(bh);
>> +out_sem:
>> + up_write(&NILFS_MDT(sufile)->mi_sem);
>> + return ret;
>> +}
>> +
>> +/**
>> * nilfs_sufile_get_stat - get segment usage statistics
>> * @sufile: inode of segment usage file
>> * @stat: pointer to a structure of segment usage statistics
>> @@ -698,7 +769,8 @@ static int nilfs_sufile_truncate_range(struct inode *sufile,
>> nc = 0;
>> for (su = su2, j = 0; j < n; j++, su = (void *)su + susz) {
>> if (nilfs_segment_usage_error(su)) {
>> - nilfs_segment_usage_set_clean(su);
>> + nilfs_sufile_segment_usage_set_clean(sufile,
>> + su);
>> nc++;
>> }
>> }
>> @@ -858,6 +930,13 @@ ssize_t nilfs_sufile_get_suinfo(struct inode *sufile, __u64 segnum, void *buf,
>> if (nilfs_segment_is_active(nilfs, segnum + j))
>> si->sui_flags |=
>> (1UL << NILFS_SEGMENT_USAGE_ACTIVE);
>> + if (sisz >= sizeof(struct nilfs_suinfo)) {
>> + if (susz >= sizeof(struct nilfs_segment_usage))
>> + si->sui_lastdec =
>> + le64_to_cpu(su->su_lastdec);
>
> Is it really impossible to place assignment on one line?
Yes because it is already indented so much. I couldn't find an easy way
to get rid of the indentation.
>> + else
>> + si->sui_lastdec = 0;
>> + }
>> }
>> kunmap_atomic(kaddr);
>> brelse(su_bh);
>> @@ -935,6 +1014,9 @@ ssize_t nilfs_sufile_set_suinfo(struct inode *sufile, void *buf,
>> if (nilfs_suinfo_update_lastmod(sup))
>> su->su_lastmod = cpu_to_le64(sup->sup_sui.sui_lastmod);
>>
>> + if (nilfs_suinfo_update_lastdec(sup))
>> + su->su_lastdec = cpu_to_le64(sup->sup_sui.sui_lastdec);
>> +
>> if (nilfs_suinfo_update_nblocks(sup))
>> su->su_nblocks = cpu_to_le32(sup->sup_sui.sui_nblocks);
>>
>> diff --git a/fs/nilfs2/sufile.h b/fs/nilfs2/sufile.h
>> index b8afd72..e5455d2 100644
>> --- a/fs/nilfs2/sufile.h
>> +++ b/fs/nilfs2/sufile.h
>> @@ -28,6 +28,23 @@
>> #include <linux/nilfs2_fs.h>
>> #include "mdt.h"
>>
>> +static inline int
>> +nilfs_sufile_lastdec_supported(const struct inode *sufile)
>> +{
>> + return NILFS_MDT(sufile)->mi_entry_size ==
>> + sizeof(struct nilfs_segment_usage);
>> +}
>> +
>> +static inline void
>> +nilfs_sufile_segment_usage_set_clean(const struct inode *sufile,
>> + struct nilfs_segment_usage *su)
>> +{
>> + su->su_lastmod = cpu_to_le64(0);
>> + su->su_nblocks = cpu_to_le32(0);
>> + su->su_flags = cpu_to_le32(0);
>> + if (nilfs_sufile_lastdec_supported(sufile))
>> + su->su_lastdec = cpu_to_le64(0);
>> +}
>>
>> static inline unsigned long nilfs_sufile_get_nsegments(struct inode *sufile)
>> {
>> @@ -41,6 +58,7 @@ int nilfs_sufile_alloc(struct inode *, __u64 *);
>> int nilfs_sufile_mark_dirty(struct inode *sufile, __u64 segnum);
>> int nilfs_sufile_set_segment_usage(struct inode *sufile, __u64 segnum,
>> unsigned long nblocks, time_t modtime);
>> +int nilfs_sufile_add_segment_usage(struct inode *, __u64, __s64, time_t);
>> int nilfs_sufile_get_stat(struct inode *, struct nilfs_sustat *);
>> ssize_t nilfs_sufile_get_suinfo(struct inode *, __u64, void *, unsigned,
>> size_t);
>> diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
>> index ff3fea3..ca269ad 100644
>> --- a/include/linux/nilfs2_fs.h
>> +++ b/include/linux/nilfs2_fs.h
>> @@ -614,11 +614,13 @@ struct nilfs_cpfile_header {
>> * @su_lastmod: last modified timestamp
>> * @su_nblocks: number of blocks in segment
>> * @su_flags: flags
>> + * @su_lastdec: last decrement of su_nblocks timestamp
>> */
>> struct nilfs_segment_usage {
>> __le64 su_lastmod;
>> __le32 su_nblocks;
>> __le32 su_flags;
>> + __le64 su_lastdec;
>
> So, this change makes the on-disk layout incompatible with the previous one.
> Am I correct? First, we need to be fully confident that this structure really
> has to change. Second, an incompatible flag needs to be added to the
> s_feature_incompat field of the superblock, and maybe a mount option as well.
No, it IS compatible. NILFS uses the entry sizes stored in the super
block. Notice that the code does not depend on sizeof(struct
nilfs_suinfo) or sizeof(struct nilfs_segment_usage). So an old kernel
can read a file system with su_lastdec and a new kernel can read an old
file system without su_lastdec.
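To illustrate the point, here is a rough userspace sketch of size-driven
decoding. Everything in it is an assumption made for illustration:
struct su_view, get_le(), su_decode() and the two size constants are
hypothetical names, and the field offsets simply follow the struct layout
quoted above. It uses a ">=" size check, as nilfs_sufile_get_suinfo() does
in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-order view of one segment usage entry. */
struct su_view {
	uint64_t lastmod;
	uint32_t nblocks;
	uint32_t flags;
	uint64_t lastdec;	/* 0 if the on-disk entry lacks the field */
};

#define SU_MIN_SIZE 16u	/* mirrors NILFS_MIN_SEGMENT_USAGE_SIZE */
#define SU_EXT_SIZE 24u	/* entry size including the extra timestamp */

/* Decode an n-byte little-endian integer. */
static uint64_t get_le(const unsigned char *p, size_t n)
{
	uint64_t v = 0;

	while (n--)
		v = (v << 8) | p[n];
	return v;
}

/*
 * Decode one entry whose size comes from the file system, not from
 * sizeof(). Short (old) entries yield lastdec == 0.
 */
static int su_decode(const unsigned char *entry, size_t entry_size,
		     struct su_view *out)
{
	if (entry_size < SU_MIN_SIZE)
		return -1;

	out->lastmod = get_le(entry, 8);
	out->nblocks = (uint32_t)get_le(entry + 8, 4);
	out->flags = (uint32_t)get_le(entry + 12, 4);
	out->lastdec = entry_size >= SU_EXT_SIZE ? get_le(entry + 16, 8) : 0;
	return 0;
}
```

A reader that only knows the 16-byte layout keeps stepping through the
file by the stored entry size and never touches the extra eight bytes.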
> The name su_lastdec does not sound very good to me.
Hmm, yes I agree. It is a remnant of a previous version of my code. I
will think of something better.
Thanks for your review.
br,
Andreas Rohner
> Thanks,
> Vyacheslav Dubeyko.
>
>> };
>>
>> #define NILFS_MIN_SEGMENT_USAGE_SIZE 16
>> @@ -663,6 +665,7 @@ nilfs_segment_usage_set_clean(struct nilfs_segment_usage *su)
>> su->su_lastmod = cpu_to_le64(0);
>> su->su_nblocks = cpu_to_le32(0);
>> su->su_flags = cpu_to_le32(0);
>> + su->su_lastdec = cpu_to_le64(0);
>> }
>>
>> static inline int
>> @@ -694,11 +697,13 @@ struct nilfs_sufile_header {
>> * @sui_lastmod: timestamp of last modification
>> * @sui_nblocks: number of written blocks in segment
>> * @sui_flags: segment usage flags
>> + * @sui_lastdec: last decrement of sui_nblocks timestamp
>> */
>> struct nilfs_suinfo {
>> __u64 sui_lastmod;
>> __u32 sui_nblocks;
>> __u32 sui_flags;
>> + __u64 sui_lastdec;
>> };
>>
>> #define NILFS_SUINFO_FNS(flag, name) \
>> @@ -736,6 +741,7 @@ enum {
>> NILFS_SUINFO_UPDATE_LASTMOD,
>> NILFS_SUINFO_UPDATE_NBLOCKS,
>> NILFS_SUINFO_UPDATE_FLAGS,
>> + NILFS_SUINFO_UPDATE_LASTDEC,
>> __NR_NILFS_SUINFO_UPDATE_FIELDS,
>> };
>>
>> @@ -759,6 +765,7 @@ nilfs_suinfo_update_##name(const struct nilfs_suinfo_update *sup) \
>> NILFS_SUINFO_UPDATE_FNS(LASTMOD, lastmod)
>> NILFS_SUINFO_UPDATE_FNS(NBLOCKS, nblocks)
>> NILFS_SUINFO_UPDATE_FNS(FLAGS, flags)
>> +NILFS_SUINFO_UPDATE_FNS(LASTDEC, lastdec)
>>
>> enum {
>> NILFS_CHECKPOINT,
>> --
>> 1.9.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/6] nilfs2: implement tracking of live blocks
[not found] ` <532584A2.8000004-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-03-16 12:34 ` Vyacheslav Dubeyko
[not found] ` <3EC9549C-84A7-49B5-9BE1-34A7337BFFDC-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
0 siblings, 1 reply; 34+ messages in thread
From: Vyacheslav Dubeyko @ 2014-03-16 12:34 UTC (permalink / raw)
To: Andreas Rohner; +Cc: linux-nilfs
On Mar 16, 2014, at 2:01 PM, Andreas Rohner wrote:
> On 2014-03-16 11:47, Andreas Rohner wrote:
>> Hi,
>>
>> This patch set implements the tracking of live blocks in segments. This
>> information is crucial in implementing better GC policies, because
>> now the policies can make informed decisions about which segments have
>> the biggest number of reclaimable blocks.
>
> IMPORTANT:
> I forgot to mention, that the patches are based on linux-next/master,
> because they rely on previous patches that aren't in master yet.
>
As far as I can see, some people indicate this in the subject line, for example with [PATCH -next 0/6]. :)
With the best regards,
Vyacheslav Dubeyko.
* Re: [PATCH 2/4] nilfs-utils: add cost-benefit and greedy policies
[not found] ` <cc43be2e6bba5367fd2982dc0df5255b884bdace.1394966935.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-03-16 12:55 ` Ryusuke Konishi
[not found] ` <20140316.215545.291456562.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
0 siblings, 1 reply; 34+ messages in thread
From: Ryusuke Konishi @ 2014-03-16 12:55 UTC (permalink / raw)
To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Sun, 16 Mar 2014 11:49:16 +0100, Andreas Rohner wrote:
> This patch implements the cost-benefit and greedy GC policies. These are
> well known policies for log-structured file systems [1].
>
> * Greedy:
> Select the segments with the most free space.
> * Cost-Benefit:
> Perform a cost-benefit analysis, whereby the free space gained is
> weighed against the cost of collecting the segment.
>
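The two policies above can be sketched as plain scoring functions. This is
only an illustration under assumptions: struct seg_info and both function
names are hypothetical, the cost term follows the formula used in the
quoted patch (twice the live block count), and the guards against a
timestamp in the future and an oversized block count are done before any
subtraction.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct nilfs_suinfo. */
struct seg_info {
	uint64_t lastmod;	/* last write to the segment, in seconds */
	uint32_t nblocks;	/* blocks still counted as live */
};

/* Greedy: score is the number of blocks that cleaning would free. */
static uint64_t greedy_importance(const struct seg_info *si,
				  uint32_t blocks_per_seg)
{
	if (si->nblocks >= blocks_per_seg)
		return 0;
	return blocks_per_seg - si->nblocks;
}

/* Cost-benefit: (age * free blocks) / cleaning cost. */
static uint64_t cost_benefit_importance(const struct seg_info *si,
					uint32_t blocks_per_seg,
					uint64_t now)
{
	uint64_t free_blocks, cost, age;

	/* Check before subtracting, the operands are unsigned. */
	if (now < si->lastmod || si->nblocks > blocks_per_seg)
		return 0;

	free_blocks = blocks_per_seg - si->nblocks;
	cost = 2 * (uint64_t)si->nblocks;
	if (cost == 0)
		cost = 1;	/* avoid dividing by zero for empty segments */

	/* Scale the age to milliseconds for a finer-grained quotient. */
	age = (now - si->lastmod) * 1000;

	return (age * free_blocks) / cost;
}
```

With both policies a higher score marks a better cleaning candidate, which
is the opposite of the timestamp policy.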
> Since especially cost-benefit needed more information than was available
> in nilfs_suinfo, a few extra parameters were added to the policy
> callback function prototype. The policy threshold was removed, since it
> served no real purpose. The flag p_comparison was added to indicate how
> the importance values should be interpreted. For example for the
> timestamp policy smaller values mean older timestamps, which is better.
> For greedy and cost-benefit on the other hand higher values are better.
> nilfs_cleanerd_select_segments() was updated accordingly.
>
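How a comparison flag can select the sort direction is sketched below.
The names struct segimp_view, cmp_asc(), cmp_desc() and sort_candidates()
are hypothetical, and the sketch uses qsort() from the C library rather
than nilfs_vector_sort(). Unlike the comparators in the patch, these
return 0 for identical elements.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nilfs_segimp. */
struct segimp_view {
	uint64_t segnum;
	uint64_t importance;
};

enum { BIGGER_IS_BETTER = 0, SMALLER_IS_BETTER = 1 };

/* Ascending importance, ties broken by ascending segment number. */
static int cmp_asc(const void *a, const void *b)
{
	const struct segimp_view *x = a, *y = b;

	if (x->importance != y->importance)
		return x->importance < y->importance ? -1 : 1;
	if (x->segnum != y->segnum)
		return x->segnum < y->segnum ? -1 : 1;
	return 0;
}

/* Descending importance, ties still broken by ascending segment number. */
static int cmp_desc(const void *a, const void *b)
{
	const struct segimp_view *x = a, *y = b;

	if (x->importance != y->importance)
		return x->importance > y->importance ? -1 : 1;
	if (x->segnum != y->segnum)
		return x->segnum < y->segnum ? -1 : 1;
	return 0;
}

/* Put the best candidates first, whatever "best" means for the policy. */
static void sort_candidates(struct segimp_view *v, size_t n, int comparison)
{
	qsort(v, n, sizeof(*v),
	      comparison == SMALLER_IS_BETTER ? cmp_asc : cmp_desc);
}
```

After sorting, the caller can take the first few elements regardless of
which policy produced the importance values.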
> [1] Mendel Rosenblum and John K. Ousterhout. The design and
> implementation of a log-structured file system. ACM Trans. Comput.
> Syst., 10(1):26–52, February 1992.
>
> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
> ---
> include/nilfs2_fs.h | 9 ++++-
> sbin/cleanerd/cldconfig.c | 100 +++++++++++++++++++++++++++++++++++++++++++---
> sbin/cleanerd/cldconfig.h | 18 +++++----
> sbin/cleanerd/cleanerd.c | 56 ++++++++++++++++----------
> 4 files changed, 149 insertions(+), 34 deletions(-)
>
> diff --git a/include/nilfs2_fs.h b/include/nilfs2_fs.h
> index a16ad4c..967c2af 100644
> --- a/include/nilfs2_fs.h
> +++ b/include/nilfs2_fs.h
> @@ -483,7 +483,7 @@ struct nilfs_dat_entry {
> __le64 de_blocknr;
> __le64 de_start;
> __le64 de_end;
> - __le64 de_rsv;
> + __le64 de_ss;
> };
>
> /**
> @@ -612,11 +612,13 @@ struct nilfs_cpfile_header {
> * @su_lastmod: last modified timestamp
> * @su_nblocks: number of blocks in segment
> * @su_flags: flags
> + * @su_lastdec: last decrement of su_nblocks timestamp
> */
> struct nilfs_segment_usage {
> __le64 su_lastmod;
> __le32 su_nblocks;
> __le32 su_flags;
> + __le64 su_lastdec;
> };
>
> /* segment usage flag */
> @@ -659,6 +661,7 @@ nilfs_segment_usage_set_clean(struct nilfs_segment_usage *su)
> su->su_lastmod = cpu_to_le64(0);
> su->su_nblocks = cpu_to_le32(0);
> su->su_flags = cpu_to_le32(0);
> + su->su_lastdec = cpu_to_le64(0);
> }
>
> static inline int
> @@ -690,11 +693,13 @@ struct nilfs_sufile_header {
> * @sui_lastmod: timestamp of last modification
> * @sui_nblocks: number of written blocks in segment
> * @sui_flags: segment usage flags
> + * @sui_lastdec: last decrement of sui_nblocks timestamp
> */
> struct nilfs_suinfo {
> __u64 sui_lastmod;
> __u32 sui_nblocks;
> __u32 sui_flags;
> + __u64 sui_lastdec;
> };
>
> #define NILFS_SUINFO_FNS(flag, name) \
> @@ -732,6 +737,7 @@ enum {
> NILFS_SUINFO_UPDATE_LASTMOD,
> NILFS_SUINFO_UPDATE_NBLOCKS,
> NILFS_SUINFO_UPDATE_FLAGS,
> + NILFS_SUINFO_UPDATE_LASTDEC,
> __NR_NILFS_SUINFO_UPDATE_FIELDS,
> };
>
> @@ -755,6 +761,7 @@ nilfs_suinfo_update_##name(const struct nilfs_suinfo_update *sup) \
> NILFS_SUINFO_UPDATE_FNS(LASTMOD, lastmod)
> NILFS_SUINFO_UPDATE_FNS(NBLOCKS, nblocks)
> NILFS_SUINFO_UPDATE_FNS(FLAGS, flags)
> +NILFS_SUINFO_UPDATE_FNS(LASTDEC, lastdec)
>
> enum {
> NILFS_CHECKPOINT,
> diff --git a/sbin/cleanerd/cldconfig.c b/sbin/cleanerd/cldconfig.c
> index c8b197b..ade974a 100644
> --- a/sbin/cleanerd/cldconfig.c
> +++ b/sbin/cleanerd/cldconfig.c
> @@ -380,7 +380,10 @@ nilfs_cldconfig_handle_clean_check_interval(struct nilfs_cldconfig *config,
> }
>
> static unsigned long long
> -nilfs_cldconfig_selection_policy_timestamp(const struct nilfs_suinfo *si)
> +nilfs_cldconfig_selection_policy_timestamp(struct nilfs *nilfs,
> + const struct nilfs_sustat *sustat,
> + const struct nilfs_suinfo *si,
> + __u64 prottime)
> {
> return si->sui_lastmod;
> }
> @@ -391,14 +394,101 @@ nilfs_cldconfig_handle_selection_policy_timestamp(struct nilfs_cldconfig *config
> {
> config->cf_selection_policy.p_importance =
> NILFS_CLDCONFIG_SELECTION_POLICY_IMPORTANCE;
> - config->cf_selection_policy.p_threshold =
> - NILFS_CLDCONFIG_SELECTION_POLICY_THRESHOLD;
> + config->cf_selection_policy.p_comparison =
> + NILFS_CLDCONFIG_SELECTION_POLICY_SMALLER_IS_BETTER;
> + return 0;
> +}
> +
> +static unsigned long long
> +nilfs_cldconfig_selection_policy_greedy(struct nilfs *nilfs,
> + const struct nilfs_sustat *sustat,
> + const struct nilfs_suinfo *si,
> + __u64 prottime)
> +{
> + __u32 value, max_blocks = nilfs_get_blocks_per_segment(nilfs);
> +
> + if (max_blocks < si->sui_nblocks)
> + return 0;
> +
> + value = max_blocks - si->sui_nblocks;
> +
> + /*
> + * the value of sui_nblocks is probably not accurate
> + * because blocks inside the protection period are not
> + * considered to be dead
> + */
> + if (si->sui_lastdec >= prottime)
> + value >>= 4;
> +
> + return value;
> +}
> +
> +static int
> +nilfs_cldconfig_handle_selection_policy_greedy(struct nilfs_cldconfig *config,
> + char **tokens, size_t ntoks)
> +{
> + config->cf_selection_policy.p_importance =
> + nilfs_cldconfig_selection_policy_greedy;
> + config->cf_selection_policy.p_comparison =
> + NILFS_CLDCONFIG_SELECTION_POLICY_BIGGER_IS_BETTER;
> + return 0;
> +}
> +
> +static unsigned long long
> +nilfs_cldconfig_selection_policy_cost_benefit(struct nilfs *nilfs,
> + const struct nilfs_sustat *sustat,
> + const struct nilfs_suinfo *si,
> + __u64 prottime)
> +{
> + __u32 free_blocks, cleaning_cost;
> + unsigned long long value, age;
> +
> + free_blocks = nilfs_get_blocks_per_segment(nilfs) - si->sui_nblocks;
> + /* read the whole segment + write the live blocks */
> + cleaning_cost = 2 * si->sui_nblocks;
> + /*
> + * multiply by 1000 to convert age to milliseconds
> + * (higher precision for division)
> + */
> + age = (sustat->ss_nongc_ctime - si->sui_lastmod) * 1000;
> +
> + if (sustat->ss_nongc_ctime < si->sui_lastmod)
> + return 0;
> +
> + if (cleaning_cost == 0)
> + cleaning_cost = 1;
> +
> +
> + value = (age * free_blocks) / cleaning_cost;
> +
> + /*
> + * the value of sui_nblocks is probably not accurate
> + * because blocks inside the protection period are not
> + * considered to be dead
> + */
> + if (si->sui_lastdec >= prottime)
> + value >>= 4;
> +
> + return value;
> +}
> +
> +static int
> +nilfs_cldconfig_handle_selection_policy_cost_benefit(
> + struct nilfs_cldconfig *config,
> + char **tokens, size_t ntoks)
> +{
> + config->cf_selection_policy.p_importance =
> + nilfs_cldconfig_selection_policy_cost_benefit;
> + config->cf_selection_policy.p_comparison =
> + NILFS_CLDCONFIG_SELECTION_POLICY_BIGGER_IS_BETTER;
> return 0;
> }
>
> static const struct nilfs_cldconfig_polhandle
> nilfs_cldconfig_polhandle_table[] = {
> {"timestamp", nilfs_cldconfig_handle_selection_policy_timestamp},
> + {"greedy", nilfs_cldconfig_handle_selection_policy_greedy},
> + {"cost-benefit", nilfs_cldconfig_handle_selection_policy_cost_benefit},
> };
>
> #define NILFS_CLDCONFIG_NPOLHANDLES \
> @@ -688,8 +778,8 @@ static void nilfs_cldconfig_set_default(struct nilfs_cldconfig *config,
>
> config->cf_selection_policy.p_importance =
> NILFS_CLDCONFIG_SELECTION_POLICY_IMPORTANCE;
> - config->cf_selection_policy.p_threshold =
> - NILFS_CLDCONFIG_SELECTION_POLICY_THRESHOLD;
> + config->cf_selection_policy.p_comparison =
> + NILFS_CLDCONFIG_SELECTION_POLICY_SMALLER_IS_BETTER;
> config->cf_protection_period.tv_sec = NILFS_CLDCONFIG_PROTECTION_PERIOD;
> config->cf_protection_period.tv_usec = 0;
>
> diff --git a/sbin/cleanerd/cldconfig.h b/sbin/cleanerd/cldconfig.h
> index 0a598d5..95d2fde 100644
> --- a/sbin/cleanerd/cldconfig.h
> +++ b/sbin/cleanerd/cldconfig.h
> @@ -30,16 +30,21 @@
> #include <sys/time.h>
> #include <syslog.h>
>
> +struct nilfs;
> +struct nilfs_sustat;
> struct nilfs_suinfo;
>
> /**
> * struct nilfs_selection_policy -
> - * @p_importance:
> - * @p_threshold:
> + * @p_importance: function to calculate the importance for the policy
> + * @p_comparison: flag that indicates how to sort the importance
> */
> struct nilfs_selection_policy {
> - unsigned long long (*p_importance)(const struct nilfs_suinfo *);
> - unsigned long long p_threshold;
> + unsigned long long (*p_importance)(struct nilfs *nilfs,
> + const struct nilfs_sustat *sustat,
> + const struct nilfs_suinfo *,
> + __u64 prottime);
> + int p_comparison;
> };
>
> /**
> @@ -113,7 +118,8 @@ struct nilfs_cldconfig {
>
> #define NILFS_CLDCONFIG_SELECTION_POLICY_IMPORTANCE \
> nilfs_cldconfig_selection_policy_timestamp
> -#define NILFS_CLDCONFIG_SELECTION_POLICY_THRESHOLD 0
> +#define NILFS_CLDCONFIG_SELECTION_POLICY_BIGGER_IS_BETTER 0
> +#define NILFS_CLDCONFIG_SELECTION_POLICY_SMALLER_IS_BETTER 1
> #define NILFS_CLDCONFIG_PROTECTION_PERIOD 3600
> #define NILFS_CLDCONFIG_MIN_CLEAN_SEGMENTS 10
> #define NILFS_CLDCONFIG_MIN_CLEAN_SEGMENTS_UNIT NILFS_SIZE_UNIT_PERCENT
> @@ -135,8 +141,6 @@ struct nilfs_cldconfig {
>
> #define NILFS_CLDCONFIG_NSEGMENTS_PER_CLEAN_MAX 32
>
> -struct nilfs;
> -
> int nilfs_cldconfig_read(struct nilfs_cldconfig *config, const char *path,
> struct nilfs *nilfs);
>
> diff --git a/sbin/cleanerd/cleanerd.c b/sbin/cleanerd/cleanerd.c
> index 17de87b..8df3a07 100644
> --- a/sbin/cleanerd/cleanerd.c
> +++ b/sbin/cleanerd/cleanerd.c
> @@ -417,7 +417,7 @@ static void nilfs_cleanerd_destroy(struct nilfs_cleanerd *cleanerd)
> free(cleanerd);
> }
>
> -static int nilfs_comp_segimp(const void *elem1, const void *elem2)
> +static int nilfs_comp_segimp_asc(const void *elem1, const void *elem2)
> {
> const struct nilfs_segimp *segimp1 = elem1, *segimp2 = elem2;
>
> @@ -429,6 +429,18 @@ static int nilfs_comp_segimp(const void *elem1, const void *elem2)
> return (segimp1->si_segnum < segimp2->si_segnum) ? -1 : 1;
> }
>
> +static int nilfs_comp_segimp_desc(const void *elem1, const void *elem2)
> +{
> + const struct nilfs_segimp *segimp1 = elem1, *segimp2 = elem2;
> +
> + if (segimp1->si_importance > segimp2->si_importance)
> + return -1;
> + else if (segimp1->si_importance < segimp2->si_importance)
> + return 1;
> +
> + return (segimp1->si_segnum < segimp2->si_segnum) ? -1 : 1;
> +}
> +
> static int nilfs_cleanerd_automatic_suspend(struct nilfs_cleanerd *cleanerd)
> {
> return cleanerd->config.cf_min_clean_segments > 0;
> @@ -579,7 +591,7 @@ nilfs_cleanerd_select_segments(struct nilfs_cleanerd *cleanerd,
> __u64 segnum;
> size_t count, nsegs;
> ssize_t nssegs, n;
> - unsigned long long imp, thr;
> + unsigned long long imp;
> int i;
>
> nsegs = nilfs_cleanerd_ncleansegs(cleanerd);
> @@ -600,11 +612,8 @@ nilfs_cleanerd_select_segments(struct nilfs_cleanerd *cleanerd,
> prottime = tv2.tv_sec;
> oldest = tv.tv_sec;
>
> - /* The segments that have larger importance than thr are not
> - * selected. */
> - thr = (config->cf_selection_policy.p_threshold != 0) ?
> - config->cf_selection_policy.p_threshold :
> - sustat->ss_nongc_ctime;
> + /* sui_lastdec may not be set by nilfs_get_suinfo */
> + memset(si, 0, sizeof(si));
>
> for (segnum = 0; segnum < sustat->ss_nsegs; segnum += n) {
> count = min_t(__u64, sustat->ss_nsegs - segnum,
> @@ -615,22 +624,23 @@ nilfs_cleanerd_select_segments(struct nilfs_cleanerd *cleanerd,
> goto out;
> }
> for (i = 0; i < n; i++) {
> - if (!nilfs_suinfo_reclaimable(&si[i]))
> + if (!nilfs_suinfo_reclaimable(&si[i]) ||
> + si[i].sui_lastmod >= sustat->ss_nongc_ctime)
> continue;
>
> - imp = config->cf_selection_policy.p_importance(&si[i]);
> - if (imp < thr) {
> - if (si[i].sui_lastmod < oldest)
> - oldest = si[i].sui_lastmod;
> - if (si[i].sui_lastmod < prottime) {
> - sm = nilfs_vector_get_new_element(smv);
> - if (sm == NULL) {
> - nssegs = -1;
> - goto out;
> - }
> - sm->si_segnum = segnum + i;
> - sm->si_importance = imp;
> + imp = config->cf_selection_policy.p_importance(nilfs,
> + sustat, &si[i], prottime);
> +
> + if (si[i].sui_lastmod < oldest)
> + oldest = si[i].sui_lastmod;
> + if (si[i].sui_lastmod < prottime) {
> + sm = nilfs_vector_get_new_element(smv);
> + if (sm == NULL) {
> + nssegs = -1;
> + goto out;
> }
> + sm->si_segnum = segnum + i;
> + sm->si_importance = imp;
> }
> }
> if (n == 0) {
> @@ -642,7 +652,11 @@ nilfs_cleanerd_select_segments(struct nilfs_cleanerd *cleanerd,
> break;
> }
> }
> - nilfs_vector_sort(smv, nilfs_comp_segimp);
> + if (config->cf_selection_policy.p_comparison ==
> + NILFS_CLDCONFIG_SELECTION_POLICY_SMALLER_IS_BETTER)
> + nilfs_vector_sort(smv, nilfs_comp_segimp_asc);
> + else
> + nilfs_vector_sort(smv, nilfs_comp_segimp_desc);
>
> nssegs = (nilfs_vector_get_size(smv) < nsegs) ?
> nilfs_vector_get_size(smv) : nsegs;
> --
> 1.9.0
scripts/checkpatch.pl detected the following coding style issues:
ERROR: code indent should use tabs where possible
#171: FILE: sbin/cleanerd/cldconfig.c:404:
+^I^I^I^I const struct nilfs_sustat *sustat,$
ERROR: code indent should use tabs where possible
#172: FILE: sbin/cleanerd/cldconfig.c:405:
+^I^I^I^I const struct nilfs_suinfo *si,$
ERROR: code indent should use tabs where possible
#173: FILE: sbin/cleanerd/cldconfig.c:406:
+^I^I^I^I __u64 prottime)$
Please keep this in mind next time. (You don't have to resubmit the whole
series now for this.)
I would like to first understand this series, but I have been very busy
recently. (Also, my review of Vyacheslav's xattr patchset is still
pending.) So, let me go forward a little bit at a time.
Regards,
Ryusuke Konishi
* Re: [PATCH 2/6] nilfs2: add new timestamp to seg usage and function to change su_nblocks
[not found] ` <12561ce5e2cf8ae07fdda05e16c357f37d17c62f.1394966729.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-03-16 13:00 ` Vyacheslav Dubeyko
[not found] ` <2FD47FE0-3468-4EF4-AAAE-4A636C640C44-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
0 siblings, 1 reply; 34+ messages in thread
From: Vyacheslav Dubeyko @ 2014-03-16 13:00 UTC (permalink / raw)
To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Mar 16, 2014, at 1:47 PM, Andreas Rohner wrote:
> This patch adds an additional timestamp to the segment usage
> information that indicates the last time the usage information was
> changed. So su_lastmod indicates the last time the segment itself was
> modified and su_lastdec indicates the last time the usage information
> itself was changed.
>
What happens if the user changes the system time?
What sequence will we have after such a "malicious" action?
Did you test such a situation?
> This is important information for the GC, because it needs to avoid
> selecting segments for cleaning that are created (su_lastmod) outside of
> the protection period, but the blocks got reclaimable (su_nblocks is
> decremented) within the protection period. Without that information the
> GC policy has to assume, that there are reclaimable blocks, only to find
> out, that they are protected by the protection period.
>
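The case described in the paragraph above can be sketched as a small
classification step. This is an illustration only: enum seg_class and
classify_segment() are hypothetical names, and prottime is assumed to be
the current time minus the protection period. The cleaner patches in this
series demote such segments rather than excluding them.

```c
#include <stdint.h>

/* Hypothetical classification of a segment for the cleaner. */
enum seg_class {
	SEG_PROTECTED,		/* written inside the protection period */
	SEG_COUNT_UNRELIABLE,	/* old segment, but blocks died recently */
	SEG_CANDIDATE		/* old segment, counter can be trusted */
};

static enum seg_class classify_segment(uint64_t lastmod, uint64_t lastdec,
				       uint64_t prottime)
{
	if (lastmod >= prottime)
		return SEG_PROTECTED;

	/*
	 * The segment itself is old, but its live block counter was
	 * decremented inside the protection period, so some of the
	 * apparently reclaimable blocks are still protected.
	 */
	if (lastdec >= prottime)
		return SEG_COUNT_UNRELIABLE;

	return SEG_CANDIDATE;
}
```

A file system without the extra timestamp reports 0 for it, so its old
segments always fall into the last class.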
> This patch also introduces nilfs_sufile_add_segment_usage(), which can
> be used to increment or decrement the value of su_nblocks of a specific
> segment.
>
> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
> ---
> fs/nilfs2/sufile.c | 86 +++++++++++++++++++++++++++++++++++++++++++++--
> fs/nilfs2/sufile.h | 18 ++++++++++
> include/linux/nilfs2_fs.h | 7 ++++
> 3 files changed, 109 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
> index 2a869c3..0886938 100644
> --- a/fs/nilfs2/sufile.c
> +++ b/fs/nilfs2/sufile.c
> @@ -453,6 +453,8 @@ void nilfs_sufile_do_scrap(struct inode *sufile, __u64 segnum,
> su->su_lastmod = cpu_to_le64(0);
> su->su_nblocks = cpu_to_le32(0);
> su->su_flags = cpu_to_le32(1UL << NILFS_SEGMENT_USAGE_DIRTY);
> + if (nilfs_sufile_lastdec_supported(sufile))
> + su->su_lastdec = cpu_to_le64(0);
> kunmap_atomic(kaddr);
>
> nilfs_sufile_mod_counter(header_bh, clean ? (u64)-1 : 0, dirty ? 0 : 1);
> @@ -482,7 +484,7 @@ void nilfs_sufile_do_free(struct inode *sufile, __u64 segnum,
> WARN_ON(!nilfs_segment_usage_dirty(su));
>
> sudirty = nilfs_segment_usage_dirty(su);
> - nilfs_segment_usage_set_clean(su);
> + nilfs_sufile_segment_usage_set_clean(sufile, su);
> kunmap_atomic(kaddr);
> mark_buffer_dirty(su_bh);
>
> @@ -549,6 +551,75 @@ int nilfs_sufile_set_segment_usage(struct inode *sufile, __u64 segnum,
> }
>
> /**
> + * nilfs_sufile_add_segment_usage - decrement usage of a segment
I feel cultural dissonance about this name. Add or decrement? :)
Decrement and add are different operations for me.
> + * @sufile: inode of segment usage file
> + * @segnum: segment number
> + * @value: value to add to su_nblocks
> + * @dectime: current time
> + *
> + * Description: nilfs_sufile_add_segment_usage() adds a signed value to the
> + * su_nblocks field of the segment usage information of @segnum. It ensures
> + * that the result is bigger than 0 and smaller or equal to the maximum number
> + * of blocks per segment
> + *
> + * Return Value: On success, 0 is returned. On error, one of the following
> + * negative error codes is returned.
> + *
> + * %-ENOMEM - Insufficient memory available.
> + *
> + * %-EIO - I/O error
> + *
> + * %-ENOENT - the specified block does not exist (hole block)
> + */
> +int nilfs_sufile_add_segment_usage(struct inode *sufile, __u64 segnum,
> + __s64 value, time_t dectime)
> +{
> + struct the_nilfs *nilfs = sufile->i_sb->s_fs_info;
> + struct buffer_head *bh;
> + struct nilfs_segment_usage *su;
> + void *kaddr;
> + int ret;
> +
> + if (value == 0)
> + return 0;
> +
> + down_write(&NILFS_MDT(sufile)->mi_sem);
> +
> + ret = nilfs_sufile_get_segment_usage_block(sufile, segnum, 0, &bh);
> + if (ret < 0)
Maybe it needs to use unlikely() here.
> + goto out_sem;
> +
> + kaddr = kmap_atomic(bh->b_page);
> + su = nilfs_sufile_block_get_segment_usage(sufile, segnum, bh, kaddr);
> + WARN_ON(nilfs_segment_usage_error(su));
> +
> + value += le32_to_cpu(su->su_nblocks);
Decrement. Really? :)
> + if (value < 0)
> + value = 0;
> + if (value > nilfs->ns_blocks_per_segment)
maybe "else if" here?
> + value = nilfs->ns_blocks_per_segment;
> +
> + if (value == le32_to_cpu(su->su_nblocks)) {
> + kunmap_atomic(kaddr);
> + goto out_brelse;
> + }
> +
> + su->su_nblocks = cpu_to_le32(value);
> + if (dectime && nilfs_sufile_lastdec_supported(sufile))
> + su->su_lastdec = cpu_to_le64(dectime);
> + kunmap_atomic(kaddr);
> +
> + mark_buffer_dirty(bh);
> + nilfs_mdt_mark_dirty(sufile);
> +
> +out_brelse:
> + brelse(bh);
> +out_sem:
> + up_write(&NILFS_MDT(sufile)->mi_sem);
> + return ret;
> +}
> +
> +/**
> * nilfs_sufile_get_stat - get segment usage statistics
> * @sufile: inode of segment usage file
> * @stat: pointer to a structure of segment usage statistics
> @@ -698,7 +769,8 @@ static int nilfs_sufile_truncate_range(struct inode *sufile,
> nc = 0;
> for (su = su2, j = 0; j < n; j++, su = (void *)su + susz) {
> if (nilfs_segment_usage_error(su)) {
> - nilfs_segment_usage_set_clean(su);
> + nilfs_sufile_segment_usage_set_clean(sufile,
> + su);
> nc++;
> }
> }
> @@ -858,6 +930,13 @@ ssize_t nilfs_sufile_get_suinfo(struct inode *sufile, __u64 segnum, void *buf,
> if (nilfs_segment_is_active(nilfs, segnum + j))
> si->sui_flags |=
> (1UL << NILFS_SEGMENT_USAGE_ACTIVE);
> + if (sisz >= sizeof(struct nilfs_suinfo)) {
> + if (susz >= sizeof(struct nilfs_segment_usage))
> + si->sui_lastdec =
> + le64_to_cpu(su->su_lastdec);
Is it really impossible to place assignment on one line?
> + else
> + si->sui_lastdec = 0;
> + }
> }
> kunmap_atomic(kaddr);
> brelse(su_bh);
> @@ -935,6 +1014,9 @@ ssize_t nilfs_sufile_set_suinfo(struct inode *sufile, void *buf,
> if (nilfs_suinfo_update_lastmod(sup))
> su->su_lastmod = cpu_to_le64(sup->sup_sui.sui_lastmod);
>
> + if (nilfs_suinfo_update_lastdec(sup))
> + su->su_lastdec = cpu_to_le64(sup->sup_sui.sui_lastdec);
> +
> if (nilfs_suinfo_update_nblocks(sup))
> su->su_nblocks = cpu_to_le32(sup->sup_sui.sui_nblocks);
>
> diff --git a/fs/nilfs2/sufile.h b/fs/nilfs2/sufile.h
> index b8afd72..e5455d2 100644
> --- a/fs/nilfs2/sufile.h
> +++ b/fs/nilfs2/sufile.h
> @@ -28,6 +28,23 @@
> #include <linux/nilfs2_fs.h>
> #include "mdt.h"
>
> +static inline int
> +nilfs_sufile_lastdec_supported(const struct inode *sufile)
> +{
> + return NILFS_MDT(sufile)->mi_entry_size ==
> + sizeof(struct nilfs_segment_usage);
> +}
> +
> +static inline void
> +nilfs_sufile_segment_usage_set_clean(const struct inode *sufile,
> + struct nilfs_segment_usage *su)
> +{
> + su->su_lastmod = cpu_to_le64(0);
> + su->su_nblocks = cpu_to_le32(0);
> + su->su_flags = cpu_to_le32(0);
> + if (nilfs_sufile_lastdec_supported(sufile))
> + su->su_lastdec = cpu_to_le64(0);
> +}
>
> static inline unsigned long nilfs_sufile_get_nsegments(struct inode *sufile)
> {
> @@ -41,6 +58,7 @@ int nilfs_sufile_alloc(struct inode *, __u64 *);
> int nilfs_sufile_mark_dirty(struct inode *sufile, __u64 segnum);
> int nilfs_sufile_set_segment_usage(struct inode *sufile, __u64 segnum,
> unsigned long nblocks, time_t modtime);
> +int nilfs_sufile_add_segment_usage(struct inode *, __u64, __s64, time_t);
> int nilfs_sufile_get_stat(struct inode *, struct nilfs_sustat *);
> ssize_t nilfs_sufile_get_suinfo(struct inode *, __u64, void *, unsigned,
> size_t);
> diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
> index ff3fea3..ca269ad 100644
> --- a/include/linux/nilfs2_fs.h
> +++ b/include/linux/nilfs2_fs.h
> @@ -614,11 +614,13 @@ struct nilfs_cpfile_header {
> * @su_lastmod: last modified timestamp
> * @su_nblocks: number of blocks in segment
> * @su_flags: flags
> + * @su_lastdec: last decrement of su_nblocks timestamp
> */
> struct nilfs_segment_usage {
> __le64 su_lastmod;
> __le32 su_nblocks;
> __le32 su_flags;
> + __le64 su_lastdec;
So this change makes the on-disk layout incompatible with the previous one.
Am I correct? First, we need to be fully confident that this structure really
has to change. Second, an incompat flag would need to be added to the
s_feature_incompat field of the superblock, and maybe a mount option as well.
The name su_lastdec does not sound very good to me.
Thanks,
Vyacheslav Dubeyko.
> };
>
> #define NILFS_MIN_SEGMENT_USAGE_SIZE 16
> @@ -663,6 +665,7 @@ nilfs_segment_usage_set_clean(struct nilfs_segment_usage *su)
> su->su_lastmod = cpu_to_le64(0);
> su->su_nblocks = cpu_to_le32(0);
> su->su_flags = cpu_to_le32(0);
> + su->su_lastdec = cpu_to_le64(0);
> }
>
> static inline int
> @@ -694,11 +697,13 @@ struct nilfs_sufile_header {
> * @sui_lastmod: timestamp of last modification
> * @sui_nblocks: number of written blocks in segment
> * @sui_flags: segment usage flags
> + * @sui_lastdec: last decrement of sui_nblocks timestamp
> */
> struct nilfs_suinfo {
> __u64 sui_lastmod;
> __u32 sui_nblocks;
> __u32 sui_flags;
> + __u64 sui_lastdec;
> };
>
> #define NILFS_SUINFO_FNS(flag, name) \
> @@ -736,6 +741,7 @@ enum {
> NILFS_SUINFO_UPDATE_LASTMOD,
> NILFS_SUINFO_UPDATE_NBLOCKS,
> NILFS_SUINFO_UPDATE_FLAGS,
> + NILFS_SUINFO_UPDATE_LASTDEC,
> __NR_NILFS_SUINFO_UPDATE_FIELDS,
> };
>
> @@ -759,6 +765,7 @@ nilfs_suinfo_update_##name(const struct nilfs_suinfo_update *sup) \
> NILFS_SUINFO_UPDATE_FNS(LASTMOD, lastmod)
> NILFS_SUINFO_UPDATE_FNS(NBLOCKS, nblocks)
> NILFS_SUINFO_UPDATE_FNS(FLAGS, flags)
> +NILFS_SUINFO_UPDATE_FNS(LASTDEC, lastdec)
>
> enum {
> NILFS_CHECKPOINT,
> --
> 1.9.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/6] nilfs2: add new timestamp to seg usage and function to change su_nblocks
[not found] ` <ED41900C-6380-44C1-AC7E-EB8DF74EBFBD-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2014-03-16 13:31 ` Ryusuke Konishi
[not found] ` <20140316.223111.52181167.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
0 siblings, 1 reply; 34+ messages in thread
From: Ryusuke Konishi @ 2014-03-16 13:31 UTC (permalink / raw)
To: Andreas Rohner; +Cc: Vyacheslav Dubeyko, linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Sun, 16 Mar 2014 17:06:10 +0300, Vyacheslav Dubeyko wrote:
>
> On Mar 16, 2014, at 3:24 PM, Andreas Rohner wrote:
>
>>>>
>>>> diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
>>>> index ff3fea3..ca269ad 100644
>>>> --- a/include/linux/nilfs2_fs.h
>>>> +++ b/include/linux/nilfs2_fs.h
>>>> @@ -614,11 +614,13 @@ struct nilfs_cpfile_header {
>>>> * @su_lastmod: last modified timestamp
>>>> * @su_nblocks: number of blocks in segment
>>>> * @su_flags: flags
>>>> + * @su_lastdec: last decrement of su_nblocks timestamp
>>>> */
>>>> struct nilfs_segment_usage {
>>>> __le64 su_lastmod;
>>>> __le32 su_nblocks;
>>>> __le32 su_flags;
>>>> + __le64 su_lastdec;
>>>
>>> So, this change makes on-disk layout incompatible with previous one.
>>> Am I correct? At first it needs to be fully confident that we really need in
>>> changing in this place. Secondly, it needs to add incompatible flag for
>>> s_feature_incompat field of superblock and maybe mount option.
>>
>> No it IS compatible. NILFS uses the entry sizes stored in the super
>> block. Notice, that the code does not depend on sizeof(struct
>> nilfs_suinfo) or sizeof(struct nilfs_segment_usage). So an old kernel
>> can read a file system with su_lastdec and a new kernel can read an old
>> file system without su_lastdec.
>
> But, anyway, I think that you add some new feature by this and previous
> patches. I suppose that it makes sense to add specially dedicated flag or
> flags in s_feature_xxx field of superblock. If feature is compatible with
> previous state of driver then flag can be added for s_feature_compat
> field.
>
> Thanks,
> Vyacheslav Dubeyko.
This is an important point. Please evaluate the backward and forward
compatibility of the modifications, and add the appropriate incompat,
compat_ro, or compat flag, as Vyacheslav mentioned. It will be a focal
point of the early-stage review.
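For reference, the usual meaning of the three flag classes can be sketched
as follows. This is only an illustration under stated assumptions: the ex_
names and the feature bit are hypothetical and are not existing nilfs2
definitions. Unknown compat bits are ignored, unknown compat_ro bits allow a
read-only mount only, and unknown incompat bits refuse the mount.

```c
#include <stdint.h>

/* Hypothetical feature bit, for illustration only (not a real nilfs2 flag). */
#define EX_FEATURE_COMPAT_SU_LASTDEC	(1ULL << 0)

/* Feature sets this (imaginary) driver version understands. */
#define EX_SUPP_COMPAT		EX_FEATURE_COMPAT_SU_LASTDEC
#define EX_SUPP_COMPAT_RO	0ULL
#define EX_SUPP_INCOMPAT	0ULL

struct ex_features {
	uint64_t compat;	/* unknown bits are safe to ignore */
	uint64_t compat_ro;	/* unknown bits: reading is safe, writing is not */
	uint64_t incompat;	/* unknown bits: layout cannot be interpreted */
};

enum ex_mount { EX_MOUNT_RW, EX_MOUNT_RO_ONLY, EX_MOUNT_REFUSE };

/* Decide how far a mount may go, given the feature bits found on disk. */
static enum ex_mount ex_check_features(const struct ex_features *f)
{
	if (f->incompat & ~EX_SUPP_INCOMPAT)
		return EX_MOUNT_REFUSE;
	if (f->compat_ro & ~EX_SUPP_COMPAT_RO)
		return EX_MOUNT_RO_ONLY;
	return EX_MOUNT_RW;
}
```

Which class a given change falls into depends on what an old driver would
do with the new data, which is the evaluation requested above.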
Regards,
Ryusuke Konishi
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/6] nilfs2: add new timestamp to seg usage and function to change su_nblocks
[not found] ` <53259801.5080409-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-03-16 13:34 ` Vyacheslav Dubeyko
[not found] ` <0ED0D5DA-9AE9-44B8-8936-1680DE2B64C5-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2014-03-16 14:06 ` Vyacheslav Dubeyko
1 sibling, 1 reply; 34+ messages in thread
From: Vyacheslav Dubeyko @ 2014-03-16 13:34 UTC (permalink / raw)
To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> On 16 March 2014, at 16:24, Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> wrote:
>
>> On 2014-03-16 14:00, Vyacheslav Dubeyko wrote:
>>
>>> On Mar 16, 2014, at 1:47 PM, Andreas Rohner wrote:
>>>
>>> This patch adds an additional timestamp to the segment usage
>>> information that indicates the last time the usage information was
>>> changed. So su_lastmod indicates the last time the segment itself was
>>> modified and su_lastdec indicates the last time the usage information
>>> itself was changed.
>>
>> What will we have if user changes time?
>> What sequence will we have after such "malicious" action?
>> Did you test such situation?
>
> The timestamp is just a hint for the userspace GC. If the hint is wrong
> the result would be that the GC is less efficient for a while. After a
> while it would go back to normal. You have the same problem with the
> already existing su_lastmod timestamp.
>
But this is exactly what worries me. Users have previously complained about
various issues with the timestamp policy of the GC, and I had hoped that the
new GC policies would resolve this disadvantage. So what do we have now?
The same GC issue again?
Thanks,
Vyacheslav Dubeyko.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/6] nilfs2: add new timestamp to seg usage and function to change su_nblocks
[not found] ` <53259801.5080409-hi6Y0CQ0nG0@public.gmane.org>
2014-03-16 13:34 ` Vyacheslav Dubeyko
@ 2014-03-16 14:06 ` Vyacheslav Dubeyko
[not found] ` <ED41900C-6380-44C1-AC7E-EB8DF74EBFBD-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
1 sibling, 1 reply; 34+ messages in thread
From: Vyacheslav Dubeyko @ 2014-03-16 14:06 UTC (permalink / raw)
To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Mar 16, 2014, at 3:24 PM, Andreas Rohner wrote:
>>>
>>> diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
>>> index ff3fea3..ca269ad 100644
>>> --- a/include/linux/nilfs2_fs.h
>>> +++ b/include/linux/nilfs2_fs.h
>>> @@ -614,11 +614,13 @@ struct nilfs_cpfile_header {
>>> * @su_lastmod: last modified timestamp
>>> * @su_nblocks: number of blocks in segment
>>> * @su_flags: flags
>>> + * @su_lastdec: last decrement of su_nblocks timestamp
>>> */
>>> struct nilfs_segment_usage {
>>> __le64 su_lastmod;
>>> __le32 su_nblocks;
>>> __le32 su_flags;
>>> + __le64 su_lastdec;
>>
>> So, this change makes on-disk layout incompatible with previous one.
>> Am I correct? At first it needs to be fully confident that we really need in
>> changing in this place. Secondly, it needs to add incompatible flag for
>> s_feature_incompat field of superblock and maybe mount option.
>
> No it IS compatible. NILFS uses the entry sizes stored in the super
> block. Notice, that the code does not depend on sizeof(struct
> nilfs_suinfo) or sizeof(struct nilfs_segment_usage). So an old kernel
> can read a file system with su_lastdec and a new kernel can read an old
> file system without su_lastdec.
But, anyway, I think that this patch and the previous ones add a new feature.
I suppose it makes sense to add a dedicated flag (or flags) to one of the
s_feature_xxx fields of the superblock. If the feature is compatible with
the previous state of the driver, then the flag can go into the
s_feature_compat field.
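To make the compatibility argument quoted above concrete, here is a minimal
sketch of reading entries with the on-disk entry size instead of
sizeof(struct nilfs_segment_usage). It is an illustration only: the ex_
names are hypothetical, and the field offset assumes the layout shown in the
patch (su_lastdec follows the existing 16 bytes).

```c
#include <stddef.h>
#include <stdint.h>

#define EX_SU_SIZE_OLD		16	/* su_lastmod + su_nblocks + su_flags */
#define EX_SU_SIZE_NEW		24	/* ... plus su_lastdec */
#define EX_SU_LASTDEC_OFF	16	/* assumed offset of su_lastdec */

/* Decode a little-endian 64-bit value independent of host byte order. */
static uint64_t ex_get_le64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Return su_lastdec of entry idx. The stride is the entry size recorded
 * on disk (susz), so both 16-byte and 24-byte entries can be walked.
 * Entries that are too small to hold the field read as 0.
 */
static uint64_t ex_read_lastdec(const unsigned char *buf, size_t susz,
				size_t idx)
{
	const unsigned char *su = buf + idx * susz;

	if (susz < EX_SU_SIZE_NEW)
		return 0;
	return ex_get_le64(su + EX_SU_LASTDEC_OFF);
}
```

Being able to parse both layouts does not by itself settle whether a feature
flag is needed; that depends on how each side interprets the values.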
Thanks,
Vyacheslav Dubeyko.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/4] nilfs-utils: add cost-benefit and greedy policies
[not found] ` <20140316.215545.291456562.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2014-03-16 15:50 ` Andreas Rohner
0 siblings, 0 replies; 34+ messages in thread
From: Andreas Rohner @ 2014-03-16 15:50 UTC (permalink / raw)
To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On 2014-03-16 13:55, Ryusuke Konishi wrote:
> On Sun, 16 Mar 2014 11:49:16 +0100, Andreas Rohner wrote:
>> This patch implements the cost-benefit and greedy GC policies. These are
>> well known policies for log-structured file systems [1].
>>
>> * Greedy:
>> Select the segments with the most free space.
>> * Cost-Benefit:
>> Perform a cost-benefit analysis, whereby the free space gained is
>> weighed against the cost of collecting the segment.
>>
>> Since especially cost-benefit needed more information than was available
>> in nilfs_suinfo, a few extra parameters were added to the policy
>> callback function prototype. The policy threshold was removed, since it
>> served no real purpose. The flag p_comparison was added to indicate how
>> the importance values should be interpreted. For example for the
>> timestamp policy smaller values mean older timestamps, which is better.
>> For greedy and cost-benefit on the other hand higher values are better.
>> nilfs_cleanerd_select_segments() was updated accordingly.
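The two importance values described above can be sketched in isolation as
follows. This is a simplified stand-alone illustration, not the cleanerd
code: the ex_ names and the plain integer parameters are assumptions, and the
protection-period adjustment (value >>= 4) from the patch is left out. For
both functions a bigger value means a better candidate.

```c
#include <stdint.h>

/* Greedy: importance is the number of blocks that are not live. */
static uint64_t ex_greedy(uint32_t blocks_per_seg, uint32_t live)
{
	if (live > blocks_per_seg)
		return 0;
	return blocks_per_seg - live;
}

/*
 * Cost-benefit: age * free_blocks / cleaning_cost, where the cost is
 * taken as 2 * live, as in the patch. Timestamps are in seconds; the
 * age is scaled to milliseconds for precision in the integer division.
 */
static uint64_t ex_cost_benefit(uint32_t blocks_per_seg, uint32_t live,
				uint64_t now, uint64_t lastmod)
{
	uint64_t free_blocks, cost, age;

	/* Guard both subtractions before performing them. */
	if (live > blocks_per_seg || now < lastmod)
		return 0;

	free_blocks = blocks_per_seg - live;
	cost = 2ULL * live;
	if (cost == 0)
		cost = 1;	/* avoid division by zero for empty segments */
	age = (now - lastmod) * 1000;

	return age * free_blocks / cost;
}
```

For example, a half-full segment of 2048 blocks that was last modified 100
seconds ago gets 100000 * 1024 / 2048 = 50000.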
>>
>> [1] Mendel Rosenblum and John K. Ousterhout. The design and implementa-
>> tion of a log-structured file system. ACM Trans. Comput. Syst.,
>> 10(1):26–52, February 1992.
>>
>> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
>> ---
>> include/nilfs2_fs.h | 9 ++++-
>> sbin/cleanerd/cldconfig.c | 100 +++++++++++++++++++++++++++++++++++++++++++---
>> sbin/cleanerd/cldconfig.h | 18 +++++----
>> sbin/cleanerd/cleanerd.c | 56 ++++++++++++++++----------
>> 4 files changed, 149 insertions(+), 34 deletions(-)
>>
>> diff --git a/include/nilfs2_fs.h b/include/nilfs2_fs.h
>> index a16ad4c..967c2af 100644
>> --- a/include/nilfs2_fs.h
>> +++ b/include/nilfs2_fs.h
>> @@ -483,7 +483,7 @@ struct nilfs_dat_entry {
>> __le64 de_blocknr;
>> __le64 de_start;
>> __le64 de_end;
>> - __le64 de_rsv;
>> + __le64 de_ss;
>> };
>>
>> /**
>> @@ -612,11 +612,13 @@ struct nilfs_cpfile_header {
>> * @su_lastmod: last modified timestamp
>> * @su_nblocks: number of blocks in segment
>> * @su_flags: flags
>> + * @su_lastdec: last decrement of su_nblocks timestamp
>> */
>> struct nilfs_segment_usage {
>> __le64 su_lastmod;
>> __le32 su_nblocks;
>> __le32 su_flags;
>> + __le64 su_lastdec;
>> };
>>
>> /* segment usage flag */
>> @@ -659,6 +661,7 @@ nilfs_segment_usage_set_clean(struct nilfs_segment_usage *su)
>> su->su_lastmod = cpu_to_le64(0);
>> su->su_nblocks = cpu_to_le32(0);
>> su->su_flags = cpu_to_le32(0);
>> + su->su_lastdec = cpu_to_le64(0);
>> }
>>
>> static inline int
>> @@ -690,11 +693,13 @@ struct nilfs_sufile_header {
>> * @sui_lastmod: timestamp of last modification
>> * @sui_nblocks: number of written blocks in segment
>> * @sui_flags: segment usage flags
>> + * @sui_lastdec: last decrement of sui_nblocks timestamp
>> */
>> struct nilfs_suinfo {
>> __u64 sui_lastmod;
>> __u32 sui_nblocks;
>> __u32 sui_flags;
>> + __u64 sui_lastdec;
>> };
>>
>> #define NILFS_SUINFO_FNS(flag, name) \
>> @@ -732,6 +737,7 @@ enum {
>> NILFS_SUINFO_UPDATE_LASTMOD,
>> NILFS_SUINFO_UPDATE_NBLOCKS,
>> NILFS_SUINFO_UPDATE_FLAGS,
>> + NILFS_SUINFO_UPDATE_LASTDEC,
>> __NR_NILFS_SUINFO_UPDATE_FIELDS,
>> };
>>
>> @@ -755,6 +761,7 @@ nilfs_suinfo_update_##name(const struct nilfs_suinfo_update *sup) \
>> NILFS_SUINFO_UPDATE_FNS(LASTMOD, lastmod)
>> NILFS_SUINFO_UPDATE_FNS(NBLOCKS, nblocks)
>> NILFS_SUINFO_UPDATE_FNS(FLAGS, flags)
>> +NILFS_SUINFO_UPDATE_FNS(LASTDEC, lastdec)
>>
>> enum {
>> NILFS_CHECKPOINT,
>> diff --git a/sbin/cleanerd/cldconfig.c b/sbin/cleanerd/cldconfig.c
>> index c8b197b..ade974a 100644
>> --- a/sbin/cleanerd/cldconfig.c
>> +++ b/sbin/cleanerd/cldconfig.c
>> @@ -380,7 +380,10 @@ nilfs_cldconfig_handle_clean_check_interval(struct nilfs_cldconfig *config,
>> }
>>
>> static unsigned long long
>> -nilfs_cldconfig_selection_policy_timestamp(const struct nilfs_suinfo *si)
>> +nilfs_cldconfig_selection_policy_timestamp(struct nilfs *nilfs,
>> + const struct nilfs_sustat *sustat,
>> + const struct nilfs_suinfo *si,
>> + __u64 prottime)
>> {
>> return si->sui_lastmod;
>> }
>> @@ -391,14 +394,101 @@ nilfs_cldconfig_handle_selection_policy_timestamp(struct nilfs_cldconfig *config
>> {
>> config->cf_selection_policy.p_importance =
>> NILFS_CLDCONFIG_SELECTION_POLICY_IMPORTANCE;
>> - config->cf_selection_policy.p_threshold =
>> - NILFS_CLDCONFIG_SELECTION_POLICY_THRESHOLD;
>> + config->cf_selection_policy.p_comparison =
>> + NILFS_CLDCONFIG_SELECTION_POLICY_SMALLER_IS_BETTER;
>> + return 0;
>> +}
>> +
>> +static unsigned long long
>> +nilfs_cldconfig_selection_policy_greedy(struct nilfs *nilfs,
>> + const struct nilfs_sustat *sustat,
>> + const struct nilfs_suinfo *si,
>> + __u64 prottime)
>> +{
>> + __u32 value, max_blocks = nilfs_get_blocks_per_segment(nilfs);
>> +
>> + if (max_blocks < si->sui_nblocks)
>> + return 0;
>> +
>> + value = max_blocks - si->sui_nblocks;
>> +
>> + /*
>> + * the value of sui_nblocks is probably not accurate
>> + * because blocks inside the protection period are not
>> + * considered to be dead
>> + */
>> + if (si->sui_lastdec >= prottime)
>> + value >>= 4;
>> +
>> + return value;
>> +}
>> +
>> +static int
>> +nilfs_cldconfig_handle_selection_policy_greedy(struct nilfs_cldconfig *config,
>> + char **tokens, size_t ntoks)
>> +{
>> + config->cf_selection_policy.p_importance =
>> + nilfs_cldconfig_selection_policy_greedy;
>> + config->cf_selection_policy.p_comparison =
>> + NILFS_CLDCONFIG_SELECTION_POLICY_BIGGER_IS_BETTER;
>> + return 0;
>> +}
>> +
>> +static unsigned long long
>> +nilfs_cldconfig_selection_policy_cost_benefit(struct nilfs *nilfs,
>> + const struct nilfs_sustat *sustat,
>> + const struct nilfs_suinfo *si,
>> + __u64 prottime)
>> +{
>> + __u32 free_blocks, cleaning_cost;
>> + unsigned long long value, age;
>> +
>> + free_blocks = nilfs_get_blocks_per_segment(nilfs) - si->sui_nblocks;
>> + /* read the whole segment + write the live blocks */
>> + cleaning_cost = 2 * si->sui_nblocks;
>> + /*
>> + * multiply by 1000 to convert age to milliseconds
>> + * (higher precision for division)
>> + */
>> + age = (sustat->ss_nongc_ctime - si->sui_lastmod) * 1000;
>> +
>> + if (sustat->ss_nongc_ctime < si->sui_lastmod)
>> + return 0;
>> +
>> + if (cleaning_cost == 0)
>> + cleaning_cost = 1;
>> +
>> +
>> + value = (age * free_blocks) / cleaning_cost;
>> +
>> + /*
>> + * the value of sui_nblocks is probably not accurate
>> + * because blocks inside the protection period are not
>> + * considered to be dead
>> + */
>> + if (si->sui_lastdec >= prottime)
>> + value >>= 4;
>> +
>> + return value;
>> +}
>> +
>> +static int
>> +nilfs_cldconfig_handle_selection_policy_cost_benefit(
>> + struct nilfs_cldconfig *config,
>> + char **tokens, size_t ntoks)
>> +{
>> + config->cf_selection_policy.p_importance =
>> + nilfs_cldconfig_selection_policy_cost_benefit;
>> + config->cf_selection_policy.p_comparison =
>> + NILFS_CLDCONFIG_SELECTION_POLICY_BIGGER_IS_BETTER;
>> return 0;
>> }
>>
>> static const struct nilfs_cldconfig_polhandle
>> nilfs_cldconfig_polhandle_table[] = {
>> {"timestamp", nilfs_cldconfig_handle_selection_policy_timestamp},
>> + {"greedy", nilfs_cldconfig_handle_selection_policy_greedy},
>> + {"cost-benefit", nilfs_cldconfig_handle_selection_policy_cost_benefit},
>> };
>>
>> #define NILFS_CLDCONFIG_NPOLHANDLES \
>> @@ -688,8 +778,8 @@ static void nilfs_cldconfig_set_default(struct nilfs_cldconfig *config,
>>
>> config->cf_selection_policy.p_importance =
>> NILFS_CLDCONFIG_SELECTION_POLICY_IMPORTANCE;
>> - config->cf_selection_policy.p_threshold =
>> - NILFS_CLDCONFIG_SELECTION_POLICY_THRESHOLD;
>> + config->cf_selection_policy.p_comparison =
>> + NILFS_CLDCONFIG_SELECTION_POLICY_SMALLER_IS_BETTER;
>> config->cf_protection_period.tv_sec = NILFS_CLDCONFIG_PROTECTION_PERIOD;
>> config->cf_protection_period.tv_usec = 0;
>>
>> diff --git a/sbin/cleanerd/cldconfig.h b/sbin/cleanerd/cldconfig.h
>> index 0a598d5..95d2fde 100644
>> --- a/sbin/cleanerd/cldconfig.h
>> +++ b/sbin/cleanerd/cldconfig.h
>> @@ -30,16 +30,21 @@
>> #include <sys/time.h>
>> #include <syslog.h>
>>
>> +struct nilfs;
>> +struct nilfs_sustat;
>> struct nilfs_suinfo;
>>
>> /**
>> * struct nilfs_selection_policy -
>> - * @p_importance:
>> - * @p_threshold:
>> + * @p_importance: function to calculate the importance for the policy
>> + * @p_comparison: flag that indicates how to sort the importance
>> */
>> struct nilfs_selection_policy {
>> - unsigned long long (*p_importance)(const struct nilfs_suinfo *);
>> - unsigned long long p_threshold;
>> + unsigned long long (*p_importance)(struct nilfs *nilfs,
>> + const struct nilfs_sustat *sustat,
>> + const struct nilfs_suinfo *,
>> + __u64 prottime);
>> + int p_comparison;
>> };
>>
>> /**
>> @@ -113,7 +118,8 @@ struct nilfs_cldconfig {
>>
>> #define NILFS_CLDCONFIG_SELECTION_POLICY_IMPORTANCE \
>> nilfs_cldconfig_selection_policy_timestamp
>> -#define NILFS_CLDCONFIG_SELECTION_POLICY_THRESHOLD 0
>> +#define NILFS_CLDCONFIG_SELECTION_POLICY_BIGGER_IS_BETTER 0
>> +#define NILFS_CLDCONFIG_SELECTION_POLICY_SMALLER_IS_BETTER 1
>> #define NILFS_CLDCONFIG_PROTECTION_PERIOD 3600
>> #define NILFS_CLDCONFIG_MIN_CLEAN_SEGMENTS 10
>> #define NILFS_CLDCONFIG_MIN_CLEAN_SEGMENTS_UNIT NILFS_SIZE_UNIT_PERCENT
>> @@ -135,8 +141,6 @@ struct nilfs_cldconfig {
>>
>> #define NILFS_CLDCONFIG_NSEGMENTS_PER_CLEAN_MAX 32
>>
>> -struct nilfs;
>> -
>> int nilfs_cldconfig_read(struct nilfs_cldconfig *config, const char *path,
>> struct nilfs *nilfs);
>>
>> diff --git a/sbin/cleanerd/cleanerd.c b/sbin/cleanerd/cleanerd.c
>> index 17de87b..8df3a07 100644
>> --- a/sbin/cleanerd/cleanerd.c
>> +++ b/sbin/cleanerd/cleanerd.c
>> @@ -417,7 +417,7 @@ static void nilfs_cleanerd_destroy(struct nilfs_cleanerd *cleanerd)
>> free(cleanerd);
>> }
>>
>> -static int nilfs_comp_segimp(const void *elem1, const void *elem2)
>> +static int nilfs_comp_segimp_asc(const void *elem1, const void *elem2)
>> {
>> const struct nilfs_segimp *segimp1 = elem1, *segimp2 = elem2;
>>
>> @@ -429,6 +429,18 @@ static int nilfs_comp_segimp(const void *elem1, const void *elem2)
>> return (segimp1->si_segnum < segimp2->si_segnum) ? -1 : 1;
>> }
>>
>> +static int nilfs_comp_segimp_desc(const void *elem1, const void *elem2)
>> +{
>> + const struct nilfs_segimp *segimp1 = elem1, *segimp2 = elem2;
>> +
>> + if (segimp1->si_importance > segimp2->si_importance)
>> + return -1;
>> + else if (segimp1->si_importance < segimp2->si_importance)
>> + return 1;
>> +
>> + return (segimp1->si_segnum < segimp2->si_segnum) ? -1 : 1;
>> +}
>> +
>> static int nilfs_cleanerd_automatic_suspend(struct nilfs_cleanerd *cleanerd)
>> {
>> return cleanerd->config.cf_min_clean_segments > 0;
>> @@ -579,7 +591,7 @@ nilfs_cleanerd_select_segments(struct nilfs_cleanerd *cleanerd,
>> __u64 segnum;
>> size_t count, nsegs;
>> ssize_t nssegs, n;
>> - unsigned long long imp, thr;
>> + unsigned long long imp;
>> int i;
>>
>> nsegs = nilfs_cleanerd_ncleansegs(cleanerd);
>> @@ -600,11 +612,8 @@ nilfs_cleanerd_select_segments(struct nilfs_cleanerd *cleanerd,
>> prottime = tv2.tv_sec;
>> oldest = tv.tv_sec;
>>
>> - /* The segments that have larger importance than thr are not
>> - * selected. */
>> - thr = (config->cf_selection_policy.p_threshold != 0) ?
>> - config->cf_selection_policy.p_threshold :
>> - sustat->ss_nongc_ctime;
>> + /* sui_lastdec may not be set by nilfs_get_suinfo */
>> + memset(si, 0, sizeof(si));
>>
>> for (segnum = 0; segnum < sustat->ss_nsegs; segnum += n) {
>> count = min_t(__u64, sustat->ss_nsegs - segnum,
>> @@ -615,22 +624,23 @@ nilfs_cleanerd_select_segments(struct nilfs_cleanerd *cleanerd,
>> goto out;
>> }
>> for (i = 0; i < n; i++) {
>> - if (!nilfs_suinfo_reclaimable(&si[i]))
>> + if (!nilfs_suinfo_reclaimable(&si[i]) ||
>> + si[i].sui_lastmod >= sustat->ss_nongc_ctime)
>> continue;
>>
>> - imp = config->cf_selection_policy.p_importance(&si[i]);
>> - if (imp < thr) {
>> - if (si[i].sui_lastmod < oldest)
>> - oldest = si[i].sui_lastmod;
>> - if (si[i].sui_lastmod < prottime) {
>> - sm = nilfs_vector_get_new_element(smv);
>> - if (sm == NULL) {
>> - nssegs = -1;
>> - goto out;
>> - }
>> - sm->si_segnum = segnum + i;
>> - sm->si_importance = imp;
>> + imp = config->cf_selection_policy.p_importance(nilfs,
>> + sustat, &si[i], prottime);
>> +
>> + if (si[i].sui_lastmod < oldest)
>> + oldest = si[i].sui_lastmod;
>> + if (si[i].sui_lastmod < prottime) {
>> + sm = nilfs_vector_get_new_element(smv);
>> + if (sm == NULL) {
>> + nssegs = -1;
>> + goto out;
>> }
>> + sm->si_segnum = segnum + i;
>> + sm->si_importance = imp;
>> }
>> }
>> if (n == 0) {
>> @@ -642,7 +652,11 @@ nilfs_cleanerd_select_segments(struct nilfs_cleanerd *cleanerd,
>> break;
>> }
>> }
>> - nilfs_vector_sort(smv, nilfs_comp_segimp);
>> + if (config->cf_selection_policy.p_comparison ==
>> + NILFS_CLDCONFIG_SELECTION_POLICY_SMALLER_IS_BETTER)
>> + nilfs_vector_sort(smv, nilfs_comp_segimp_asc);
>> + else
>> + nilfs_vector_sort(smv, nilfs_comp_segimp_desc);
>>
>> nssegs = (nilfs_vector_get_size(smv) < nsegs) ?
>> nilfs_vector_get_size(smv) : nsegs;
>> --
>> 1.9.0
>
> scripts/checkpatch.pl detected the following coding style issues:
>
> ERROR: code indent should use tabs where possible
> #171: FILE: sbin/cleanerd/cldconfig.c:404:
> +^I^I^I^I const struct nilfs_sustat *sustat,$
>
> ERROR: code indent should use tabs where possible
> #172: FILE: sbin/cleanerd/cldconfig.c:405:
> +^I^I^I^I const struct nilfs_suinfo *si,$
>
> ERROR: code indent should use tabs where possible
> #173: FILE: sbin/cleanerd/cldconfig.c:406:
> +^I^I^I^I __u64 prottime)$
>
>
> Please mind it next time. (You don't have to resubmit the whole series
> now for this).
Normally I execute checkpatch.pl automatically via the pre-commit hook.
For some reason the pre-commit hook got overwritten in my local repo. I
am sorry for that.
> I would like to first understand this series, but I am very busy
> recently. (Also, I am still pending review of Vycheslav's xattr
> patchset.) So, let me go forward a little bit at a time.
Ok no problem.
Regards,
Andreas Rohner
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/6] nilfs2: add new timestamp to seg usage and function to change su_nblocks
[not found] ` <0ED0D5DA-9AE9-44B8-8936-1680DE2B64C5-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2014-03-16 16:02 ` Andreas Rohner
0 siblings, 0 replies; 34+ messages in thread
From: Andreas Rohner @ 2014-03-16 16:02 UTC (permalink / raw)
To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On 2014-03-16 14:34, Vyacheslav Dubeyko wrote:
>
>> On 16 March 2014, at 16:24, Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> wrote:
>>
>>> On 2014-03-16 14:00, Vyacheslav Dubeyko wrote:
>>>
>>>> On Mar 16, 2014, at 1:47 PM, Andreas Rohner wrote:
>>>>
>>>> This patch adds an additional timestamp to the segment usage
>>>> information that indicates the last time the usage information was
>>>> changed. So su_lastmod indicates the last time the segment itself was
>>>> modified and su_lastdec indicates the last time the usage information
>>>> itself was changed.
>>>
>>> What will we have if user changes time?
>>> What sequence will we have after such "malicious" action?
>>> Did you test such situation?
>>
>> The timestamp is just a hint for the userspace GC. If the hint is wrong
>> the result would be that the GC is less efficient for a while. After a
>> while it would go back to normal. You have the same problem with the
>> already existing su_lastmod timestamp.
>>
>
> But I worry about such thing. Previously, we had complains of users about
> different issues with timestamp policy of GC. And I had hope that namely
> new GC policies can resolve such GC disadvantage. So, what have we again?
> The same issue of GC?
Yes, but I have to compare it to the protection period, which is a
timestamp. Maybe I could use the current checkpoint number instead...
Regards,
Andreas Rohner
> Thanks,
> Vyacheslav Dubeyko.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/6] nilfs2: add new timestamp to seg usage and function to change su_nblocks
[not found] ` <20140316.223111.52181167.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2014-03-16 16:19 ` Andreas Rohner
0 siblings, 0 replies; 34+ messages in thread
From: Andreas Rohner @ 2014-03-16 16:19 UTC (permalink / raw)
To: Ryusuke Konishi; +Cc: Vyacheslav Dubeyko, linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On 2014-03-16 14:31, Ryusuke Konishi wrote:
> On Sun, 16 Mar 2014 17:06:10 +0300, Vyacheslav Dubeyko wrote:
>>
>> On Mar 16, 2014, at 3:24 PM, Andreas Rohner wrote:
>>
>>>>>
>>>>> diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
>>>>> index ff3fea3..ca269ad 100644
>>>>> --- a/include/linux/nilfs2_fs.h
>>>>> +++ b/include/linux/nilfs2_fs.h
>>>>> @@ -614,11 +614,13 @@ struct nilfs_cpfile_header {
>>>>> * @su_lastmod: last modified timestamp
>>>>> * @su_nblocks: number of blocks in segment
>>>>> * @su_flags: flags
>>>>> + * @su_lastdec: last decrement of su_nblocks timestamp
>>>>> */
>>>>> struct nilfs_segment_usage {
>>>>> __le64 su_lastmod;
>>>>> __le32 su_nblocks;
>>>>> __le32 su_flags;
>>>>> + __le64 su_lastdec;
>>>>
>>>> So, this change makes on-disk layout incompatible with previous one.
>>>> Am I correct? At first it needs to be fully confident that we really need in
>>>> changing in this place. Secondly, it needs to add incompatible flag for
>>>> s_feature_incompat field of superblock and maybe mount option.
>>>
>>> No it IS compatible. NILFS uses the entry sizes stored in the super
>>> block. Notice, that the code does not depend on sizeof(struct
>>> nilfs_suinfo) or sizeof(struct nilfs_segment_usage). So an old kernel
>>> can read a file system with su_lastdec and a new kernel can read an old
>>> file system without su_lastdec.
>>
>> But, anyway, I think that you add some new feature by this and previous
>> patches. I suppose that it makes sense to add specially dedicated flag or
>> flags in s_feature_xxx field of superblock. If feature is compatible with
>> previous state of driver then flag can be added for s_feature_compat
>> field.
>>
>> Thanks,
>> Vyacheslav Dubeyko.
>
> This is important thing. Please evaluate backward compatibility and
> forward compatibility of modifications, and properly add one of
> incompat, compat_ro, or compat flags as Vyacheslav mentioned. It will
> be a focal point of early stage review.
Ok, I have to look into these flags.
I reuse su_nblocks to represent the number of live blocks, which gets
incremented and decremented as files are deleted and snapshots
created/removed. That is definitely incompatible. Is it better to set an
incompat flag or should I define a new field like su_nliveblocks? With a
new field it could be compatible with older drivers, but it would add
another 8 bytes to the structure.
But if I understood you correctly I need to add a new feature flag in
any case.
Regards,
Andreas Rohner
> Regards,
> Ryusuke Konishi
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 1/6] nilfs2: add helper function to go through all entries of meta data file
[not found] ` <2adbf1034ab4b129223553746577f6ec0e699869.1394966729.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-03-17 6:51 ` Vyacheslav Dubeyko
2014-03-17 9:24 ` Andreas Rohner
0 siblings, 1 reply; 34+ messages in thread
From: Vyacheslav Dubeyko @ 2014-03-17 6:51 UTC (permalink / raw)
To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Sun, 2014-03-16 at 11:47 +0100, Andreas Rohner wrote:
> This patch introduces the nilfs_palloc_scan_entries() function,
> which takes an inode of one of nilfs' meta data files and iterates
> through all of its entries. For each entry the callback function
> pointer that is given as a parameter is called. The data parameter
> is passed to the callback function, so that it may receive
> parameters and return results.
>
> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
> ---
> fs/nilfs2/alloc.c | 121 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> fs/nilfs2/alloc.h | 6 +++
> 2 files changed, 127 insertions(+)
>
> diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c
> index 741fd02..0edd85a 100644
> --- a/fs/nilfs2/alloc.c
> +++ b/fs/nilfs2/alloc.c
> @@ -545,6 +545,127 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
> }
>
> /**
> + * nilfs_palloc_scan_entries - scan through every entry and execute dofunc
> + * @inode: inode of metadata file using this allocator
> + * @dofunc: function executed for every entry
> + * @data: data pointer passed to dofunc
> + *
> + * Description: nilfs_palloc_scan_entries() walks through every allocated entry
> + * of a metadata file and executes dofunc on it. It passes a data pointer to
> + * dofunc, which can be used as an input parameter or for returning of results.
> + *
> + * Return Value: On success, 0 is returned. On error, a
> + * negative error code is returned.
> + */
> +int nilfs_palloc_scan_entries(struct inode *inode,
> + void (*dofunc)(struct inode *,
> + struct nilfs_palloc_req *,
> + void *),
> + void *data)
> +{
> + struct buffer_head *desc_bh, *bitmap_bh;
> + struct nilfs_palloc_group_desc *desc;
> + struct nilfs_palloc_req req;
> + unsigned char *bitmap;
> + void *desc_kaddr, *bitmap_kaddr;
> + unsigned long group, maxgroup, ngroups;
> + unsigned long n, m, entries_per_group, groups_per_desc_block;
> + unsigned long i, j, pos;
> + unsigned long blkoff, prev_blkoff;
> + int ret;
> +
I think it really makes sense to split this function into several small
functions. That improves both the code style and the readability, and it
makes the function easier to understand.
> + ngroups = nilfs_palloc_groups_count(inode);
> + maxgroup = ngroups - 1;
> + entries_per_group = nilfs_palloc_entries_per_group(inode);
> + groups_per_desc_block = nilfs_palloc_groups_per_desc_block(inode);
> +
> + for (group = 0; group < ngroups;) {
> + ret = nilfs_palloc_get_desc_block(inode, group, 0, &desc_bh);
> + if (ret == -ENOENT)
I suggest adding a comment here.
> + return 0;
> + else if (ret < 0)
> + return ret;
> + req.pr_desc_bh = desc_bh;
> + desc_kaddr = kmap(desc_bh->b_page);
> + desc = nilfs_palloc_block_get_group_desc(inode, group,
> + desc_bh, desc_kaddr);
> + n = nilfs_palloc_rest_groups_in_desc_block(inode, group,
> + maxgroup);
> +
> + for (i = 0; i < n; i++, desc++, group++) {
> + m = entries_per_group -
> + nilfs_palloc_group_desc_nfrees(inode,
> + group, desc);
Looks weird. It makes sense to split this into several functions or to
use a local variable.
> + if (!m)
> + continue;
> +
> + ret = nilfs_palloc_get_bitmap_block(
> + inode, group, 0, &bitmap_bh);
Ditto. Looks weird.
> + if (ret == -ENOENT) {
> + ret = 0;
> + goto out_desc;
A comment is needed here. Otherwise it looks weird, because we go to
out_desc in either case. Maybe combine the two branches:
	if (unlikely(ret < 0)) {
		if (ret == -ENOENT)
			ret = 0;
		goto out_desc;
	}
In any case, a comment should explain why ret is set to zero for -ENOENT.
> + } else if (ret < 0)
> + goto out_desc;
> +
> + req.pr_bitmap_bh = bitmap_bh;
> + bitmap_kaddr = kmap(bitmap_bh->b_page);
> + bitmap = bitmap_kaddr + bh_offset(bitmap_bh);
> + /* entry blkoff is always bigger than 0 */
> + blkoff = 0;
> + pos = 0;
> +
> + for (j = 0; j < m; ++j, ++pos) {
> + pos = nilfs_find_next_bit(bitmap,
> + entries_per_group, pos);
> +
> + if (pos >= entries_per_group)
> + break;
> +
> + /* found an entry */
> + req.pr_entry_nr =
> + entries_per_group * group + pos;
> +
> + prev_blkoff = blkoff;
> + blkoff = nilfs_palloc_entry_blkoff(inode,
> + req.pr_entry_nr);
> +
> + if (blkoff != prev_blkoff) {
> + if (prev_blkoff)
> + brelse(req.pr_entry_bh);
> +
> + ret = nilfs_palloc_get_entry_block(
> + inode, req.pr_entry_nr,
> + 0, &req.pr_entry_bh);
Ahhhh. Looks weird. :) In any case, split this into small functions with
clear names. It really improves the code from any point of view.
Thanks,
Vyacheslav Dubeyko.
> + if (ret < 0)
> + goto out_entry;
> + }
> +
> + dofunc(inode, &req, data);
> + }
> +
> + if (blkoff)
> + brelse(req.pr_entry_bh);
> + kunmap(bitmap_bh->b_page);
> + brelse(bitmap_bh);
> + }
> +
> + kunmap(desc_bh->b_page);
> + brelse(desc_bh);
> + }
> +
> + return 0;
> +
> +out_entry:
> + kunmap(bitmap_bh->b_page);
> + brelse(bitmap_bh);
> +
> +out_desc:
> + kunmap(desc_bh->b_page);
> + brelse(desc_bh);
> + return ret;
> +}
> +
> +/**
> * nilfs_palloc_commit_alloc_entry - finish allocation of a persistent object
> * @inode: inode of metadata file using this allocator
> * @req: nilfs_palloc_req structure exchanged for the allocation
> diff --git a/fs/nilfs2/alloc.h b/fs/nilfs2/alloc.h
> index 4bd6451..0592035 100644
> --- a/fs/nilfs2/alloc.h
> +++ b/fs/nilfs2/alloc.h
> @@ -77,6 +77,7 @@ int nilfs_palloc_freev(struct inode *, __u64 *, size_t);
> #define nilfs_set_bit_atomic ext2_set_bit_atomic
> #define nilfs_clear_bit_atomic ext2_clear_bit_atomic
> #define nilfs_find_next_zero_bit find_next_zero_bit_le
> +#define nilfs_find_next_bit find_next_bit_le
>
> /**
> * struct nilfs_bh_assoc - block offset and buffer head association
> @@ -106,5 +107,10 @@ void nilfs_palloc_setup_cache(struct inode *inode,
> struct nilfs_palloc_cache *cache);
> void nilfs_palloc_clear_cache(struct inode *inode);
> void nilfs_palloc_destroy_cache(struct inode *inode);
> +int nilfs_palloc_scan_entries(struct inode *,
> + void (*dofunc)(struct inode *,
> + struct nilfs_palloc_req *,
> + void *),
> + void *);
>
> #endif /* _NILFS_ALLOC_H */
* Re: [PATCH 3/6] nilfs2: scan dat entries at snapshot creation/deletion time
[not found] ` <29dee92595249b713fff1e4903d5d76556926eec.1394966729.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-03-17 7:04 ` Vyacheslav Dubeyko
2014-03-17 9:35 ` Andreas Rohner
0 siblings, 1 reply; 34+ messages in thread
From: Vyacheslav Dubeyko @ 2014-03-17 7:04 UTC (permalink / raw)
To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Sun, 2014-03-16 at 11:47 +0100, Andreas Rohner wrote:
> To accurately count the number of live blocks in a segment, it is
> important to take snapshots into account, because snapshots can protect
> reclaimable blocks from being cleaned.
>
> This patch uses the previously reserved de_rsv field of the
> nilfs_dat_entry struct to store one of the snapshots the corresponding
> block belongs to. One block can belong to many snapshots, but because
> the snapshots are stored in a sorted linked list, it is easy to check if
> a block belongs to any other snapshot given the previous and the next
> snapshot. For example if the current snapshot (in de_ss) is being
> removed and neither the previous nor the next snapshot is in the range
> of de_start to de_end, then it is guaranteed that the block doesn't
> belong to any other snapshot and is reclaimable. On the other hand if
> let's say the previous snapshot is in the range of de_start to de_end, we
> simply set de_ss to the previous snapshot and the block is not
> reclaimable.
>
> To implement this every DAT entry is scanned at snapshot
> creation/deletion time and updated if needed.
It is a well-known problem of NILFS2 that deleting big files is very slow,
because the DAT file has to be updated (de_end: end checkpoint number).
So how does your addition affect this disadvantage?
> To avoid too many update
> operations only potentially reclaimable blocks are ever updated. For
> example if there are some deleted files and the checkpoint to which
> these files belong is turned into a snapshot, then su_nblocks is
> incremented for these blocks, which reverses the decrement that happened
> when the files were deleted. If after some time this snapshot is
> deleted, su_nblocks is decremented again to reverse the increment at
> creation time.
>
> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
> ---
> fs/nilfs2/cpfile.c | 7 ++++
> fs/nilfs2/dat.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++
> fs/nilfs2/dat.h | 26 ++++++++++++++
> include/linux/nilfs2_fs.h | 4 +--
> 4 files changed, 121 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nilfs2/cpfile.c b/fs/nilfs2/cpfile.c
> index 0d58075..29952f5 100644
> --- a/fs/nilfs2/cpfile.c
> +++ b/fs/nilfs2/cpfile.c
> @@ -28,6 +28,7 @@
> #include <linux/nilfs2_fs.h>
> #include "mdt.h"
> #include "cpfile.h"
> +#include "sufile.h"
>
>
> static inline unsigned long
> @@ -584,6 +585,7 @@ static int nilfs_cpfile_set_snapshot(struct inode *cpfile, __u64 cno)
> struct nilfs_cpfile_header *header;
> struct nilfs_checkpoint *cp;
> struct nilfs_snapshot_list *list;
> + struct the_nilfs *nilfs = cpfile->i_sb->s_fs_info;
> __u64 curr, prev;
> unsigned long curr_blkoff, prev_blkoff;
> void *kaddr;
> @@ -681,6 +683,8 @@ static int nilfs_cpfile_set_snapshot(struct inode *cpfile, __u64 cno)
> mark_buffer_dirty(header_bh);
> nilfs_mdt_mark_dirty(cpfile);
>
> + nilfs_dat_scan_inc_ss(nilfs->ns_dat, cno);
> +
> brelse(prev_bh);
>
> out_curr:
> @@ -703,6 +707,7 @@ static int nilfs_cpfile_clear_snapshot(struct inode *cpfile, __u64 cno)
> struct nilfs_cpfile_header *header;
> struct nilfs_checkpoint *cp;
> struct nilfs_snapshot_list *list;
> + struct the_nilfs *nilfs = cpfile->i_sb->s_fs_info;
> __u64 next, prev;
> void *kaddr;
> int ret;
> @@ -784,6 +789,8 @@ static int nilfs_cpfile_clear_snapshot(struct inode *cpfile, __u64 cno)
> mark_buffer_dirty(header_bh);
> nilfs_mdt_mark_dirty(cpfile);
>
> + nilfs_dat_scan_dec_ss(nilfs->ns_dat, cno, prev, next);
> +
> brelse(prev_bh);
>
> out_next:
> diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
> index 0d5fada..89a4a5f 100644
> --- a/fs/nilfs2/dat.c
> +++ b/fs/nilfs2/dat.c
> @@ -28,6 +28,7 @@
> #include "mdt.h"
> #include "alloc.h"
> #include "dat.h"
> +#include "sufile.h"
>
>
> #define NILFS_CNO_MIN ((__u64)1)
> @@ -97,6 +98,7 @@ void nilfs_dat_commit_alloc(struct inode *dat, struct nilfs_palloc_req *req)
> entry->de_start = cpu_to_le64(NILFS_CNO_MIN);
> entry->de_end = cpu_to_le64(NILFS_CNO_MAX);
> entry->de_blocknr = cpu_to_le64(0);
> + entry->de_ss = cpu_to_le64(0);
> kunmap_atomic(kaddr);
>
> nilfs_palloc_commit_alloc_entry(dat, req);
> @@ -121,6 +123,7 @@ static void nilfs_dat_commit_free(struct inode *dat,
> entry->de_start = cpu_to_le64(NILFS_CNO_MIN);
> entry->de_end = cpu_to_le64(NILFS_CNO_MIN);
> entry->de_blocknr = cpu_to_le64(0);
> + entry->de_ss = cpu_to_le64(0);
> kunmap_atomic(kaddr);
>
> nilfs_dat_commit_entry(dat, req);
> @@ -201,6 +204,7 @@ void nilfs_dat_commit_end(struct inode *dat, struct nilfs_palloc_req *req,
> WARN_ON(start > end);
> }
> entry->de_end = cpu_to_le64(end);
> + entry->de_ss = cpu_to_le64(NILFS_CNO_MAX);
> blocknr = le64_to_cpu(entry->de_blocknr);
> kunmap_atomic(kaddr);
>
> @@ -365,6 +369,8 @@ int nilfs_dat_move(struct inode *dat, __u64 vblocknr, sector_t blocknr)
> }
> WARN_ON(blocknr == 0);
> entry->de_blocknr = cpu_to_le64(blocknr);
> + if (entry->de_ss == cpu_to_le64(NILFS_CNO_MAX))
> + entry->de_ss = cpu_to_le64(0);
> kunmap_atomic(kaddr);
>
> mark_buffer_dirty(entry_bh);
> @@ -430,6 +436,86 @@ int nilfs_dat_translate(struct inode *dat, __u64 vblocknr, sector_t *blocknrp)
> return ret;
> }
>
> +void nilfs_dat_do_scan_dec(struct inode *dat, struct nilfs_palloc_req *req,
> + void *data)
> +{
> + struct nilfs_dat_entry *entry;
> + __u64 start, end, prev_ss;
> + __u64 *ssp = data, ss = ssp[0], prev = ssp[1], next = ssp[2];
> + sector_t blocknr;
> + void *kaddr;
> + struct the_nilfs *nilfs;
> +
> + kaddr = kmap_atomic(req->pr_entry_bh->b_page);
> + entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr,
> + req->pr_entry_bh, kaddr);
> + start = le64_to_cpu(entry->de_start);
> + end = le64_to_cpu(entry->de_end);
> + blocknr = le64_to_cpu(entry->de_blocknr);
> + prev_ss = le64_to_cpu(entry->de_ss);
> +
> + if (blocknr != 0 && end != NILFS_CNO_MAX && ss >= start && ss < end) {
I think it makes sense to use small functions whose names clearly say
what is being checked.
> + if (prev_ss == ss || prev_ss == NILFS_CNO_MAX) {
> + if (prev && prev >= start && prev < end)
> + entry->de_ss = cpu_to_le64(prev);
> + else if (next && next >= start && next < end)
> + entry->de_ss = cpu_to_le64(next);
> + else
> + entry->de_ss = cpu_to_le64(0);
Ditto.
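For example, the choice of the new de_ss value could live in its own function. The following is only a user-space sketch under assumptions: the names cno_in_lifetime() and pick_remaining_snapshot() are made up here, the values are already in CPU byte order, and 0 means "no snapshot" as in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Non-zero if snapshot cno exists (cno != 0) and lies in [start, end). */
static int cno_in_lifetime(uint64_t cno, uint64_t start, uint64_t end)
{
	return cno != 0 && cno >= start && cno < end;
}

/*
 * Pick the snapshot that keeps protecting a block after the snapshot
 * stored in de_ss has been removed: the previous snapshot if it falls
 * into the block's lifetime, otherwise the next one, otherwise 0
 * (no snapshot left, the block is reclaimable).
 */
static uint64_t pick_remaining_snapshot(uint64_t start, uint64_t end,
					uint64_t prev, uint64_t next)
{
	assert(start <= end);
	if (cno_in_lifetime(prev, start, end))
		return prev;
	if (cno_in_lifetime(next, start, end))
		return next;
	return 0;
}
```

The callback would then only have to store cpu_to_le64() of the returned value, and the three-way branch would get a name that says what it decides.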
> +
> + if (prev_ss != NILFS_CNO_MAX)
> + prev_ss = le64_to_cpu(entry->de_ss);
> + kunmap_atomic(kaddr);
> + mark_buffer_dirty(req->pr_entry_bh);
> + nilfs_mdt_mark_dirty(dat);
> + } else
> + kunmap_atomic(kaddr);
> +
> + if (prev_ss == 0) {
> + nilfs = dat->i_sb->s_fs_info;
> + nilfs_sufile_add_segment_usage(nilfs->ns_sufile,
> + nilfs_get_segnum_of_block(nilfs, blocknr),
> + -1, 0);
> + }
> + } else
> + kunmap_atomic(kaddr);
> +}
> +
> +void nilfs_dat_do_scan_inc(struct inode *dat, struct nilfs_palloc_req *req,
> + void *data)
> +{
> + struct nilfs_dat_entry *entry;
> + __u64 start, end, prev_ss;
> + __u64 *ssp = data, ss = *ssp;
> + sector_t blocknr;
> + void *kaddr;
> + struct the_nilfs *nilfs;
> +
> + kaddr = kmap_atomic(req->pr_entry_bh->b_page);
> + entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr,
> + req->pr_entry_bh, kaddr);
> + start = le64_to_cpu(entry->de_start);
> + end = le64_to_cpu(entry->de_end);
> + blocknr = le64_to_cpu(entry->de_blocknr);
> + prev_ss = le64_to_cpu(entry->de_ss);
> +
> + if (blocknr != 0 && end != NILFS_CNO_MAX && ss >= start && ss < end &&
> + (prev_ss == 0 || prev_ss == NILFS_CNO_MAX)) {
Ditto. Moreover, you repeat this check.
> +
> + entry->de_ss = cpu_to_le64(ss);
> +
> + kunmap_atomic(kaddr);
> + mark_buffer_dirty(req->pr_entry_bh);
> + nilfs_mdt_mark_dirty(dat);
> +
> + nilfs = dat->i_sb->s_fs_info;
> + nilfs_sufile_add_segment_usage(nilfs->ns_sufile,
> + nilfs_get_segnum_of_block(nilfs, blocknr),
Looks weird. Maybe use a variable?
Thanks,
Vyacheslav Dubeyko.
> + 1, 0);
> + } else
> + kunmap_atomic(kaddr);
> +}
> +
> ssize_t nilfs_dat_get_vinfo(struct inode *dat, void *buf, unsigned visz,
> size_t nvi)
> {
> diff --git a/fs/nilfs2/dat.h b/fs/nilfs2/dat.h
> index cbd8e97..92a187e 100644
> --- a/fs/nilfs2/dat.h
> +++ b/fs/nilfs2/dat.h
> @@ -55,5 +55,31 @@ ssize_t nilfs_dat_get_vinfo(struct inode *, void *, unsigned, size_t);
>
> int nilfs_dat_read(struct super_block *sb, size_t entry_size,
> struct nilfs_inode *raw_inode, struct inode **inodep);
> +void nilfs_dat_do_scan_dec(struct inode *, struct nilfs_palloc_req *, void *);
> +void nilfs_dat_do_scan_inc(struct inode *, struct nilfs_palloc_req *, void *);
> +
> +/**
> + * nilfs_dat_scan_dec_ss - scan all dat entries for a checkpoint dec suinfo
> + * @dat: inode of dat file
> + * @cno: snapshot number
> + * @prev: previous snapshot number
> + * @next: next snapshot number
> + */
> +static inline int nilfs_dat_scan_dec_ss(struct inode *dat, __u64 cno,
> + __u64 prev, __u64 next)
> +{
> + __u64 data[3] = { cno, prev, next };
> + return nilfs_palloc_scan_entries(dat, nilfs_dat_do_scan_dec, data);
> +}
> +
> +/**
> + * nilfs_dat_scan_inc_ss - scan all dat entries for a checkpoint inc suinfo
> + * @dat: inode of dat file
> + * @cno: snapshot number
> + */
> +static inline int nilfs_dat_scan_inc_ss(struct inode *dat, __u64 cno)
> +{
> + return nilfs_palloc_scan_entries(dat, nilfs_dat_do_scan_inc, &cno);
> +}
>
> #endif /* _NILFS_DAT_H */
> diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
> index ca269ad..ba9ebe02 100644
> --- a/include/linux/nilfs2_fs.h
> +++ b/include/linux/nilfs2_fs.h
> @@ -475,13 +475,13 @@ struct nilfs_palloc_group_desc {
> * @de_blocknr: block number
> * @de_start: start checkpoint number
> * @de_end: end checkpoint number
> - * @de_rsv: reserved for future use
> + * @de_ss: one of the snapshots the block belongs to
> */
> struct nilfs_dat_entry {
> __le64 de_blocknr;
> __le64 de_start;
> __le64 de_end;
> - __le64 de_rsv;
> + __le64 de_ss;
> };
>
> #define NILFS_MIN_DAT_ENTRY_SIZE 32
* Re: [PATCH 1/6] nilfs2: add helper function to go through all entries of meta data file
2014-03-17 6:51 ` Vyacheslav Dubeyko
@ 2014-03-17 9:24 ` Andreas Rohner
0 siblings, 0 replies; 34+ messages in thread
From: Andreas Rohner @ 2014-03-17 9:24 UTC (permalink / raw)
To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On 2014-03-17 07:51, Vyacheslav Dubeyko wrote:
> On Sun, 2014-03-16 at 11:47 +0100, Andreas Rohner wrote:
>> This patch introduces the nilfs_palloc_scan_entries() function,
>> which takes an inode of one of nilfs' meta data files and iterates
>> through all of its entries. For each entry the callback function
>> pointer that is given as a parameter is called. The data parameter
>> is passed to the callback function, so that it may receive
>> parameters and return results.
>>
>> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
>> ---
>> fs/nilfs2/alloc.c | 121 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> fs/nilfs2/alloc.h | 6 +++
>> 2 files changed, 127 insertions(+)
>>
>> diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c
>> index 741fd02..0edd85a 100644
>> --- a/fs/nilfs2/alloc.c
>> +++ b/fs/nilfs2/alloc.c
>> @@ -545,6 +545,127 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
>> }
>>
>> /**
>> + * nilfs_palloc_scan_entries - scan through every entry and execute dofunc
>> + * @inode: inode of metadata file using this allocator
>> + * @dofunc: function executed for every entry
>> + * @data: data pointer passed to dofunc
>> + *
>> + * Description: nilfs_palloc_scan_entries() walks through every allocated entry
>> + * of a metadata file and executes dofunc on it. It passes a data pointer to
>> + * dofunc, which can be used as an input parameter or for returning of results.
>> + *
>> + * Return Value: On success, 0 is returned. On error, a
>> + * negative error code is returned.
>> + */
>> +int nilfs_palloc_scan_entries(struct inode *inode,
>> + void (*dofunc)(struct inode *,
>> + struct nilfs_palloc_req *,
>> + void *),
>> + void *data)
>> +{
>> + struct buffer_head *desc_bh, *bitmap_bh;
>> + struct nilfs_palloc_group_desc *desc;
>> + struct nilfs_palloc_req req;
>> + unsigned char *bitmap;
>> + void *desc_kaddr, *bitmap_kaddr;
>> + unsigned long group, maxgroup, ngroups;
>> + unsigned long n, m, entries_per_group, groups_per_desc_block;
>> + unsigned long i, j, pos;
>> + unsigned long blkoff, prev_blkoff;
>> + int ret;
>> +
>
> I think it really makes sense to split this function into several small
> functions. That improves both the code style and the readability, and it
> makes the function easier to understand.
Ok, I could move one of the inner for-loops into a separate function.
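A rough user-space sketch of what such a split might look like for the innermost loop is shown below. It is not the patch code: scan_group_bitmap(), next_set_bit() and entry_fn are hypothetical names, the bitmap is a plain byte array with bits numbered LSB-first within each byte, and all buffer-head handling is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical callback: gets the entry number and a caller-supplied pointer. */
typedef void (*entry_fn)(unsigned long entry_nr, void *data);

/* Index of the first set bit at or after pos, or nbits if there is none. */
static unsigned long next_set_bit(const unsigned char *bitmap,
				  unsigned long nbits, unsigned long pos)
{
	for (; pos < nbits; pos++)
		if (bitmap[pos / CHAR_BIT] & (1u << (pos % CHAR_BIT)))
			return pos;
	return nbits;
}

/* Call fn for every allocated entry of one group; return how many were seen. */
static unsigned long scan_group_bitmap(const unsigned char *bitmap,
				       unsigned long entries_per_group,
				       unsigned long group,
				       entry_fn fn, void *data)
{
	unsigned long pos, visited = 0;

	assert(bitmap != NULL && fn != NULL);
	for (pos = next_set_bit(bitmap, entries_per_group, 0);
	     pos < entries_per_group;
	     pos = next_set_bit(bitmap, entries_per_group, pos + 1)) {
		fn(entries_per_group * group + pos, data);
		visited++;
	}
	return visited;
}

/* Example callback: adds up the entry numbers it is given. */
static void sum_entries(unsigned long entry_nr, void *data)
{
	*(unsigned long *)data += entry_nr;
}
```

The outer loops over descriptor blocks and groups would then only deal with getting and releasing the blocks, and the data pointer keeps its role of carrying parameters in and results out.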
>> + ngroups = nilfs_palloc_groups_count(inode);
>> + maxgroup = ngroups - 1;
>> + entries_per_group = nilfs_palloc_entries_per_group(inode);
>> + groups_per_desc_block = nilfs_palloc_groups_per_desc_block(inode);
>> +
>> + for (group = 0; group < ngroups;) {
>> + ret = nilfs_palloc_get_desc_block(inode, group, 0, &desc_bh);
>> + if (ret == -ENOENT)
>
> I suggest adding a comment here.
Ok.
-ENOENT basically means that the descriptor block is not allocated yet,
which is not an error. ngroups is a very big constant value; it is not the
actual number of groups but rather the maximum number of groups. So the
only way to tell that the last group has been reached is the -ENOENT error.
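The pattern can be shown with a small user-space sketch. It is a simplification under assumptions: struct fake_mdt and fake_get_desc_block() are made-up stand-ins for the metadata file and nilfs_palloc_get_desc_block(), and the first missing block is taken to mark the end of the allocated groups.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical metadata file: only the first nr_allocated blocks exist. */
struct fake_mdt {
	unsigned long nr_allocated;
	long fail_at;		/* index that reports -EIO, or -1 for none */
};

/* Stand-in lookup: 0 if the block exists, -ENOENT for a hole, -EIO on failure. */
static int fake_get_desc_block(const struct fake_mdt *mdt, unsigned long idx)
{
	if (mdt->fail_at >= 0 && idx == (unsigned long)mdt->fail_at)
		return -EIO;
	if (idx >= mdt->nr_allocated)
		return -ENOENT;
	return 0;
}

/* Walk up to max_blocks descriptor blocks and count the ones that exist. */
static int scan_desc_blocks(const struct fake_mdt *mdt,
			    unsigned long max_blocks, unsigned long *visited)
{
	unsigned long idx;
	int ret;

	assert(mdt != NULL && visited != NULL);
	*visited = 0;
	for (idx = 0; idx < max_blocks; idx++) {
		ret = fake_get_desc_block(mdt, idx);
		if (ret == -ENOENT)
			return 0;	/* not allocated yet: end of data, not an error */
		if (ret < 0)
			return ret;	/* any other failure is a real error */
		(*visited)++;
	}
	return 0;
}
```

Here max_blocks plays the role of the large upper bound, and the loop normally ends long before reaching it.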
>> + return 0;
>> + else if (ret < 0)
>> + return ret;
>> + req.pr_desc_bh = desc_bh;
>> + desc_kaddr = kmap(desc_bh->b_page);
>> + desc = nilfs_palloc_block_get_group_desc(inode, group,
>> + desc_bh, desc_kaddr);
>> + n = nilfs_palloc_rest_groups_in_desc_block(inode, group,
>> + maxgroup);
>> +
>> + for (i = 0; i < n; i++, desc++, group++) {
>> + m = entries_per_group -
>> + nilfs_palloc_group_desc_nfrees(inode,
>> + group, desc);
>
> Looks weird. It makes sense to split this into several functions or to
> use a local variable.
>
>> + if (!m)
>> + continue;
>> +
>> + ret = nilfs_palloc_get_bitmap_block(
>> + inode, group, 0, &bitmap_bh);
>
> Ditto. Looks weird.
>
>> + if (ret == -ENOENT) {
>> + ret = 0;
>> + goto out_desc;
>
> A comment is needed here. Otherwise it looks weird, because we go to
> out_desc in either case. Maybe combine the two branches:
>
> 	if (unlikely(ret < 0)) {
> 		if (ret == -ENOENT)
> 			ret = 0;
> 		goto out_desc;
> 	}
>
> In any case, a comment should explain why ret is set to zero for -ENOENT.
Hmm, in this case -ENOENT should be considered an error. If m > 0, then
nilfs_palloc_get_bitmap_block() should not return -ENOENT.
>> + } else if (ret < 0)
>> + goto out_desc;
>> +
>> + req.pr_bitmap_bh = bitmap_bh;
>> + bitmap_kaddr = kmap(bitmap_bh->b_page);
>> + bitmap = bitmap_kaddr + bh_offset(bitmap_bh);
>> + /* entry blkoff is always bigger than 0 */
>> + blkoff = 0;
>> + pos = 0;
>> +
>> + for (j = 0; j < m; ++j, ++pos) {
>> + pos = nilfs_find_next_bit(bitmap,
>> + entries_per_group, pos);
>> +
>> + if (pos >= entries_per_group)
>> + break;
>> +
>> + /* found an entry */
>> + req.pr_entry_nr =
>> + entries_per_group * group + pos;
>> +
>> + prev_blkoff = blkoff;
>> + blkoff = nilfs_palloc_entry_blkoff(inode,
>> + req.pr_entry_nr);
>> +
>> + if (blkoff != prev_blkoff) {
>> + if (prev_blkoff)
>> + brelse(req.pr_entry_bh);
>> +
>> + ret = nilfs_palloc_get_entry_block(
>> + inode, req.pr_entry_nr,
>> + 0, &req.pr_entry_bh);
>
> Ahhhh. Looks weird. :) In any case, split this into small functions with
> clear names. It really improves the code from any point of view.
Ok.
> Thanks,
> Vyacheslav Dubeyko.
>
>> + if (ret < 0)
>> + goto out_entry;
>> + }
>> +
>> + dofunc(inode, &req, data);
>> + }
>> +
>> + if (blkoff)
>> + brelse(req.pr_entry_bh);
>> + kunmap(bitmap_bh->b_page);
>> + brelse(bitmap_bh);
>> + }
>> +
>> + kunmap(desc_bh->b_page);
>> + brelse(desc_bh);
>> + }
>> +
>> + return 0;
>> +
>> +out_entry:
>> + kunmap(bitmap_bh->b_page);
>> + brelse(bitmap_bh);
>> +
>> +out_desc:
>> + kunmap(desc_bh->b_page);
>> + brelse(desc_bh);
>> + return ret;
>> +}
>> +
>> +/**
>> * nilfs_palloc_commit_alloc_entry - finish allocation of a persistent object
>> * @inode: inode of metadata file using this allocator
>> * @req: nilfs_palloc_req structure exchanged for the allocation
>> diff --git a/fs/nilfs2/alloc.h b/fs/nilfs2/alloc.h
>> index 4bd6451..0592035 100644
>> --- a/fs/nilfs2/alloc.h
>> +++ b/fs/nilfs2/alloc.h
>> @@ -77,6 +77,7 @@ int nilfs_palloc_freev(struct inode *, __u64 *, size_t);
>> #define nilfs_set_bit_atomic ext2_set_bit_atomic
>> #define nilfs_clear_bit_atomic ext2_clear_bit_atomic
>> #define nilfs_find_next_zero_bit find_next_zero_bit_le
>> +#define nilfs_find_next_bit find_next_bit_le
>>
>> /**
>> * struct nilfs_bh_assoc - block offset and buffer head association
>> @@ -106,5 +107,10 @@ void nilfs_palloc_setup_cache(struct inode *inode,
>> struct nilfs_palloc_cache *cache);
>> void nilfs_palloc_clear_cache(struct inode *inode);
>> void nilfs_palloc_destroy_cache(struct inode *inode);
>> +int nilfs_palloc_scan_entries(struct inode *,
>> + void (*dofunc)(struct inode *,
>> + struct nilfs_palloc_req *,
>> + void *),
>> + void *);
>>
>> #endif /* _NILFS_ALLOC_H */
>
>
>
* Re: [PATCH 3/6] nilfs2: scan dat entries at snapshot creation/deletion time
2014-03-17 7:04 ` Vyacheslav Dubeyko
@ 2014-03-17 9:35 ` Andreas Rohner
[not found] ` <5326C1E5.10108-hi6Y0CQ0nG0@public.gmane.org>
0 siblings, 1 reply; 34+ messages in thread
From: Andreas Rohner @ 2014-03-17 9:35 UTC (permalink / raw)
To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On 2014-03-17 08:04, Vyacheslav Dubeyko wrote:
> On Sun, 2014-03-16 at 11:47 +0100, Andreas Rohner wrote:
>> To accurately count the number of live blocks in a segment, it is
>> important to take snapshots into account, because snapshots can protect
>> reclaimable blocks from being cleaned.
>>
>> This patch uses the previously reserved de_rsv field of the
>> nilfs_dat_entry struct to store one of the snapshots the corresponding
>> block belongs to. One block can belong to many snapshots, but because
>> the snapshots are stored in a sorted linked list, it is easy to check if
>> a block belongs to any other snapshot given the previous and the next
>> snapshot. For example if the current snapshot (in de_ss) is being
>> removed and neither the previous nor the next snapshot is in the range
>> of de_start to de_end, then it is guaranteed that the block doesn't
>> belong to any other snapshot and is reclaimable. On the other hand if
>> let's say the previous snapshot is in the range of de_start to de_end, we
>> simply set de_ss to the previous snapshot and the block is not
>> reclaimable.
>>
>> To implement this every DAT entry is scanned at snapshot
>> creation/deletion time and updated if needed.
>
> It is a well-known problem of NILFS2 that deleting big files is very slow,
> because the DAT file has to be updated (de_end: end checkpoint number).
> So how does your addition affect this disadvantage?
In addition to setting "de_end: end checkpoint number", the live block
counter in the SUFILE needs to be decremented. This makes the deletion a
little bit more expensive, but it's not really noticeable, because the
SUFILE entries are mostly in the cache. I have timed the deletion of 100
GB and there is no discernible difference in performance.
My additions do make snapshot creation and deletion more expensive, however.
>> To avoid too many update
>> operations only potentially reclaimable blocks are ever updated. For
>> example if there are some deleted files and the checkpoint to which
>> these files belong is turned into a snapshot, then su_nblocks is
>> incremented for these blocks, which reverses the decrement that happened
>> when the files were deleted. If after some time this snapshot is
>> deleted, su_nblocks is decremented again to reverse the increment at
>> creation time.
>>
>> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
>> ---
>> fs/nilfs2/cpfile.c | 7 ++++
>> fs/nilfs2/dat.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++
>> fs/nilfs2/dat.h | 26 ++++++++++++++
>> include/linux/nilfs2_fs.h | 4 +--
>> 4 files changed, 121 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/nilfs2/cpfile.c b/fs/nilfs2/cpfile.c
>> index 0d58075..29952f5 100644
>> --- a/fs/nilfs2/cpfile.c
>> +++ b/fs/nilfs2/cpfile.c
>> @@ -28,6 +28,7 @@
>> #include <linux/nilfs2_fs.h>
>> #include "mdt.h"
>> #include "cpfile.h"
>> +#include "sufile.h"
>>
>>
>> static inline unsigned long
>> @@ -584,6 +585,7 @@ static int nilfs_cpfile_set_snapshot(struct inode *cpfile, __u64 cno)
>> struct nilfs_cpfile_header *header;
>> struct nilfs_checkpoint *cp;
>> struct nilfs_snapshot_list *list;
>> + struct the_nilfs *nilfs = cpfile->i_sb->s_fs_info;
>> __u64 curr, prev;
>> unsigned long curr_blkoff, prev_blkoff;
>> void *kaddr;
>> @@ -681,6 +683,8 @@ static int nilfs_cpfile_set_snapshot(struct inode *cpfile, __u64 cno)
>> mark_buffer_dirty(header_bh);
>> nilfs_mdt_mark_dirty(cpfile);
>>
>> + nilfs_dat_scan_inc_ss(nilfs->ns_dat, cno);
>> +
>> brelse(prev_bh);
>>
>> out_curr:
>> @@ -703,6 +707,7 @@ static int nilfs_cpfile_clear_snapshot(struct inode *cpfile, __u64 cno)
>> struct nilfs_cpfile_header *header;
>> struct nilfs_checkpoint *cp;
>> struct nilfs_snapshot_list *list;
>> + struct the_nilfs *nilfs = cpfile->i_sb->s_fs_info;
>> __u64 next, prev;
>> void *kaddr;
>> int ret;
>> @@ -784,6 +789,8 @@ static int nilfs_cpfile_clear_snapshot(struct inode *cpfile, __u64 cno)
>> mark_buffer_dirty(header_bh);
>> nilfs_mdt_mark_dirty(cpfile);
>>
>> + nilfs_dat_scan_dec_ss(nilfs->ns_dat, cno, prev, next);
>> +
>> brelse(prev_bh);
>>
>> out_next:
>> diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
>> index 0d5fada..89a4a5f 100644
>> --- a/fs/nilfs2/dat.c
>> +++ b/fs/nilfs2/dat.c
>> @@ -28,6 +28,7 @@
>> #include "mdt.h"
>> #include "alloc.h"
>> #include "dat.h"
>> +#include "sufile.h"
>>
>>
>> #define NILFS_CNO_MIN ((__u64)1)
>> @@ -97,6 +98,7 @@ void nilfs_dat_commit_alloc(struct inode *dat, struct nilfs_palloc_req *req)
>> entry->de_start = cpu_to_le64(NILFS_CNO_MIN);
>> entry->de_end = cpu_to_le64(NILFS_CNO_MAX);
>> entry->de_blocknr = cpu_to_le64(0);
>> + entry->de_ss = cpu_to_le64(0);
>> kunmap_atomic(kaddr);
>>
>> nilfs_palloc_commit_alloc_entry(dat, req);
>> @@ -121,6 +123,7 @@ static void nilfs_dat_commit_free(struct inode *dat,
>> entry->de_start = cpu_to_le64(NILFS_CNO_MIN);
>> entry->de_end = cpu_to_le64(NILFS_CNO_MIN);
>> entry->de_blocknr = cpu_to_le64(0);
>> + entry->de_ss = cpu_to_le64(0);
>> kunmap_atomic(kaddr);
>>
>> nilfs_dat_commit_entry(dat, req);
>> @@ -201,6 +204,7 @@ void nilfs_dat_commit_end(struct inode *dat, struct nilfs_palloc_req *req,
>> WARN_ON(start > end);
>> }
>> entry->de_end = cpu_to_le64(end);
>> + entry->de_ss = cpu_to_le64(NILFS_CNO_MAX);
>> blocknr = le64_to_cpu(entry->de_blocknr);
>> kunmap_atomic(kaddr);
>>
>> @@ -365,6 +369,8 @@ int nilfs_dat_move(struct inode *dat, __u64 vblocknr, sector_t blocknr)
>> }
>> WARN_ON(blocknr == 0);
>> entry->de_blocknr = cpu_to_le64(blocknr);
>> + if (entry->de_ss == cpu_to_le64(NILFS_CNO_MAX))
>> + entry->de_ss = cpu_to_le64(0);
>> kunmap_atomic(kaddr);
>>
>> mark_buffer_dirty(entry_bh);
>> @@ -430,6 +436,86 @@ int nilfs_dat_translate(struct inode *dat, __u64 vblocknr, sector_t *blocknrp)
>> return ret;
>> }
>>
>> +void nilfs_dat_do_scan_dec(struct inode *dat, struct nilfs_palloc_req *req,
>> + void *data)
>> +{
>> + struct nilfs_dat_entry *entry;
>> + __u64 start, end, prev_ss;
>> + __u64 *ssp = data, ss = ssp[0], prev = ssp[1], next = ssp[2];
>> + sector_t blocknr;
>> + void *kaddr;
>> + struct the_nilfs *nilfs;
>> +
>> + kaddr = kmap_atomic(req->pr_entry_bh->b_page);
>> + entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr,
>> + req->pr_entry_bh, kaddr);
>> + start = le64_to_cpu(entry->de_start);
>> + end = le64_to_cpu(entry->de_end);
>> + blocknr = le64_to_cpu(entry->de_blocknr);
>> + prev_ss = le64_to_cpu(entry->de_ss);
>> +
>> + if (blocknr != 0 && end != NILFS_CNO_MAX && ss >= start && ss < end) {
>
> I think it makes sense to use small functions whose names clearly say
> what is being checked.
Ok.
>> + if (prev_ss == ss || prev_ss == NILFS_CNO_MAX) {
>> + if (prev && prev >= start && prev < end)
>> + entry->de_ss = cpu_to_le64(prev);
>> + else if (next && next >= start && next < end)
>> + entry->de_ss = cpu_to_le64(next);
>> + else
>> + entry->de_ss = cpu_to_le64(0);
>
> Ditto.
>
>> +
>> + if (prev_ss != NILFS_CNO_MAX)
>> + prev_ss = le64_to_cpu(entry->de_ss);
>> + kunmap_atomic(kaddr);
>> + mark_buffer_dirty(req->pr_entry_bh);
>> + nilfs_mdt_mark_dirty(dat);
>> + } else
>> + kunmap_atomic(kaddr);
>> +
>> + if (prev_ss == 0) {
>> + nilfs = dat->i_sb->s_fs_info;
>> + nilfs_sufile_add_segment_usage(nilfs->ns_sufile,
>> + nilfs_get_segnum_of_block(nilfs, blocknr),
>> + -1, 0);
>> + }
>> + } else
>> + kunmap_atomic(kaddr);
>> +}
>> +
>> +void nilfs_dat_do_scan_inc(struct inode *dat, struct nilfs_palloc_req *req,
>> + void *data)
>> +{
>> + struct nilfs_dat_entry *entry;
>> + __u64 start, end, prev_ss;
>> + __u64 *ssp = data, ss = *ssp;
>> + sector_t blocknr;
>> + void *kaddr;
>> + struct the_nilfs *nilfs;
>> +
>> + kaddr = kmap_atomic(req->pr_entry_bh->b_page);
>> + entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr,
>> + req->pr_entry_bh, kaddr);
>> + start = le64_to_cpu(entry->de_start);
>> + end = le64_to_cpu(entry->de_end);
>> + blocknr = le64_to_cpu(entry->de_blocknr);
>> + prev_ss = le64_to_cpu(entry->de_ss);
>> +
>> + if (blocknr != 0 && end != NILFS_CNO_MAX && ss >= start && ss < end &&
>> + (prev_ss == 0 || prev_ss == NILFS_CNO_MAX)) {
>
> Ditto. Moreover, you repeat this check.
What do you mean? Where do I repeat this check?
>> +
>> + entry->de_ss = cpu_to_le64(ss);
>> +
>> + kunmap_atomic(kaddr);
>> + mark_buffer_dirty(req->pr_entry_bh);
>> + nilfs_mdt_mark_dirty(dat);
>> +
>> + nilfs = dat->i_sb->s_fs_info;
>> + nilfs_sufile_add_segment_usage(nilfs->ns_sufile,
>> + nilfs_get_segnum_of_block(nilfs, blocknr),
>
> Looks weird. Maybe use a variable?
Ok.
Thanks for your review so far.
Best regards,
Andreas Rohner
> Thanks,
> Vyacheslav Dubeyko.
>
>> + 1, 0);
>> + } else
>> + kunmap_atomic(kaddr);
>> +}
>> +
>> ssize_t nilfs_dat_get_vinfo(struct inode *dat, void *buf, unsigned visz,
>> size_t nvi)
>> {
>> diff --git a/fs/nilfs2/dat.h b/fs/nilfs2/dat.h
>> index cbd8e97..92a187e 100644
>> --- a/fs/nilfs2/dat.h
>> +++ b/fs/nilfs2/dat.h
>> @@ -55,5 +55,31 @@ ssize_t nilfs_dat_get_vinfo(struct inode *, void *, unsigned, size_t);
>>
>> int nilfs_dat_read(struct super_block *sb, size_t entry_size,
>> struct nilfs_inode *raw_inode, struct inode **inodep);
>> +void nilfs_dat_do_scan_dec(struct inode *, struct nilfs_palloc_req *, void *);
>> +void nilfs_dat_do_scan_inc(struct inode *, struct nilfs_palloc_req *, void *);
>> +
>> +/**
>> + * nilfs_dat_scan_dec_ss - scan all dat entries for a checkpoint dec suinfo
>> + * @dat: inode of dat file
>> + * @cno: snapshot number
>> + * @prev: previous snapshot number
>> + * @next: next snapshot number
>> + */
>> +static inline int nilfs_dat_scan_dec_ss(struct inode *dat, __u64 cno,
>> + __u64 prev, __u64 next)
>> +{
>> + __u64 data[3] = { cno, prev, next };
>> + return nilfs_palloc_scan_entries(dat, nilfs_dat_do_scan_dec, data);
>> +}
>> +
>> +/**
>> + * nilfs_dat_scan_inc_ss - scan all dat entries for a checkpoint inc suinfo
>> + * @dat: inode of dat file
>> + * @cno: snapshot number
>> + */
>> +static inline int nilfs_dat_scan_inc_ss(struct inode *dat, __u64 cno)
>> +{
>> + return nilfs_palloc_scan_entries(dat, nilfs_dat_do_scan_inc, &cno);
>> +}
>>
>> #endif /* _NILFS_DAT_H */
>> diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
>> index ca269ad..ba9ebe02 100644
>> --- a/include/linux/nilfs2_fs.h
>> +++ b/include/linux/nilfs2_fs.h
>> @@ -475,13 +475,13 @@ struct nilfs_palloc_group_desc {
>> * @de_blocknr: block number
>> * @de_start: start checkpoint number
>> * @de_end: end checkpoint number
>> - * @de_rsv: reserved for future use
>> + * @de_ss: one of the snapshots the block belongs to
>> */
>> struct nilfs_dat_entry {
>> __le64 de_blocknr;
>> __le64 de_start;
>> __le64 de_end;
>> - __le64 de_rsv;
>> + __le64 de_ss;
>> };
>>
>> #define NILFS_MIN_DAT_ENTRY_SIZE 32
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 3/6] nilfs2: scan dat entries at snapshot creation/deletion time
[not found] ` <5326C1E5.10108-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-03-17 9:54 ` Vyacheslav Dubeyko
0 siblings, 0 replies; 34+ messages in thread
From: Vyacheslav Dubeyko @ 2014-03-17 9:54 UTC (permalink / raw)
To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Mon, 2014-03-17 at 10:35 +0100, Andreas Rohner wrote:
> >> +
> >> + if (blocknr != 0 && end != NILFS_CNO_MAX && ss >= start && ss < end &&
> >> + (prev_ss == 0 || prev_ss == NILFS_CNO_MAX)) {
> >
> > Ditto. Moreover, you repeat this check.
>
> What do you mean? Where do I repeat this check?
>
I mean this check above:
>> + if (blocknr != 0 && end != NILFS_CNO_MAX && ss >= start && ss < end) {
Thanks,
Vyacheslav Dubeyko.
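For illustration, one way the shared condition could be written only once is sketched below. This is plain userspace C, not code from the patch: the struct dat_entry_view type and both helper names are hypothetical, the entry fields are assumed to be converted to CPU byte order already, and NILFS_CNO_MAX is redefined locally as an all-ones value so that the example is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors the kernel constant, all bits set. */
#define NILFS_CNO_MAX (~(uint64_t)0)

/* Hypothetical CPU-endian copy of the fields of a DAT entry. */
struct dat_entry_view {
	uint64_t blocknr;
	uint64_t start;
	uint64_t end;
	uint64_t ss;
};

/* Condition shared by the dec and the inc scan callbacks. */
static bool entry_dead_but_in_snapshot(const struct dat_entry_view *e,
				       uint64_t ss)
{
	return e->blocknr != 0 && e->end != NILFS_CNO_MAX &&
	       ss >= e->start && ss < e->end;
}

/* The inc callback additionally requires that no snapshot is recorded. */
static bool entry_needs_inc(const struct dat_entry_view *e, uint64_t ss)
{
	return entry_dead_but_in_snapshot(e, ss) &&
	       (e->ss == 0 || e->ss == NILFS_CNO_MAX);
}
```

With such a predicate, both nilfs_dat_do_scan_dec() and nilfs_dat_do_scan_inc() could call the same helper instead of spelling out the comparison twice.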
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 4/6] nilfs2: add ioctl() to clean snapshot flags from dat entries
[not found] ` <be7d3bd13015117222aac43194c0fdb9c5d0046f.1394966729.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-03-17 13:19 ` Vyacheslav Dubeyko
2014-03-17 13:49 ` Andreas Rohner
0 siblings, 1 reply; 34+ messages in thread
From: Vyacheslav Dubeyko @ 2014-03-17 13:19 UTC (permalink / raw)
To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Sun, 2014-03-16 at 11:47 +0100, Andreas Rohner wrote:
> This patch introduces new flags for nilfs_vdesc to indicate the reason a
> block is alive. So if the block would be reclaimable, but must be
> treated as if it were alive, because it is part of a snapshot, then the
> snapshot flag is set.
>
I don't quite follow your idea. As far as I can tell, every entry in the
DAT file has: (1) de_start: start checkpoint number; (2) de_end: end
checkpoint number. So if one of these checkpoint numbers is a snapshot
number, we know that the block lives in a snapshot. Am I correct? Why do
we need special flags?
> Additionally a new ioctl() is added, which enables the userspace GC to
> perform a cleanup operation after setting the number of blocks with
> NILFS_IOCTL_SET_SUINFO. It sets DAT entries with de_ss values of
> NILFS_CNO_MAX to 0. NILFS_CNO_MAX indicates, that the corresponding
> block belongs to some snapshot, but was already decremented by a
> previous deletion operation. If the segment usage info is changed with
> NILFS_IOCTL_SET_SUINFO and the number of blocks is updated, then these
> blocks would never be decremented and there are scenarios where the
> corresponding segments would starve (never be cleaned). To prevent that
> they must be reset to 0.
>
> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
> ---
> fs/nilfs2/dat.c | 63 ++++++++++++++++++++++++++++
> fs/nilfs2/dat.h | 1 +
> fs/nilfs2/ioctl.c | 103 +++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/nilfs2_fs.h | 52 ++++++++++++++++++++++-
> 4 files changed, 216 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
> index 89a4a5f..7adb15d 100644
> --- a/fs/nilfs2/dat.c
> +++ b/fs/nilfs2/dat.c
> @@ -382,6 +382,69 @@ int nilfs_dat_move(struct inode *dat, __u64 vblocknr, sector_t blocknr)
> }
>
> /**
> + * nilfs_dat_clean_snapshot_flag - check flags used by snapshots
> + * @dat: DAT file inode
> + * @vblocknr: virtual block number
> + *
> + * Description: nilfs_dat_clean_snapshot_flag() changes the flags from
> + * NILFS_CNO_MAX to 0 if necessary, so that segment usage is accurately
> + * counted. NILFS_CNO_MAX indicates, that the corresponding block belongs
> + * to some snapshot, but was already decremented. If the segment usage info
> + * is changed with NILFS_IOCTL_SET_SUINFO and the number of blocks is updated,
> + * then these blocks would never be decremented and there are scenarios where
> + * the corresponding segments would starve (never be cleaned).
> + *
> + * Return Value: On success, 0 is returned. On error, one of the following
> + * negative error codes is returned.
> + *
> + * %-EIO - I/O error.
> + *
> + * %-ENOMEM - Insufficient amount of memory available.
> + */
> +int nilfs_dat_clean_snapshot_flag(struct inode *dat, __u64 vblocknr)
This sounds as if we clear a flag. The name can be confusing.
> +{
> + struct buffer_head *entry_bh;
> + struct nilfs_dat_entry *entry;
> + void *kaddr;
> + int ret;
> +
> + ret = nilfs_palloc_get_entry_block(dat, vblocknr, 0, &entry_bh);
> + if (ret < 0)
> + return ret;
> +
> + /*
> + * The given disk block number (blocknr) is not yet written to
> + * the device at this point.
> + *
> + * To prevent nilfs_dat_translate() from returning the
> + * uncommitted block number, this makes a copy of the entry
> + * buffer and redirects nilfs_dat_translate() to the copy.
> + */
> + if (!buffer_nilfs_redirected(entry_bh)) {
> + ret = nilfs_mdt_freeze_buffer(dat, entry_bh);
> + if (ret) {
> + brelse(entry_bh);
> + return ret;
> + }
> + }
> +
> + kaddr = kmap_atomic(entry_bh->b_page);
> + entry = nilfs_palloc_block_get_entry(dat, vblocknr, entry_bh, kaddr);
> + if (entry->de_ss == cpu_to_le64(NILFS_CNO_MAX)) {
> + entry->de_ss = cpu_to_le64(0);
> + kunmap_atomic(kaddr);
> + mark_buffer_dirty(entry_bh);
> + nilfs_mdt_mark_dirty(dat);
> + } else {
> + kunmap_atomic(kaddr);
> + }
Brackets are unnecessary here.
> +
> + brelse(entry_bh);
> +
> + return 0;
> +}
> +
> +/**
> * nilfs_dat_translate - translate a virtual block number to a block number
> * @dat: DAT file inode
> * @vblocknr: virtual block number
> diff --git a/fs/nilfs2/dat.h b/fs/nilfs2/dat.h
> index 92a187e..a528024 100644
> --- a/fs/nilfs2/dat.h
> +++ b/fs/nilfs2/dat.h
> @@ -51,6 +51,7 @@ void nilfs_dat_abort_update(struct inode *, struct nilfs_palloc_req *,
> int nilfs_dat_mark_dirty(struct inode *, __u64);
> int nilfs_dat_freev(struct inode *, __u64 *, size_t);
> int nilfs_dat_move(struct inode *, __u64, sector_t);
> +int nilfs_dat_clean_snapshot_flag(struct inode *, __u64);
> ssize_t nilfs_dat_get_vinfo(struct inode *, void *, unsigned, size_t);
>
> int nilfs_dat_read(struct super_block *sb, size_t entry_size,
> diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
> index 422fb54..0b62bf4 100644
> --- a/fs/nilfs2/ioctl.c
> +++ b/fs/nilfs2/ioctl.c
> @@ -578,7 +578,7 @@ static int nilfs_ioctl_move_inode_block(struct inode *inode,
> struct buffer_head *bh;
> int ret;
>
> - if (vdesc->vd_flags == 0)
> + if (nilfs_vdesc_data(vdesc))
> ret = nilfs_gccache_submit_read_data(
> inode, vdesc->vd_offset, vdesc->vd_blocknr,
> vdesc->vd_vblocknr, &bh);
> @@ -662,6 +662,14 @@ static int nilfs_ioctl_move_blocks(struct super_block *sb,
> }
>
> do {
> + /*
> + * old user space tools do not initialize vd_flags2
> + * check if it contains invalid flags
> + */
> + if (vdesc->vd_flags2 &
"vd_flags2" is really bad naming. Completely obscure.
> + (~0UL << __NR_NILFS_VDESC_FIELDS))
Looks weird.
> + vdesc->vd_flags2 = 0;
> +
> ret = nilfs_ioctl_move_inode_block(inode, vdesc,
> &buffers);
> if (unlikely(ret < 0)) {
> @@ -984,6 +992,96 @@ out:
> }
>
> /**
> + * nilfs_ioctl_clean_snapshot_flags - clean dat entries with invalid de_ss
Ditto. This sounds like the clearing of a flag.
> + * @inode: inode object
> + * @filp: file object
> + * @cmd: ioctl's request code
> + * @argp: pointer on argument from userspace
> + *
> + * Description: nilfs_ioctl_clean_snapshot_flags() sets DAT entries with de_ss
> + * values of NILFS_CNO_MAX to 0. NILFS_CNO_MAX indicates, that the
> + * corresponding block belongs to some snapshot, but was already decremented.
> + * If the segment usage info is changed with NILFS_IOCTL_SET_SUINFO and the
> + * number of blocks is updated, then these blocks would never be decremented
> + * and there are scenarios where the corresponding segments would starve (never
> + * be cleaned).
> + *
> + * Return Value: On success, 0 is returned or error code, otherwise.
> + */
> +static int nilfs_ioctl_clean_snapshot_flags(struct inode *inode,
> + struct file *filp,
> + unsigned int cmd,
> + void __user *argp)
> +{
> + struct the_nilfs *nilfs = inode->i_sb->s_fs_info;
> + struct nilfs_transaction_info ti;
> + struct nilfs_argv argv;
> + struct nilfs_vdesc *vdesc;
> + size_t len, i;
> + void __user *base;
> + void *kbuf;
> + int ret;
> +
> + if (!capable(CAP_SYS_ADMIN))
> + return -EPERM;
> +
> + ret = mnt_want_write_file(filp);
> + if (ret)
> + return ret;
> +
> + ret = -EFAULT;
> + if (copy_from_user(&argv, argp, sizeof(struct nilfs_argv)))
> + goto out;
> +
> + ret = -EINVAL;
> + if (argv.v_size != sizeof(struct nilfs_vdesc))
> + goto out;
> + if (argv.v_nmembs > UINT_MAX / sizeof(struct nilfs_vdesc))
> + goto out;
> +
> + len = argv.v_size * argv.v_nmembs;
> + if (!len) {
> + ret = 0;
> + goto out;
> + }
> +
> + base = (void __user *)(unsigned long)argv.v_base;
> + kbuf = vmalloc(len);
> + if (!kbuf) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + if (copy_from_user(kbuf, base, len)) {
> + ret = -EFAULT;
> + goto out_free;
> + }
> +
> + ret = nilfs_transaction_begin(inode->i_sb, &ti, 0);
> + if (unlikely(ret))
> + goto out_free;
> +
> + for (i = 0, vdesc = kbuf; i < argv.v_nmembs; ++i, ++vdesc) {
> + if (nilfs_vdesc_snapshot(vdesc)) {
> + ret = nilfs_dat_clean_snapshot_flag(nilfs->ns_dat,
> + vdesc->vd_vblocknr);
> + if (ret) {
> + nilfs_transaction_abort(inode->i_sb);
> + goto out_free;
> + }
> + }
> + }
> +
> + nilfs_transaction_commit(inode->i_sb);
> +
> +out_free:
> + vfree(kbuf);
> +out:
> + mnt_drop_write_file(filp);
> + return ret;
> +}
> +
> +/**
> * nilfs_ioctl_sync - make a checkpoint
> * @inode: inode object
> * @filp: file object
> @@ -1332,6 +1430,8 @@ long nilfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> return nilfs_ioctl_get_bdescs(inode, filp, cmd, argp);
> case NILFS_IOCTL_CLEAN_SEGMENTS:
> return nilfs_ioctl_clean_segments(inode, filp, cmd, argp);
> + case NILFS_IOCTL_CLEAN_SNAPSHOT_FLAGS:
> + return nilfs_ioctl_clean_snapshot_flags(inode, filp, cmd, argp);
> case NILFS_IOCTL_SYNC:
> return nilfs_ioctl_sync(inode, filp, cmd, argp);
> case NILFS_IOCTL_RESIZE:
> @@ -1368,6 +1468,7 @@ long nilfs_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> case NILFS_IOCTL_GET_VINFO:
> case NILFS_IOCTL_GET_BDESCS:
> case NILFS_IOCTL_CLEAN_SEGMENTS:
> + case NILFS_IOCTL_CLEAN_SNAPSHOT_FLAGS:
To me this sounds as if we clean all of the snapshot's flags.
> case NILFS_IOCTL_SYNC:
> case NILFS_IOCTL_RESIZE:
> case NILFS_IOCTL_SET_ALLOC_RANGE:
> diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
> index ba9ebe02..30ddc86 100644
> --- a/include/linux/nilfs2_fs.h
> +++ b/include/linux/nilfs2_fs.h
> @@ -863,7 +863,7 @@ struct nilfs_vinfo {
> * @vd_blocknr: disk block number
> * @vd_offset: logical block offset inside a file
> * @vd_flags: flags (data or node block)
> - * @vd_pad: padding
> + * @vd_flags2: additional flags
Ditto. Weird name.
> */
> struct nilfs_vdesc {
> __u64 vd_ino;
> @@ -873,9 +873,55 @@ struct nilfs_vdesc {
> __u64 vd_blocknr;
> __u64 vd_offset;
> __u32 vd_flags;
> - __u32 vd_pad;
> + /* vd_flags2 needed because of backwards compatibility */
I don't understand this comment at all. Usually, old fields are kept
for backward compatibility, but this field is new.
> + __u32 vd_flags2;
> };
>
> +/* vdesc flags */
To be honest, I don't understand why there are this many flags and why
these flags in particular. Comments are really necessary.
> +enum {
> + NILFS_VDESC_DATA,
> + NILFS_VDESC_NODE,
> + /* ... */
What does it mean?
> +};
> +enum {
> + NILFS_VDESC_SNAPSHOT,
> + __NR_NILFS_VDESC_FIELDS,
> + /* ... */
What does it mean?
Thanks,
Vyacheslav Dubeyko.
> +};
> +
> +#define NILFS_VDESC_FNS(flag, name) \
> +static inline void \
> +nilfs_vdesc_set_##name(struct nilfs_vdesc *vdesc) \
> +{ \
> + vdesc->vd_flags = NILFS_VDESC_##flag; \
> +} \
> +static inline int \
> +nilfs_vdesc_##name(const struct nilfs_vdesc *vdesc) \
> +{ \
> + return vdesc->vd_flags == NILFS_VDESC_##flag; \
> +}
> +
> +#define NILFS_VDESC_FNS2(flag, name) \
> +static inline void \
> +nilfs_vdesc_set_##name(struct nilfs_vdesc *vdesc) \
> +{ \
> + vdesc->vd_flags2 |= (1UL << NILFS_VDESC_##flag); \
> +} \
> +static inline void \
> +nilfs_vdesc_clear_##name(struct nilfs_vdesc *vdesc) \
> +{ \
> + vdesc->vd_flags2 &= ~(1UL << NILFS_VDESC_##flag); \
> +} \
> +static inline int \
> +nilfs_vdesc_##name(const struct nilfs_vdesc *vdesc) \
> +{ \
> + return !!(vdesc->vd_flags2 & (1UL << NILFS_VDESC_##flag)); \
> +}
> +
> +NILFS_VDESC_FNS(DATA, data)
> +NILFS_VDESC_FNS(NODE, node)
> +NILFS_VDESC_FNS2(SNAPSHOT, snapshot)
> +
> /**
> * struct nilfs_bdesc - descriptor of disk block number
> * @bd_ino: inode number
> @@ -922,5 +968,7 @@ struct nilfs_bdesc {
> _IOW(NILFS_IOCTL_IDENT, 0x8C, __u64[2])
> #define NILFS_IOCTL_SET_SUINFO \
> _IOW(NILFS_IOCTL_IDENT, 0x8D, struct nilfs_argv)
> +#define NILFS_IOCTL_CLEAN_SNAPSHOT_FLAGS \
> + _IOW(NILFS_IOCTL_IDENT, 0x8F, struct nilfs_argv)
>
> #endif /* _LINUX_NILFS_FS_H */
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 4/6] nilfs2: add ioctl() to clean snapshot flags from dat entries
2014-03-17 13:19 ` Vyacheslav Dubeyko
@ 2014-03-17 13:49 ` Andreas Rohner
[not found] ` <5326FD51.7000209-hi6Y0CQ0nG0@public.gmane.org>
0 siblings, 1 reply; 34+ messages in thread
From: Andreas Rohner @ 2014-03-17 13:49 UTC (permalink / raw)
To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On 2014-03-17 14:19, Vyacheslav Dubeyko wrote:
> On Sun, 2014-03-16 at 11:47 +0100, Andreas Rohner wrote:
>> This patch introduces new flags for nilfs_vdesc to indicate the reason a
>> block is alive. So if the block would be reclaimable, but must be
>> treated as if it were alive, because it is part of a snapshot, then the
>> snapshot flag is set.
>>
>
> I don't quite follow your idea. As far as I can tell, every entry in the
> DAT file has: (1) de_start: start checkpoint number; (2) de_end: end
> checkpoint number. So if one of these checkpoint numbers is a snapshot
> number, we know that the block lives in a snapshot. Am I correct? Why do
> we need special flags?
Yes, but a snapshot can also lie between de_start and de_end. To check
that, you would have to get a list of all snapshots and see whether one
of them is within the range of de_start to de_end. The userspace tools
already do this. The flags in nilfs_vdesc are there so that I don't have
to check it again in the kernel.
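To make the range check concrete, here is a minimal sketch in plain userspace C of the test described above: given the snapshot numbers in ascending order, does any of them fall into [de_start, de_end)? It is not taken from the patch or from nilfs-utils. The function name and the array representation are assumptions for illustration only (the kernel keeps the snapshots in a sorted linked list).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if some snapshot number in the
 * ascending array ss[0..n) satisfies start <= ss[i] < end.
 */
static bool block_in_any_snapshot(const uint64_t *ss, size_t n,
				  uint64_t start, uint64_t end)
{
	size_t lo = 0, hi = n;

	assert(ss != NULL || n == 0);

	/* Binary search for the first snapshot number >= start. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ss[mid] < start)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* That candidate must also lie below the exclusive upper bound. */
	return lo < n && ss[lo] < end;
}
```

Doing this lookup once in userspace and passing the result along in nilfs_vdesc is what spares the kernel from repeating it for every block.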
>> Additionally a new ioctl() is added, which enables the userspace GC to
>> perform a cleanup operation after setting the number of blocks with
>> NILFS_IOCTL_SET_SUINFO. It sets DAT entries with de_ss values of
>> NILFS_CNO_MAX to 0. NILFS_CNO_MAX indicates, that the corresponding
>> block belongs to some snapshot, but was already decremented by a
>> previous deletion operation. If the segment usage info is changed with
>> NILFS_IOCTL_SET_SUINFO and the number of blocks is updated, then these
>> blocks would never be decremented and there are scenarios where the
>> corresponding segments would starve (never be cleaned). To prevent that
>> they must be reset to 0.
>>
>> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
>> ---
>> fs/nilfs2/dat.c | 63 ++++++++++++++++++++++++++++
>> fs/nilfs2/dat.h | 1 +
>> fs/nilfs2/ioctl.c | 103 +++++++++++++++++++++++++++++++++++++++++++++-
>> include/linux/nilfs2_fs.h | 52 ++++++++++++++++++++++-
>> 4 files changed, 216 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
>> index 89a4a5f..7adb15d 100644
>> --- a/fs/nilfs2/dat.c
>> +++ b/fs/nilfs2/dat.c
>> @@ -382,6 +382,69 @@ int nilfs_dat_move(struct inode *dat, __u64 vblocknr, sector_t blocknr)
>> }
>>
>> /**
>> + * nilfs_dat_clean_snapshot_flag - check flags used by snapshots
>> + * @dat: DAT file inode
>> + * @vblocknr: virtual block number
>> + *
>> + * Description: nilfs_dat_clean_snapshot_flag() changes the flags from
>> + * NILFS_CNO_MAX to 0 if necessary, so that segment usage is accurately
>> + * counted. NILFS_CNO_MAX indicates, that the corresponding block belongs
>> + * to some snapshot, but was already decremented. If the segment usage info
>> + * is changed with NILFS_IOCTL_SET_SUINFO and the number of blocks is updated,
>> + * then these blocks would never be decremented and there are scenarios where
>> + * the corresponding segments would starve (never be cleaned).
>> + *
>> + * Return Value: On success, 0 is returned. On error, one of the following
>> + * negative error codes is returned.
>> + *
>> + * %-EIO - I/O error.
>> + *
>> + * %-ENOMEM - Insufficient amount of memory available.
>> + */
>> +int nilfs_dat_clean_snapshot_flag(struct inode *dat, __u64 vblocknr)
>
> This sounds as if we clear a flag. The name can be confusing.
Yes, it is hard to find a good name for that function. It has nothing to
do with the nilfs_vdesc flags.
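For readers following the naming discussion, what the function does to a single entry boils down to the following userspace sketch. The helper name is hypothetical, the de_ss value is assumed to be in CPU byte order already, and NILFS_CNO_MAX is redefined locally as an all-ones value to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors the kernel constant, all bits set. */
#define NILFS_CNO_MAX (~(uint64_t)0)

/*
 * Hypothetical stand-in for the de_ss update. Returns true if the value
 * was changed, which is the case in which the kernel code would have to
 * mark the buffer and the DAT file dirty.
 */
static bool dat_entry_reset_counted(uint64_t *de_ss)
{
	if (*de_ss != NILFS_CNO_MAX)
		return false;	/* 0 or a real snapshot number: keep it */

	*de_ss = 0;		/* forget the "already decremented" marker */
	return true;
}
```

So the value being reset is a marker stored in the DAT entry, not one of the bits in nilfs_vdesc.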
>> +{
>> + struct buffer_head *entry_bh;
>> + struct nilfs_dat_entry *entry;
>> + void *kaddr;
>> + int ret;
>> +
>> + ret = nilfs_palloc_get_entry_block(dat, vblocknr, 0, &entry_bh);
>> + if (ret < 0)
>> + return ret;
>> +
>> + /*
>> + * The given disk block number (blocknr) is not yet written to
>> + * the device at this point.
>> + *
>> + * To prevent nilfs_dat_translate() from returning the
>> + * uncommitted block number, this makes a copy of the entry
>> + * buffer and redirects nilfs_dat_translate() to the copy.
>> + */
>> + if (!buffer_nilfs_redirected(entry_bh)) {
>> + ret = nilfs_mdt_freeze_buffer(dat, entry_bh);
>> + if (ret) {
>> + brelse(entry_bh);
>> + return ret;
>> + }
>> + }
>> +
>> + kaddr = kmap_atomic(entry_bh->b_page);
>> + entry = nilfs_palloc_block_get_entry(dat, vblocknr, entry_bh, kaddr);
>> + if (entry->de_ss == cpu_to_le64(NILFS_CNO_MAX)) {
>> + entry->de_ss = cpu_to_le64(0);
>> + kunmap_atomic(kaddr);
>> + mark_buffer_dirty(entry_bh);
>> + nilfs_mdt_mark_dirty(dat);
>> + } else {
>> + kunmap_atomic(kaddr);
>> + }
>
> Brackets are unnecessary here.
Yes.
>> +
>> + brelse(entry_bh);
>> +
>> + return 0;
>> +}
>> +
>> +/**
>> * nilfs_dat_translate - translate a virtual block number to a block number
>> * @dat: DAT file inode
>> * @vblocknr: virtual block number
>> diff --git a/fs/nilfs2/dat.h b/fs/nilfs2/dat.h
>> index 92a187e..a528024 100644
>> --- a/fs/nilfs2/dat.h
>> +++ b/fs/nilfs2/dat.h
>> @@ -51,6 +51,7 @@ void nilfs_dat_abort_update(struct inode *, struct nilfs_palloc_req *,
>> int nilfs_dat_mark_dirty(struct inode *, __u64);
>> int nilfs_dat_freev(struct inode *, __u64 *, size_t);
>> int nilfs_dat_move(struct inode *, __u64, sector_t);
>> +int nilfs_dat_clean_snapshot_flag(struct inode *, __u64);
>> ssize_t nilfs_dat_get_vinfo(struct inode *, void *, unsigned, size_t);
>>
>> int nilfs_dat_read(struct super_block *sb, size_t entry_size,
>> diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
>> index 422fb54..0b62bf4 100644
>> --- a/fs/nilfs2/ioctl.c
>> +++ b/fs/nilfs2/ioctl.c
>> @@ -578,7 +578,7 @@ static int nilfs_ioctl_move_inode_block(struct inode *inode,
>> struct buffer_head *bh;
>> int ret;
>>
>> - if (vdesc->vd_flags == 0)
>> + if (nilfs_vdesc_data(vdesc))
>> ret = nilfs_gccache_submit_read_data(
>> inode, vdesc->vd_offset, vdesc->vd_blocknr,
>> vdesc->vd_vblocknr, &bh);
>> @@ -662,6 +662,14 @@ static int nilfs_ioctl_move_blocks(struct super_block *sb,
>> }
>>
>> do {
>> + /*
>> + * old user space tools do not initialize vd_flags2
>> + * check if it contains invalid flags
>> + */
>> + if (vdesc->vd_flags2 &
>
> "vd_flags2" is really bad naming. Completely obscure.
I would like to use the already existing field vd_flags, but that is
impossible because of backwards compatibility.
>> + (~0UL << __NR_NILFS_VDESC_FIELDS))
>
> Looks weird.
>
>> + vdesc->vd_flags2 = 0;
>> +
>> ret = nilfs_ioctl_move_inode_block(inode, vdesc,
>> &buffers);
>> if (unlikely(ret < 0)) {
>> @@ -984,6 +992,96 @@ out:
>> }
>>
>> /**
>> + * nilfs_ioctl_clean_snapshot_flags - clean dat entries with invalid de_ss
>
> Ditto. This sounds like the clearing of a flag.
>
>> + * @inode: inode object
>> + * @filp: file object
>> + * @cmd: ioctl's request code
>> + * @argp: pointer on argument from userspace
>> + *
>> + * Description: nilfs_ioctl_clean_snapshot_flags() sets DAT entries with de_ss
>> + * values of NILFS_CNO_MAX to 0. NILFS_CNO_MAX indicates, that the
>> + * corresponding block belongs to some snapshot, but was already decremented.
>> + * If the segment usage info is changed with NILFS_IOCTL_SET_SUINFO and the
>> + * number of blocks is updated, then these blocks would never be decremented
>> + * and there are scenarios where the corresponding segments would starve (never
>> + * be cleaned).
>> + *
>> + * Return Value: On success, 0 is returned or error code, otherwise.
>> + */
>> +static int nilfs_ioctl_clean_snapshot_flags(struct inode *inode,
>> + struct file *filp,
>> + unsigned int cmd,
>> + void __user *argp)
>> +{
>> + struct the_nilfs *nilfs = inode->i_sb->s_fs_info;
>> + struct nilfs_transaction_info ti;
>> + struct nilfs_argv argv;
>> + struct nilfs_vdesc *vdesc;
>> + size_t len, i;
>> + void __user *base;
>> + void *kbuf;
>> + int ret;
>> +
>> + if (!capable(CAP_SYS_ADMIN))
>> + return -EPERM;
>> +
>> + ret = mnt_want_write_file(filp);
>> + if (ret)
>> + return ret;
>> +
>> + ret = -EFAULT;
>> + if (copy_from_user(&argv, argp, sizeof(struct nilfs_argv)))
>> + goto out;
>> +
>> + ret = -EINVAL;
>> + if (argv.v_size != sizeof(struct nilfs_vdesc))
>> + goto out;
>> + if (argv.v_nmembs > UINT_MAX / sizeof(struct nilfs_vdesc))
>> + goto out;
>> +
>> + len = argv.v_size * argv.v_nmembs;
>> + if (!len) {
>> + ret = 0;
>> + goto out;
>> + }
>> +
>> + base = (void __user *)(unsigned long)argv.v_base;
>> + kbuf = vmalloc(len);
>> + if (!kbuf) {
>> + ret = -ENOMEM;
>> + goto out;
>> + }
>> +
>> + if (copy_from_user(kbuf, base, len)) {
>> + ret = -EFAULT;
>> + goto out_free;
>> + }
>> +
>> + ret = nilfs_transaction_begin(inode->i_sb, &ti, 0);
>> + if (unlikely(ret))
>> + goto out_free;
>> +
>> + for (i = 0, vdesc = kbuf; i < argv.v_nmembs; ++i, ++vdesc) {
>> + if (nilfs_vdesc_snapshot(vdesc)) {
>> + ret = nilfs_dat_clean_snapshot_flag(nilfs->ns_dat,
>> + vdesc->vd_vblocknr);
>> + if (ret) {
>> + nilfs_transaction_abort(inode->i_sb);
>> + goto out_free;
>> + }
>> + }
>> + }
>> +
>> + nilfs_transaction_commit(inode->i_sb);
>> +
>> +out_free:
>> + vfree(kbuf);
>> +out:
>> + mnt_drop_write_file(filp);
>> + return ret;
>> +}
>> +
>> +/**
>> * nilfs_ioctl_sync - make a checkpoint
>> * @inode: inode object
>> * @filp: file object
>> @@ -1332,6 +1430,8 @@ long nilfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>> return nilfs_ioctl_get_bdescs(inode, filp, cmd, argp);
>> case NILFS_IOCTL_CLEAN_SEGMENTS:
>> return nilfs_ioctl_clean_segments(inode, filp, cmd, argp);
>> + case NILFS_IOCTL_CLEAN_SNAPSHOT_FLAGS:
>> + return nilfs_ioctl_clean_snapshot_flags(inode, filp, cmd, argp);
>> case NILFS_IOCTL_SYNC:
>> return nilfs_ioctl_sync(inode, filp, cmd, argp);
>> case NILFS_IOCTL_RESIZE:
>> @@ -1368,6 +1468,7 @@ long nilfs_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>> case NILFS_IOCTL_GET_VINFO:
>> case NILFS_IOCTL_GET_BDESCS:
>> case NILFS_IOCTL_CLEAN_SEGMENTS:
>> + case NILFS_IOCTL_CLEAN_SNAPSHOT_FLAGS:
>
> To me this sounds as if we clean all of the snapshot's flags.
>
>> case NILFS_IOCTL_SYNC:
>> case NILFS_IOCTL_RESIZE:
>> case NILFS_IOCTL_SET_ALLOC_RANGE:
>> diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
>> index ba9ebe02..30ddc86 100644
>> --- a/include/linux/nilfs2_fs.h
>> +++ b/include/linux/nilfs2_fs.h
>> @@ -863,7 +863,7 @@ struct nilfs_vinfo {
>> * @vd_blocknr: disk block number
>> * @vd_offset: logical block offset inside a file
>> * @vd_flags: flags (data or node block)
>> - * @vd_pad: padding
>> + * @vd_flags2: additional flags
>
> Ditto. Weird name.
>
>> */
>> struct nilfs_vdesc {
>> __u64 vd_ino;
>> @@ -873,9 +873,55 @@ struct nilfs_vdesc {
>> __u64 vd_blocknr;
>> __u64 vd_offset;
>> __u32 vd_flags;
>> - __u32 vd_pad;
>> + /* vd_flags2 needed because of backwards compatibility */
>
> I don't understand this comment at all. Usually, old fields are kept
> for backward compatibility, but this field is new.
I will rewrite the comment. I need vd_flags2 because I can't use
vd_flags because of backwards compatibility.
>> + __u32 vd_flags2;
>> };
>>
>> +/* vdesc flags */
>
> To be honest, I don't understand why there are this many flags and why
> these flags in particular. Comments are really necessary.
>
>> +enum {
>> + NILFS_VDESC_DATA,
>> + NILFS_VDESC_NODE,
>> + /* ... */
>
> What does it mean?
NILFS_VDESC_DATA = 0 and NILFS_VDESC_NODE = 1. This represents the type
of block. These two already existed, in the previous version, but they
were not explicit. See "[Patch 4/4] nilfs-utils: add extra flags to
nilfs_vdesc and update sui_nblocks":
@@ -148,17 +149,19 @@ static int nilfs_acc_blocks_file(struct nilfs_file *file,
- vdesc->vd_flags = 0; /* data */
+ nilfs_vdesc_set_data(vdesc);
} else {
vdesc->vd_vblocknr =
le64_to_cpu(*(__le64 *)blk.b_binfo);
- vdesc->vd_flags = 1; /* node */
+ nilfs_vdesc_set_node(vdesc);
}
>> +};
>> +enum {
>> + NILFS_VDESC_SNAPSHOT,
>> + __NR_NILFS_VDESC_FIELDS,
>> + /* ... */
>
> What does it mean?
Those flags are set by the userspace tools in nilfs_vdesc_is_live().
They indicate the reason why a block is alive. NILFS_VDESC_SNAPSHOT
means that the block is alive because it belongs to a snapshot.
nilfs_vdesc is a data structure for the communication between
kernelspace and userspace. You have to look at it in that context.
Best regards,
Andreas Rohner
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 4/6] nilfs2: add ioctl() to clean snapshot flags from dat entries
[not found] ` <5326FD51.7000209-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-03-18 7:10 ` Vyacheslav Dubeyko
2014-03-18 8:38 ` Andreas Rohner
0 siblings, 1 reply; 34+ messages in thread
From: Vyacheslav Dubeyko @ 2014-03-18 7:10 UTC (permalink / raw)
To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Mon, 2014-03-17 at 14:49 +0100, Andreas Rohner wrote:
> >
> >> */
> >> struct nilfs_vdesc {
> >> __u64 vd_ino;
> >> @@ -873,9 +873,55 @@ struct nilfs_vdesc {
> >> __u64 vd_blocknr;
> >> __u64 vd_offset;
> >> __u32 vd_flags;
> >> - __u32 vd_pad;
> >> + /* vd_flags2 needed because of backwards compatibility */
> >
> > > I don't understand this comment at all. Usually, old fields are kept
> > > for backward compatibility, but this field is new.
>
> I will rewrite the comment. I need vd_flags2 because I can't use
> vd_flags because of backwards compatibility.
>
> >> + __u32 vd_flags2;
What about vd_blk_state instead of vd_flags2?
> >> };
> >>
> >> +/* vdesc flags */
> >
> > > To be honest, I don't understand why there are this many flags and why
> > > these flags in particular. Comments are really necessary.
> >
> >> +enum {
> >> + NILFS_VDESC_DATA,
> >> + NILFS_VDESC_NODE,
> >> + /* ... */
> >
> > What does it mean?
>
> NILFS_VDESC_DATA = 0 and NILFS_VDESC_NODE = 1. This represents the type
> of block. These two already existed, in the previous version, but they
> were not explicit. See "[Patch 4/4] nilfs-utils: add extra flags to
> nilfs_vdesc and update sui_nblocks":
>
> @@ -148,17 +149,19 @@ static int nilfs_acc_blocks_file(struct nilfs_file *file,
> - vdesc->vd_flags = 0; /* data */
> + nilfs_vdesc_set_data(vdesc);
> } else {
> vdesc->vd_vblocknr =
> le64_to_cpu(*(__le64 *)blk.b_binfo);
> - vdesc->vd_flags = 1; /* node */
> + nilfs_vdesc_set_node(vdesc);
> }
>
> >> +};
> >> +enum {
> >> + NILFS_VDESC_SNAPSHOT,
> >> + __NR_NILFS_VDESC_FIELDS,
> >> + /* ... */
> >
> > What does it mean?
I was asking about the strange "/* ... */" comment here. What does it mean?
Moreover, I am slightly confused by NILFS_VDESC_SNAPSHOT. Is it a bit-based
flag? I mean NILFS_VDESC_SNAPSHOT = (1 << 0). Or am I incorrect?
Thanks,
Vyacheslav Dubeyko.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 4/6] nilfs2: add ioctl() to clean snapshot flags from dat entries
2014-03-18 7:10 ` Vyacheslav Dubeyko
@ 2014-03-18 8:38 ` Andreas Rohner
0 siblings, 0 replies; 34+ messages in thread
From: Andreas Rohner @ 2014-03-18 8:38 UTC (permalink / raw)
To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On 2014-03-18 08:10, Vyacheslav Dubeyko wrote:
> On Mon, 2014-03-17 at 14:49 +0100, Andreas Rohner wrote:
>
>>>
>>>> */
>>>> struct nilfs_vdesc {
>>>> __u64 vd_ino;
>>>> @@ -873,9 +873,55 @@ struct nilfs_vdesc {
>>>> __u64 vd_blocknr;
>>>> __u64 vd_offset;
>>>> __u32 vd_flags;
>>>> - __u32 vd_pad;
>>>> + /* vd_flags2 needed because of backwards compatibility */
>>>
>>> I don't understand this comment at all. Usually, old fields are kept
>>> for backward compatibility, but this field is new.
>>
>> I will rewrite the comment. I need vd_flags2 because I can't use
>> vd_flags because of backwards compatibility.
>>
>>>> + __u32 vd_flags2;
>
> What about vd_blk_state instead of vd_flags2?
Yes, sounds good to me.
>>>> };
>>>>
>>>> +/* vdesc flags */
>>>
>>> To be honest, I do not understand why there are so many flags, and why
>>> these flags in particular. Comments are really necessary.
>>>
>>>> +enum {
>>>> + NILFS_VDESC_DATA,
>>>> + NILFS_VDESC_NODE,
>>>> + /* ... */
>>>
>>> What does it mean?
>>
>> NILFS_VDESC_DATA = 0 and NILFS_VDESC_NODE = 1. This represents the type
>> of block. These two already existed, in the previous version, but they
>> were not explicit. See "[Patch 4/4] nilfs-utils: add extra flags to
>> nilfs_vdesc and update sui_nblocks":
>>
>> @@ -148,17 +149,19 @@ static int nilfs_acc_blocks_file(struct nilfs_file
>> *file,
>> - vdesc->vd_flags = 0; /* data */
>> + nilfs_vdesc_set_data(vdesc);
>> } else {
>> vdesc->vd_vblocknr =
>> le64_to_cpu(*(__le64 *)blk.b_binfo);
>> - vdesc->vd_flags = 1; /* node */
>> + nilfs_vdesc_set_node(vdesc);
>> }
>>
>>>> +};
>>>> +enum {
>>>> + NILFS_VDESC_SNAPSHOT,
>>>> + __NR_NILFS_VDESC_FIELDS,
>>>> + /* ... */
>>>
>>> What does it mean?
>
> I was asking about the strange comment here. What does it mean?
Sorry for the misunderstanding. I copied the comment from other flags like:
    enum {
        NILFS_SEGMENT_USAGE_ACTIVE,
        NILFS_SEGMENT_USAGE_DIRTY,
        NILFS_SEGMENT_USAGE_ERROR,
        /* ... */
    };
I guess it means "additional flags come here".
But you are right, it is confusing. It should look like this:
    enum {
        NILFS_VDESC_SNAPSHOT,
        NILFS_VDESC_PROTECTION_PERIOD,
        /* ... */
        __NR_NILFS_VDESC_FIELDS,
    };
> Moreover, I am slightly confused by NILFS_VDESC_SNAPSHOT. Is it a bit-based
> flag? I mean NILFS_VDESC_SNAPSHOT = (1 << 0). Or am I incorrect?
Yes, NILFS_VDESC_SNAPSHOT and NILFS_VDESC_PROTECTION_PERIOD are
bit-based. NILFS_VDESC_DATA and NILFS_VDESC_NODE are not bit-based
because of backwards compatibility.
Please also note that [PATCH 5/6] adds another flag, namely
NILFS_VDESC_PROTECTION_PERIOD.
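The difference between the two kinds of constants can be sketched in user-space C. This is only an illustration under assumptions: the struct, the shortened VDESC_* names and both helpers are hypothetical stand-ins, and the bodies of the real NILFS_VDESC_FNS/NILFS_VDESC_FNS2 macros are not shown in this thread. It assumes the second enum holds bit positions rather than masks.

```c
#include <stdint.h>

/* Block type: plain values stored in vd_flags (0 = data, 1 = node). */
enum { VDESC_DATA, VDESC_NODE };

/* Bit positions inside the second flags word; these are not masks. */
enum {
	VDESC_SNAPSHOT,          /* bit 0 */
	VDESC_PROTECTION_PERIOD, /* bit 1 */
	NR_VDESC_FIELDS,         /* number of defined bits, kept last */
};

/* Hypothetical stand-in for struct nilfs_vdesc, reduced to two words. */
struct vdesc_sketch {
	uint32_t vd_flags;  /* holds VDESC_DATA or VDESC_NODE as a value */
	uint32_t vd_flags2; /* holds one bit per bit position above */
};

/* Set the bit at the given position in the second flags word. */
static inline void vdesc_set_bit(struct vdesc_sketch *v, unsigned int bit)
{
	v->vd_flags2 |= UINT32_C(1) << bit;
}

/* Return 1 if the bit at the given position is set, 0 otherwise. */
static inline int vdesc_test_bit(const struct vdesc_sketch *v, unsigned int bit)
{
	return (v->vd_flags2 >> bit) & 1u;
}
```

With this layout a node block that is also protected would carry the value 1 in the first word and the mask (1 << 1) in the second, so the old meaning of the first word stays untouched.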
Best regards,
Andreas Rohner
* Re: [PATCH 5/6] nilfs2: add counting of live blocks for blocks that are overwritten
[not found] ` <25dd8a8bb6943ffa3e0663848363135585a48109.1394966729.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-03-18 11:50 ` Vyacheslav Dubeyko
2014-03-18 14:02 ` Andreas Rohner
0 siblings, 1 reply; 34+ messages in thread
From: Vyacheslav Dubeyko @ 2014-03-18 11:50 UTC
To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Sun, 2014-03-16 at 11:47 +0100, Andreas Rohner wrote:
> diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
> index 7adb15d..e7b19c40 100644
> --- a/fs/nilfs2/dat.c
> +++ b/fs/nilfs2/dat.c
> @@ -445,6 +445,64 @@ int nilfs_dat_clean_snapshot_flag(struct inode *dat, __u64 vblocknr)
> }
>
> /**
> + * nilfs_dat_is_live - checks if the virtual block number is alive
What about nilfs_dat_block_is_alive?
> + * @dat: DAT file inode
> + * @vblocknr: virtual block number
> + *
> + * Description: nilfs_dat_is_live() looks up the DAT entry for @vblocknr and
> + * determines if the corresponding block is alive or not. This check ignores
> + * snapshots and protection periods.
> + *
> + * Return Value: 1 if vblocknr is alive and 0 otherwise. On error, one
> + * of the following negative error codes is returned.
From my point of view, it is a really bad idea to mix error codes and an
informational return value. It usually results in very buggy code at the call
site. In effect, you rely on the binary nature of the returned value.
I think the design of this function needs to be reworked. Maybe it should
return bool and pass the error value back through an argument.
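One way to keep the two channels apart can be sketched as follows. This is a minimal user-space illustration, not the kernel code: struct dat_entry_sketch, CNO_MAX and dat_block_is_alive() are hypothetical stand-ins for struct nilfs_dat_entry, NILFS_CNO_MAX and the function under review. It takes the variant where the status is the return value and the answer goes through a pointer; the opposite split (bool returned, error through an argument) works the same way.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define CNO_MAX UINT64_MAX /* stand-in for NILFS_CNO_MAX */

/* Hypothetical, reduced DAT entry with only the fields the check reads. */
struct dat_entry_sketch {
	uint64_t de_blocknr;
	uint64_t de_end;
};

/*
 * Returns 0 and stores the answer in *alive, or returns a negative errno
 * and leaves *alive untouched. Status and answer never share a channel.
 */
static int dat_block_is_alive(const struct dat_entry_sketch *e, bool *alive)
{
	if (e->de_blocknr == 0)
		return -ENOENT;
	*alive = (e->de_end == CNO_MAX);
	return 0;
}
```

The motivation is visible in the patch itself: the caller in segment.c tests !nilfs_dat_is_live(...), and in C the negation of a negative error code is 0, just like the negation of 1, so an error is silently handled like "alive".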
> + *
> + * %-EIO - I/O error.
> + *
> + * %-ENOMEM - Insufficient amount of memory available.
> + *
> + * %-ENOENT - A block number associated with @vblocknr does not exist.
> + */
> +int nilfs_dat_is_live(struct inode *dat, __u64 vblocknr)
> +{
> + struct buffer_head *entry_bh, *bh;
> + struct nilfs_dat_entry *entry;
> + sector_t blocknr;
> + void *kaddr;
> + int ret;
> +
> + ret = nilfs_palloc_get_entry_block(dat, vblocknr, 0, &entry_bh);
> + if (ret < 0)
> + return ret;
> +
> + if (!nilfs_doing_gc() && buffer_nilfs_redirected(entry_bh)) {
> + bh = nilfs_mdt_get_frozen_buffer(dat, entry_bh);
> + if (bh) {
> + WARN_ON(!buffer_uptodate(bh));
> + brelse(entry_bh);
> + entry_bh = bh;
> + }
> + }
> +
> + kaddr = kmap_atomic(entry_bh->b_page);
> + entry = nilfs_palloc_block_get_entry(dat, vblocknr, entry_bh, kaddr);
> + blocknr = le64_to_cpu(entry->de_blocknr);
> + if (blocknr == 0) {
I suppose that zero should be a specially named constant here?
> + ret = -ENOENT;
> + goto out;
> + }
> +
> +
> + if (entry->de_end == cpu_to_le64(NILFS_CNO_MAX))
> + ret = 1;
> + else
> + ret = 0;
> +out:
> + kunmap_atomic(kaddr);
> + brelse(entry_bh);
> + return ret;
> +}
> +
> +/**
> * nilfs_dat_translate - translate a virtual block number to a block number
> * @dat: DAT file inode
> * @vblocknr: virtual block number
> diff --git a/fs/nilfs2/dat.h b/fs/nilfs2/dat.h
> index a528024..51d44c0 100644
> --- a/fs/nilfs2/dat.h
> +++ b/fs/nilfs2/dat.h
> @@ -31,6 +31,7 @@
> struct nilfs_palloc_req;
>
> int nilfs_dat_translate(struct inode *, __u64, sector_t *);
> +int nilfs_dat_is_live(struct inode *, __u64);
>
> int nilfs_dat_prepare_alloc(struct inode *, struct nilfs_palloc_req *);
> void nilfs_dat_commit_alloc(struct inode *, struct nilfs_palloc_req *);
> diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
> index b9c5726..c32b896 100644
> --- a/fs/nilfs2/inode.c
> +++ b/fs/nilfs2/inode.c
> @@ -86,6 +86,8 @@ int nilfs_get_block(struct inode *inode, sector_t blkoff,
> int err = 0, ret;
> unsigned maxblocks = bh_result->b_size >> inode->i_blkbits;
>
> + bh_result->b_blocknr = 0;
> +
> down_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
> ret = nilfs_bmap_lookup_contig(ii->i_bmap, blkoff, &blknum, maxblocks);
> up_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
> diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
> index 0b62bf4..3603394 100644
> --- a/fs/nilfs2/ioctl.c
> +++ b/fs/nilfs2/ioctl.c
> @@ -612,6 +612,12 @@ static int nilfs_ioctl_move_inode_block(struct inode *inode,
> brelse(bh);
> return -EEXIST;
> }
> +
> + if (nilfs_vdesc_snapshot(vdesc))
> + set_buffer_nilfs_snapshot(bh);
> + if (nilfs_vdesc_protection_period(vdesc))
> + set_buffer_nilfs_protection_period(bh);
> +
> list_add_tail(&bh->b_assoc_buffers, buffers);
> return 0;
> }
> diff --git a/fs/nilfs2/page.h b/fs/nilfs2/page.h
> index ef30c5c..8c34a31 100644
> --- a/fs/nilfs2/page.h
> +++ b/fs/nilfs2/page.h
> @@ -36,13 +36,17 @@ enum {
> BH_NILFS_Volatile,
> BH_NILFS_Checked,
> BH_NILFS_Redirected,
> + BH_NILFS_Snapshot,
> + BH_NILFS_Protection_Period,
> };
>
> BUFFER_FNS(NILFS_Node, nilfs_node) /* nilfs node buffers */
> BUFFER_FNS(NILFS_Volatile, nilfs_volatile)
> BUFFER_FNS(NILFS_Checked, nilfs_checked) /* buffer is verified */
> BUFFER_FNS(NILFS_Redirected, nilfs_redirected) /* redirected to a copy */
> -
> +BUFFER_FNS(NILFS_Snapshot, nilfs_snapshot) /* belongs to a snapshot */
> +BUFFER_FNS(NILFS_Protection_Period, nilfs_protection_period) /* protected by
> + protection period */
>
> int __nilfs_clear_page_dirty(struct page *);
>
> diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
> index dc3a9efd..c72fc37 100644
> --- a/fs/nilfs2/segbuf.c
> +++ b/fs/nilfs2/segbuf.c
> @@ -28,6 +28,7 @@
> #include <linux/slab.h>
> #include "page.h"
> #include "segbuf.h"
> +#include "sufile.h"
>
>
> struct nilfs_write_info {
> @@ -57,6 +58,8 @@ struct nilfs_segment_buffer *nilfs_segbuf_new(struct super_block *sb)
> INIT_LIST_HEAD(&segbuf->sb_segsum_buffers);
> INIT_LIST_HEAD(&segbuf->sb_payload_buffers);
> segbuf->sb_super_root = NULL;
> + segbuf->sb_su_blocks = 0;
> + segbuf->sb_su_blocks_cancel = 0;
>
> init_completion(&segbuf->sb_bio_event);
> atomic_set(&segbuf->sb_err, 0);
> @@ -82,6 +85,25 @@ void nilfs_segbuf_map(struct nilfs_segment_buffer *segbuf, __u64 segnum,
> segbuf->sb_fseg_end - segbuf->sb_pseg_start + 1;
> }
>
> +int nilfs_segbuf_set_sui(struct nilfs_segment_buffer *segbuf,
> + struct the_nilfs *nilfs)
> +{
> + struct nilfs_suinfo si;
> + ssize_t err;
> +
> + err = nilfs_sufile_get_suinfo(nilfs->ns_sufile, segbuf->sb_segnum, &si,
> + sizeof(si), 1);
> + if (err != 1)
If nilfs_sufile_get_suinfo() returns an error, then how can it be equal to
one? What a mess.
> + return -1;
This is a really bad idea. In the end, the caller will get -EPERM, because
-EPERM is -1. Is that what you mean here?
> +
> + if (si.sui_nblocks == 0)
> + si.sui_nblocks = segbuf->sb_pseg_start - segbuf->sb_fseg_start;
> +
> + segbuf->sb_su_blocks = si.sui_nblocks;
> + segbuf->sb_su_blocks_cancel = si.sui_nblocks;
> + return 0;
> +}
> +
> /**
> * nilfs_segbuf_map_cont - map a new log behind a given log
> * @segbuf: new segment buffer
> @@ -450,6 +472,9 @@ static int nilfs_segbuf_submit_bh(struct nilfs_segment_buffer *segbuf,
>
> len = bio_add_page(wi->bio, bh->b_page, bh->b_size, bh_offset(bh));
> if (len == bh->b_size) {
> + lock_buffer(bh);
> + map_bh(bh, segbuf->sb_super, wi->blocknr + wi->end);
> + unlock_buffer(bh);
> wi->end++;
> return 0;
> }
> diff --git a/fs/nilfs2/segbuf.h b/fs/nilfs2/segbuf.h
> index b04f08c..482bbad 100644
> --- a/fs/nilfs2/segbuf.h
> +++ b/fs/nilfs2/segbuf.h
> @@ -83,6 +83,8 @@ struct nilfs_segment_buffer {
> sector_t sb_fseg_start, sb_fseg_end;
> sector_t sb_pseg_start;
> unsigned sb_rest_blocks;
> + __u32 sb_su_blocks_cancel;
> + __s64 sb_su_blocks;
>
> /* Buffers */
> struct list_head sb_segsum_buffers;
> @@ -122,6 +124,8 @@ void nilfs_segbuf_map(struct nilfs_segment_buffer *, __u64, unsigned long,
> struct the_nilfs *);
> void nilfs_segbuf_map_cont(struct nilfs_segment_buffer *segbuf,
> struct nilfs_segment_buffer *prev);
> +int nilfs_segbuf_set_sui(struct nilfs_segment_buffer *segbuf,
> + struct the_nilfs *nilfs);
> void nilfs_segbuf_set_next_segnum(struct nilfs_segment_buffer *, __u64,
> struct the_nilfs *);
> int nilfs_segbuf_reset(struct nilfs_segment_buffer *, unsigned, time_t, __u64);
> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> index a1a1916..5d98a1c 100644
> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -1257,6 +1257,10 @@ static int nilfs_segctor_begin_construction(struct nilfs_sc_info *sci,
> }
> nilfs_segbuf_set_next_segnum(segbuf, nextnum, nilfs);
>
> + err = nilfs_segbuf_set_sui(segbuf, nilfs);
> + if (err)
> + goto failed;
> +
> BUG_ON(!list_empty(&sci->sc_segbufs));
> list_add_tail(&segbuf->sb_list, &sci->sc_segbufs);
> sci->sc_segbuf_nblocks = segbuf->sb_rest_blocks;
> @@ -1306,6 +1310,10 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
> segbuf->sb_sum.seg_seq = prev->sb_sum.seg_seq + 1;
> nilfs_segbuf_set_next_segnum(segbuf, nextnextnum, nilfs);
>
> + err = nilfs_segbuf_set_sui(segbuf, nilfs);
> + if (err)
> + goto failed;
> +
> list_add_tail(&segbuf->sb_list, &list);
> prev = segbuf;
> }
> @@ -1368,8 +1376,7 @@ static void nilfs_segctor_update_segusage(struct nilfs_sc_info *sci,
> int ret;
>
> list_for_each_entry(segbuf, &sci->sc_segbufs, sb_list) {
> - live_blocks = segbuf->sb_sum.nblocks +
> - (segbuf->sb_pseg_start - segbuf->sb_fseg_start);
> + live_blocks = segbuf->sb_sum.nfileblk + segbuf->sb_su_blocks;
> ret = nilfs_sufile_set_segment_usage(sufile, segbuf->sb_segnum,
> live_blocks,
> sci->sc_seg_ctime);
> @@ -1383,9 +1390,9 @@ static void nilfs_cancel_segusage(struct list_head *logs, struct inode *sufile)
> int ret;
>
> segbuf = NILFS_FIRST_SEGBUF(logs);
> +
> ret = nilfs_sufile_set_segment_usage(sufile, segbuf->sb_segnum,
> - segbuf->sb_pseg_start -
> - segbuf->sb_fseg_start, 0);
> + segbuf->sb_su_blocks_cancel, 0);
> WARN_ON(ret); /* always succeed because the segusage is dirty */
>
> list_for_each_entry_continue(segbuf, logs, sb_list) {
> @@ -1477,7 +1484,9 @@ nilfs_segctor_update_payload_blocknr(struct nilfs_sc_info *sci,
> struct nilfs_segment_buffer *segbuf,
> int mode)
> {
> + struct the_nilfs *nilfs = sci->sc_super->s_fs_info;
> struct inode *inode = NULL;
> + struct nilfs_inode_info *ii;
> sector_t blocknr;
> unsigned long nfinfo = segbuf->sb_sum.nfinfo;
> unsigned long nblocks = 0, ndatablk = 0;
> @@ -1487,7 +1496,9 @@ nilfs_segctor_update_payload_blocknr(struct nilfs_sc_info *sci,
> union nilfs_binfo binfo;
> struct buffer_head *bh, *bh_org;
> ino_t ino = 0;
> - int err = 0;
> + int gc_inode = 0, err = 0;
> + __u64 segnum, prev_segnum = 0, dectime = 0, maxdectime = 0;
> + __u32 blkcount = 0;
>
> if (!nfinfo)
> goto out;
> @@ -1508,6 +1519,17 @@ nilfs_segctor_update_payload_blocknr(struct nilfs_sc_info *sci,
>
> inode = bh->b_page->mapping->host;
>
> + ii = NILFS_I(inode);
> + gc_inode = test_bit(NILFS_I_GCINODE, &ii->i_state);
> + dectime = sci->sc_seg_ctime;
The name dectime does not sound very good to me.
> + /* no update of lastdec necessary */
> + if (ino == NILFS_DAT_INO || ino == NILFS_SUFILE_INO ||
> + ino == NILFS_CPFILE_INO)
> + dectime = 0;
What about this?
    if (ino == NILFS_DAT_INO ||
        ino == NILFS_SUFILE_INO ||
        ino == NILFS_CPFILE_INO)
            dectime = 0;
But really, I would prefer to see a small check function (is_metadata_file(),
for example).
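Such a helper might look like the sketch below. The name is_metadata_file() comes from the suggestion above; everything else is an assumption for illustration: the inode numbers are local stand-ins for NILFS_DAT_INO, NILFS_CPFILE_INO and NILFS_SUFILE_INO, and pick_dectime() is a hypothetical wrapper that mirrors the "no update necessary" rule from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in inode numbers; the kernel takes the real ones from nilfs2_fs.h. */
enum { DAT_INO = 3, CPFILE_INO = 4, SUFILE_INO = 5 };

/* True for the three metadata files singled out in the patch. */
static inline bool is_metadata_file(uint64_t ino)
{
	return ino == DAT_INO || ino == CPFILE_INO || ino == SUFILE_INO;
}

/* Timestamp to record for a decrement; 0 means "do not update". */
static inline uint64_t pick_dectime(uint64_t ino, uint64_t seg_ctime)
{
	return is_metadata_file(ino) ? 0 : seg_ctime;
}
```

The call site then shrinks to a single assignment, and the list of special inodes lives in one place.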
> +
> + if (dectime > maxdectime)
> + maxdectime = dectime;
> +
> if (mode == SC_LSEG_DSYNC)
> sc_op = &nilfs_sc_dsync_ops;
> else if (ino == NILFS_DAT_INO)
> @@ -1515,6 +1537,39 @@ nilfs_segctor_update_payload_blocknr(struct nilfs_sc_info *sci,
> else /* file blocks */
> sc_op = &nilfs_sc_file_ops;
> }
> +
> + segnum = nilfs_get_segnum_of_block(nilfs, bh->b_blocknr);
> + if (!gc_inode && bh->b_blocknr > 0 &&
> + (ino == NILFS_DAT_INO || !buffer_nilfs_node(bh)) &&
> + segnum < nilfs->ns_nsegments) {
> +
> + if (segnum != prev_segnum) {
> + if (blkcount) {
> + nilfs_sufile_add_segment_usage(
> + nilfs->ns_sufile,
> + prev_segnum,
> + -((__s64)blkcount),
> + maxdectime);
This is really bad code style. Usually, it means that the function needs to be
refactored. Otherwise, the code is really hard to understand.
> + }
> + prev_segnum = segnum;
> + blkcount = 0;
> + maxdectime = dectime;
> + }
> +
> +
> + if (segnum == segbuf->sb_segnum)
> + segbuf->sb_su_blocks--;
> + else
> + ++blkcount;
> + } else if (gc_inode && bh->b_blocknr > 0) {
> + /* check again if gc blocks are alive */
> + if (!buffer_nilfs_snapshot(bh) &&
> + (buffer_nilfs_protection_period(bh) ||
> + !nilfs_dat_is_live(nilfs->ns_dat,
> + bh->b_blocknr)))
> + segbuf->sb_su_blocks--;
Ahhhhh. Again and again. :) Bad code style. You need to improve your
taste. :)
> + }
> +
> bh_org = bh;
> get_bh(bh_org);
> err = nilfs_bmap_assign(NILFS_I(inode)->i_bmap, &bh, blocknr,
> @@ -1538,6 +1593,10 @@ nilfs_segctor_update_payload_blocknr(struct nilfs_sc_info *sci,
> } else if (ndatablk > 0)
> ndatablk--;
> }
> +
> + if (blkcount)
> + nilfs_sufile_add_segment_usage(nilfs->ns_sufile, prev_segnum,
> + -((__s64)blkcount), maxdectime);
The expression -((__s64)blkcount) does not look very good. From my viewpoint,
it is a very complex and confusing construction on the whole.
Thanks,
Vyacheslav Dubeyko.
> out:
> return 0;
>
> diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
> index 30ddc86..e05793a 100644
> --- a/include/linux/nilfs2_fs.h
> +++ b/include/linux/nilfs2_fs.h
> @@ -885,6 +885,7 @@ enum {
> };
> enum {
> NILFS_VDESC_SNAPSHOT,
> + NILFS_VDESC_PROTECTION_PERIOD,
> __NR_NILFS_VDESC_FIELDS,
> /* ... */
> };
> @@ -921,6 +922,7 @@ nilfs_vdesc_##name(const struct nilfs_vdesc *vdesc) \
> NILFS_VDESC_FNS(DATA, data)
> NILFS_VDESC_FNS(NODE, node)
> NILFS_VDESC_FNS2(SNAPSHOT, snapshot)
> +NILFS_VDESC_FNS2(PROTECTION_PERIOD, protection_period)
>
> /**
> * struct nilfs_bdesc - descriptor of disk block number
* Re: [PATCH 5/6] nilfs2: add counting of live blocks for blocks that are overwritten
2014-03-18 11:50 ` Vyacheslav Dubeyko
@ 2014-03-18 14:02 ` Andreas Rohner
0 siblings, 0 replies; 34+ messages in thread
From: Andreas Rohner @ 2014-03-18 14:02 UTC
To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On 2014-03-18 12:50, Vyacheslav Dubeyko wrote:
> On Sun, 2014-03-16 at 11:47 +0100, Andreas Rohner wrote:
>
>> diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
>> index 7adb15d..e7b19c40 100644
>> --- a/fs/nilfs2/dat.c
>> +++ b/fs/nilfs2/dat.c
>> @@ -445,6 +445,64 @@ int nilfs_dat_clean_snapshot_flag(struct inode *dat, __u64 vblocknr)
>> }
>>
>> /**
>> + * nilfs_dat_is_live - checks if the virtual block number is alive
>
> What about nilfs_dat_block_is_alive?
Yes, sounds good.
>> + * @dat: DAT file inode
>> + * @vblocknr: virtual block number
>> + *
>> + * Description: nilfs_dat_is_live() looks up the DAT entry for @vblocknr and
>> + * determines if the corresponding block is alive or not. This check ignores
>> + * snapshots and protection periods.
>> + *
>> + * Return Value: 1 if vblocknr is alive and 0 otherwise. On error, one
>> + * of the following negative error codes is returned.
>
> From my point of view, it is a really bad idea to mix error codes and an
> informational return value. It usually results in very buggy code at the call
> site. In effect, you rely on the binary nature of the returned value.
>
> I think the design of this function needs to be reworked. Maybe it should
> return bool and pass the error value back through an argument.
Yes, that is true.
>> + *
>> + * %-EIO - I/O error.
>> + *
>> + * %-ENOMEM - Insufficient amount of memory available.
>> + *
>> + * %-ENOENT - A block number associated with @vblocknr does not exist.
>> + */
>> +int nilfs_dat_is_live(struct inode *dat, __u64 vblocknr)
>> +{
>> + struct buffer_head *entry_bh, *bh;
>> + struct nilfs_dat_entry *entry;
>> + sector_t blocknr;
>> + void *kaddr;
>> + int ret;
>> +
>> + ret = nilfs_palloc_get_entry_block(dat, vblocknr, 0, &entry_bh);
>> + if (ret < 0)
>> + return ret;
>> +
>> + if (!nilfs_doing_gc() && buffer_nilfs_redirected(entry_bh)) {
>> + bh = nilfs_mdt_get_frozen_buffer(dat, entry_bh);
>> + if (bh) {
>> + WARN_ON(!buffer_uptodate(bh));
>> + brelse(entry_bh);
>> + entry_bh = bh;
>> + }
>> + }
>> +
>> + kaddr = kmap_atomic(entry_bh->b_page);
>> + entry = nilfs_palloc_block_get_entry(dat, vblocknr, entry_bh, kaddr);
>> + blocknr = le64_to_cpu(entry->de_blocknr);
>> + if (blocknr == 0) {
>
> I suppose that zero should be a specially named constant here?
I copied that code from nilfs_dat_translate(). So it is not my fault
that there isn't a properly named constant ;)
>> + ret = -ENOENT;
>> + goto out;
>> + }
>> +
>> +
>> + if (entry->de_end == cpu_to_le64(NILFS_CNO_MAX))
>> + ret = 1;
>> + else
>> + ret = 0;
>> +out:
>> + kunmap_atomic(kaddr);
>> + brelse(entry_bh);
>> + return ret;
>> +}
>> +
>> +/**
>> * nilfs_dat_translate - translate a virtual block number to a block number
>> * @dat: DAT file inode
>> * @vblocknr: virtual block number
>> diff --git a/fs/nilfs2/dat.h b/fs/nilfs2/dat.h
>> index a528024..51d44c0 100644
>> --- a/fs/nilfs2/dat.h
>> +++ b/fs/nilfs2/dat.h
>> @@ -31,6 +31,7 @@
>> struct nilfs_palloc_req;
>>
>> int nilfs_dat_translate(struct inode *, __u64, sector_t *);
>> +int nilfs_dat_is_live(struct inode *, __u64);
>>
>> int nilfs_dat_prepare_alloc(struct inode *, struct nilfs_palloc_req *);
>> void nilfs_dat_commit_alloc(struct inode *, struct nilfs_palloc_req *);
>> diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
>> index b9c5726..c32b896 100644
>> --- a/fs/nilfs2/inode.c
>> +++ b/fs/nilfs2/inode.c
>> @@ -86,6 +86,8 @@ int nilfs_get_block(struct inode *inode, sector_t blkoff,
>> int err = 0, ret;
>> unsigned maxblocks = bh_result->b_size >> inode->i_blkbits;
>>
>> + bh_result->b_blocknr = 0;
>> +
>> down_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
>> ret = nilfs_bmap_lookup_contig(ii->i_bmap, blkoff, &blknum, maxblocks);
>> up_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
>> diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
>> index 0b62bf4..3603394 100644
>> --- a/fs/nilfs2/ioctl.c
>> +++ b/fs/nilfs2/ioctl.c
>> @@ -612,6 +612,12 @@ static int nilfs_ioctl_move_inode_block(struct inode *inode,
>> brelse(bh);
>> return -EEXIST;
>> }
>> +
>> + if (nilfs_vdesc_snapshot(vdesc))
>> + set_buffer_nilfs_snapshot(bh);
>> + if (nilfs_vdesc_protection_period(vdesc))
>> + set_buffer_nilfs_protection_period(bh);
>> +
>> list_add_tail(&bh->b_assoc_buffers, buffers);
>> return 0;
>> }
>> diff --git a/fs/nilfs2/page.h b/fs/nilfs2/page.h
>> index ef30c5c..8c34a31 100644
>> --- a/fs/nilfs2/page.h
>> +++ b/fs/nilfs2/page.h
>> @@ -36,13 +36,17 @@ enum {
>> BH_NILFS_Volatile,
>> BH_NILFS_Checked,
>> BH_NILFS_Redirected,
>> + BH_NILFS_Snapshot,
>> + BH_NILFS_Protection_Period,
>> };
>>
>> BUFFER_FNS(NILFS_Node, nilfs_node) /* nilfs node buffers */
>> BUFFER_FNS(NILFS_Volatile, nilfs_volatile)
>> BUFFER_FNS(NILFS_Checked, nilfs_checked) /* buffer is verified */
>> BUFFER_FNS(NILFS_Redirected, nilfs_redirected) /* redirected to a copy */
>> -
>> +BUFFER_FNS(NILFS_Snapshot, nilfs_snapshot) /* belongs to a snapshot */
>> +BUFFER_FNS(NILFS_Protection_Period, nilfs_protection_period) /* protected by
>> + protection period */
>>
>> int __nilfs_clear_page_dirty(struct page *);
>>
>> diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
>> index dc3a9efd..c72fc37 100644
>> --- a/fs/nilfs2/segbuf.c
>> +++ b/fs/nilfs2/segbuf.c
>> @@ -28,6 +28,7 @@
>> #include <linux/slab.h>
>> #include "page.h"
>> #include "segbuf.h"
>> +#include "sufile.h"
>>
>>
>> struct nilfs_write_info {
>> @@ -57,6 +58,8 @@ struct nilfs_segment_buffer *nilfs_segbuf_new(struct super_block *sb)
>> INIT_LIST_HEAD(&segbuf->sb_segsum_buffers);
>> INIT_LIST_HEAD(&segbuf->sb_payload_buffers);
>> segbuf->sb_super_root = NULL;
>> + segbuf->sb_su_blocks = 0;
>> + segbuf->sb_su_blocks_cancel = 0;
>>
>> init_completion(&segbuf->sb_bio_event);
>> atomic_set(&segbuf->sb_err, 0);
>> @@ -82,6 +85,25 @@ void nilfs_segbuf_map(struct nilfs_segment_buffer *segbuf, __u64 segnum,
>> segbuf->sb_fseg_end - segbuf->sb_pseg_start + 1;
>> }
>>
>> +int nilfs_segbuf_set_sui(struct nilfs_segment_buffer *segbuf,
>> + struct the_nilfs *nilfs)
>> +{
>> + struct nilfs_suinfo si;
>> + ssize_t err;
>> +
>> + err = nilfs_sufile_get_suinfo(nilfs->ns_sufile, segbuf->sb_segnum, &si,
>> + sizeof(si), 1);
>> + if (err != 1)
>
> If nilfs_sufile_get_suinfo() returns an error, then how can it be equal to
> one? What a mess.
Actually, nilfs_sufile_get_suinfo() returns the number of segments
written on success and a negative error code otherwise. So a single return
value carries either an error or the number of segments. Since I requested
one entry, I compare it to 1.
>
>> + return -1;
>
> This is a really bad idea. In the end, the caller will get -EPERM, because
> -EPERM is -1. Is that what you mean here?
Hmm, yes, that is wrong. It should probably be something like:
    if (err < 0)
        return err;
    else if (err != 1)
        return -ENOENT;
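The corrected handling can be exercised outside the kernel with a small sketch. All names here are hypothetical: get_suinfo_fn stands in for nilfs_sufile_get_suinfo() and is assumed to follow the convention described above (count of entries on success, negative errno on failure), and the three fake back ends exist only to drive each path.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Stand-in for the lookup: returns entries filled in, or a negative errno. */
typedef ssize_t (*get_suinfo_fn)(unsigned long *nblocks, size_t nentries);

/* Ask for exactly one entry and map the outcome to 0 or a negative errno. */
static int read_one_suinfo(get_suinfo_fn get, unsigned long *nblocks)
{
	ssize_t n = get(nblocks, 1);

	if (n < 0)
		return (int)n;  /* propagate the real error code */
	if (n != 1)
		return -ENOENT; /* no entry came back */
	return 0;
}

/* Fake back ends: one entry, no entry, and an I/O error. */
static ssize_t fake_ok(unsigned long *nblocks, size_t n)
{
	(void)n;
	*nblocks = 7;
	return 1;
}

static ssize_t fake_none(unsigned long *nblocks, size_t n)
{
	(void)nblocks;
	(void)n;
	return 0;
}

static ssize_t fake_eio(unsigned long *nblocks, size_t n)
{
	(void)nblocks;
	(void)n;
	return -EIO;
}
```

Compared with returning a bare -1, the caller now sees the original error code, and the "short read" case gets its own distinct value.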
>> +
>> + if (si.sui_nblocks == 0)
>> + si.sui_nblocks = segbuf->sb_pseg_start - segbuf->sb_fseg_start;
>> +
>> + segbuf->sb_su_blocks = si.sui_nblocks;
>> + segbuf->sb_su_blocks_cancel = si.sui_nblocks;
>> + return 0;
>> +}
>> +
>> /**
>> * nilfs_segbuf_map_cont - map a new log behind a given log
>> * @segbuf: new segment buffer
>> @@ -450,6 +472,9 @@ static int nilfs_segbuf_submit_bh(struct nilfs_segment_buffer *segbuf,
>>
>> len = bio_add_page(wi->bio, bh->b_page, bh->b_size, bh_offset(bh));
>> if (len == bh->b_size) {
>> + lock_buffer(bh);
>> + map_bh(bh, segbuf->sb_super, wi->blocknr + wi->end);
>> + unlock_buffer(bh);
>> wi->end++;
>> return 0;
>> }
>> diff --git a/fs/nilfs2/segbuf.h b/fs/nilfs2/segbuf.h
>> index b04f08c..482bbad 100644
>> --- a/fs/nilfs2/segbuf.h
>> +++ b/fs/nilfs2/segbuf.h
>> @@ -83,6 +83,8 @@ struct nilfs_segment_buffer {
>> sector_t sb_fseg_start, sb_fseg_end;
>> sector_t sb_pseg_start;
>> unsigned sb_rest_blocks;
>> + __u32 sb_su_blocks_cancel;
>> + __s64 sb_su_blocks;
>>
>> /* Buffers */
>> struct list_head sb_segsum_buffers;
>> @@ -122,6 +124,8 @@ void nilfs_segbuf_map(struct nilfs_segment_buffer *, __u64, unsigned long,
>> struct the_nilfs *);
>> void nilfs_segbuf_map_cont(struct nilfs_segment_buffer *segbuf,
>> struct nilfs_segment_buffer *prev);
>> +int nilfs_segbuf_set_sui(struct nilfs_segment_buffer *segbuf,
>> + struct the_nilfs *nilfs);
>> void nilfs_segbuf_set_next_segnum(struct nilfs_segment_buffer *, __u64,
>> struct the_nilfs *);
>> int nilfs_segbuf_reset(struct nilfs_segment_buffer *, unsigned, time_t, __u64);
>> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
>> index a1a1916..5d98a1c 100644
>> --- a/fs/nilfs2/segment.c
>> +++ b/fs/nilfs2/segment.c
>> @@ -1257,6 +1257,10 @@ static int nilfs_segctor_begin_construction(struct nilfs_sc_info *sci,
>> }
>> nilfs_segbuf_set_next_segnum(segbuf, nextnum, nilfs);
>>
>> + err = nilfs_segbuf_set_sui(segbuf, nilfs);
>> + if (err)
>> + goto failed;
>> +
>> BUG_ON(!list_empty(&sci->sc_segbufs));
>> list_add_tail(&segbuf->sb_list, &sci->sc_segbufs);
>> sci->sc_segbuf_nblocks = segbuf->sb_rest_blocks;
>> @@ -1306,6 +1310,10 @@ static int nilfs_segctor_extend_segments(struct nilfs_sc_info *sci,
>> segbuf->sb_sum.seg_seq = prev->sb_sum.seg_seq + 1;
>> nilfs_segbuf_set_next_segnum(segbuf, nextnextnum, nilfs);
>>
>> + err = nilfs_segbuf_set_sui(segbuf, nilfs);
>> + if (err)
>> + goto failed;
>> +
>> list_add_tail(&segbuf->sb_list, &list);
>> prev = segbuf;
>> }
>> @@ -1368,8 +1376,7 @@ static void nilfs_segctor_update_segusage(struct nilfs_sc_info *sci,
>> int ret;
>>
>> list_for_each_entry(segbuf, &sci->sc_segbufs, sb_list) {
>> - live_blocks = segbuf->sb_sum.nblocks +
>> - (segbuf->sb_pseg_start - segbuf->sb_fseg_start);
>> + live_blocks = segbuf->sb_sum.nfileblk + segbuf->sb_su_blocks;
>> ret = nilfs_sufile_set_segment_usage(sufile, segbuf->sb_segnum,
>> live_blocks,
>> sci->sc_seg_ctime);
>> @@ -1383,9 +1390,9 @@ static void nilfs_cancel_segusage(struct list_head *logs, struct inode *sufile)
>> int ret;
>>
>> segbuf = NILFS_FIRST_SEGBUF(logs);
>> +
>> ret = nilfs_sufile_set_segment_usage(sufile, segbuf->sb_segnum,
>> - segbuf->sb_pseg_start -
>> - segbuf->sb_fseg_start, 0);
>> + segbuf->sb_su_blocks_cancel, 0);
>> WARN_ON(ret); /* always succeed because the segusage is dirty */
>>
>> list_for_each_entry_continue(segbuf, logs, sb_list) {
>> @@ -1477,7 +1484,9 @@ nilfs_segctor_update_payload_blocknr(struct nilfs_sc_info *sci,
>> struct nilfs_segment_buffer *segbuf,
>> int mode)
>> {
>> + struct the_nilfs *nilfs = sci->sc_super->s_fs_info;
>> struct inode *inode = NULL;
>> + struct nilfs_inode_info *ii;
>> sector_t blocknr;
>> unsigned long nfinfo = segbuf->sb_sum.nfinfo;
>> unsigned long nblocks = 0, ndatablk = 0;
>> @@ -1487,7 +1496,9 @@ nilfs_segctor_update_payload_blocknr(struct nilfs_sc_info *sci,
>> union nilfs_binfo binfo;
>> struct buffer_head *bh, *bh_org;
>> ino_t ino = 0;
>> - int err = 0;
>> + int gc_inode = 0, err = 0;
>> + __u64 segnum, prev_segnum = 0, dectime = 0, maxdectime = 0;
>> + __u32 blkcount = 0;
>>
>> if (!nfinfo)
>> goto out;
>> @@ -1508,6 +1519,17 @@ nilfs_segctor_update_payload_blocknr(struct nilfs_sc_info *sci,
>>
>> inode = bh->b_page->mapping->host;
>>
>> + ii = NILFS_I(inode);
>> + gc_inode = test_bit(NILFS_I_GCINODE, &ii->i_state);
>> + dectime = sci->sc_seg_ctime;
>
> The name dectime does not sound very good to me.
I decrement the block counter in the SUFILE here. And this is the time
it got decremented. But I guess I could think of something better...
>> + /* no update of lastdec necessary */
>> + if (ino == NILFS_DAT_INO || ino == NILFS_SUFILE_INO ||
>> + ino == NILFS_CPFILE_INO)
>> + dectime = 0;
>
> What about this?
>
>     if (ino == NILFS_DAT_INO ||
>         ino == NILFS_SUFILE_INO ||
>         ino == NILFS_CPFILE_INO)
>             dectime = 0;
>
> But really, I would prefer to see a small check function (is_metadata_file(),
> for example).
Ok.
>> +
>> + if (dectime > maxdectime)
>> + maxdectime = dectime;
>> +
>> if (mode == SC_LSEG_DSYNC)
>> sc_op = &nilfs_sc_dsync_ops;
>> else if (ino == NILFS_DAT_INO)
>> @@ -1515,6 +1537,39 @@ nilfs_segctor_update_payload_blocknr(struct nilfs_sc_info *sci,
>> else /* file blocks */
>> sc_op = &nilfs_sc_file_ops;
>> }
>> +
>> + segnum = nilfs_get_segnum_of_block(nilfs, bh->b_blocknr);
>> + if (!gc_inode && bh->b_blocknr > 0 &&
>> + (ino == NILFS_DAT_INO || !buffer_nilfs_node(bh)) &&
>> + segnum < nilfs->ns_nsegments) {
>> +
>> + if (segnum != prev_segnum) {
>> + if (blkcount) {
>> + nilfs_sufile_add_segment_usage(
>> + nilfs->ns_sufile,
>> + prev_segnum,
>> + -((__s64)blkcount),
>> + maxdectime);
>
> This is really bad code style. Usually, it means that the function needs to be
> refactored. Otherwise, the code is really hard to understand.
>
>> + }
>> + prev_segnum = segnum;
>> + blkcount = 0;
>> + maxdectime = dectime;
>> + }
>> +
>> +
>> + if (segnum == segbuf->sb_segnum)
>> + segbuf->sb_su_blocks--;
>> + else
>> + ++blkcount;
>> + } else if (gc_inode && bh->b_blocknr > 0) {
>> + /* check again if gc blocks are alive */
>> + if (!buffer_nilfs_snapshot(bh) &&
>> + (buffer_nilfs_protection_period(bh) ||
>> + !nilfs_dat_is_live(nilfs->ns_dat,
>> + bh->b_blocknr)))
>> + segbuf->sb_su_blocks--;
>
> Ahhhhh. Again and again. :) Bad code style. You need to improve your
> taste. :)
OK, admittedly that part could use a little bit of refactoring. It grew
more and more complex during development. At the beginning it was just a
simple if-statement and a function call.
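One possible direction for that refactoring is to pull the nested condition of the GC branch into a named predicate. The sketch below is an assumption-laden illustration: struct gc_block_state and gc_block_is_uncounted() are hypothetical, the three booleans stand for the results of buffer_nilfs_snapshot(), buffer_nilfs_protection_period() and the DAT liveness lookup, and unlike the patch it takes the lookup result as an input instead of short-circuiting the lookup.

```c
#include <stdbool.h>

/* Hypothetical, flattened view of the per-buffer state used by the check. */
struct gc_block_state {
	bool in_snapshot;       /* buffer_nilfs_snapshot(bh) in the patch */
	bool protection_period; /* buffer_nilfs_protection_period(bh) */
	bool alive;             /* result of the DAT liveness lookup */
};

/* True when a block moved by GC should not be counted as live. */
static bool gc_block_is_uncounted(const struct gc_block_state *s)
{
	if (s->in_snapshot)
		return false; /* snapshot blocks always stay counted */
	return s->protection_period || !s->alive;
}
```

The branch in the segment constructor would then read as one call followed by the decrement, which avoids the deeply wrapped condition.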
>> + }
>> +
>> bh_org = bh;
>> get_bh(bh_org);
>> err = nilfs_bmap_assign(NILFS_I(inode)->i_bmap, &bh, blocknr,
>> @@ -1538,6 +1593,10 @@ nilfs_segctor_update_payload_blocknr(struct nilfs_sc_info *sci,
>> } else if (ndatablk > 0)
>> ndatablk--;
>> }
>> +
>> + if (blkcount)
>> + nilfs_sufile_add_segment_usage(nilfs->ns_sufile, prev_segnum,
>> + -((__s64)blkcount), maxdectime);
>
> The expression -((__s64)blkcount) does not look very good. From my viewpoint,
> it is a very complex and confusing construction on the whole.
Hmm, yes, I could make blkcount a __s64 from the start and decrement it
instead of incrementing it.
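The signed-counter idea can be sketched as a small accumulator that batches decrements per segment. Every name below is hypothetical: struct seg_delta and its helpers are not part of the patch, apply_fn stands in for nilfs_sufile_add_segment_usage(), and struct seg_log is a recording sink used only to exercise the sketch. The timestamp argument of the real call is left out to keep the example short.

```c
#include <stdint.h>

/* Accumulates decrements that target one segment at a time. */
struct seg_delta {
	uint64_t segnum;
	int64_t delta; /* stays <= 0: one step down per overwritten block */
};

/* Sink standing in for the SUFILE update. */
typedef void (*apply_fn)(uint64_t segnum, int64_t delta, void *ctx);

/* Hand the pending delta to the sink, if any, and reset it. */
static void seg_delta_flush(struct seg_delta *d, apply_fn apply, void *ctx)
{
	if (d->delta != 0)
		apply(d->segnum, d->delta, ctx);
	d->delta = 0;
}

/* Record one overwritten block; flush first when the segment changes. */
static void seg_delta_note_block(struct seg_delta *d, uint64_t segnum,
				 apply_fn apply, void *ctx)
{
	if (segnum != d->segnum) {
		seg_delta_flush(d, apply, ctx);
		d->segnum = segnum;
	}
	d->delta--;
}

/* Recording sink used only to exercise the sketch. */
struct seg_log {
	int calls;
	int64_t total;
	uint64_t last_segnum;
};

static void seg_log_apply(uint64_t segnum, int64_t delta, void *ctx)
{
	struct seg_log *log = ctx;

	log->calls++;
	log->total += delta;
	log->last_segnum = segnum;
}
```

Because the counter is signed and already negative, it can be passed on as is, and the flush logic that the patch repeats after the loop becomes a single helper call.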
Best regards,
Andreas Rohner
end of thread (newest message: 2014-03-18 14:02 UTC)
Thread overview: 34+ messages
2014-03-16 10:47 [PATCH 0/6] nilfs2: implement tracking of live blocks Andreas Rohner
[not found] ` <cover.1394966728.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-03-16 10:47 ` [PATCH 1/6] nilfs2: add helper function to go through all entries of meta data file Andreas Rohner
[not found] ` <2adbf1034ab4b129223553746577f6ec0e699869.1394966729.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-03-17 6:51 ` Vyacheslav Dubeyko
2014-03-17 9:24 ` Andreas Rohner
2014-03-16 10:47 ` [PATCH 2/6] nilfs2: add new timestamp to seg usage and function to change su_nblocks Andreas Rohner
[not found] ` <12561ce5e2cf8ae07fdda05e16c357f37d17c62f.1394966729.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-03-16 13:00 ` Vyacheslav Dubeyko
[not found] ` <2FD47FE0-3468-4EF4-AAAE-4A636C640C44-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2014-03-16 12:24 ` Andreas Rohner
[not found] ` <53259801.5080409-hi6Y0CQ0nG0@public.gmane.org>
2014-03-16 13:34 ` Vyacheslav Dubeyko
[not found] ` <0ED0D5DA-9AE9-44B8-8936-1680DE2B64C5-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2014-03-16 16:02 ` Andreas Rohner
2014-03-16 14:06 ` Vyacheslav Dubeyko
[not found] ` <ED41900C-6380-44C1-AC7E-EB8DF74EBFBD-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2014-03-16 13:31 ` Ryusuke Konishi
[not found] ` <20140316.223111.52181167.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-03-16 16:19 ` Andreas Rohner
2014-03-16 10:47 ` [PATCH 3/6] nilfs2: scan dat entries at snapshot creation/deletion time Andreas Rohner
[not found] ` <29dee92595249b713fff1e4903d5d76556926eec.1394966729.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-03-17 7:04 ` Vyacheslav Dubeyko
2014-03-17 9:35 ` Andreas Rohner
[not found] ` <5326C1E5.10108-hi6Y0CQ0nG0@public.gmane.org>
2014-03-17 9:54 ` Vyacheslav Dubeyko
2014-03-16 10:47 ` [PATCH 4/6] nilfs2: add ioctl() to clean snapshot flags from dat entries Andreas Rohner
[not found] ` <be7d3bd13015117222aac43194c0fdb9c5d0046f.1394966729.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-03-17 13:19 ` Vyacheslav Dubeyko
2014-03-17 13:49 ` Andreas Rohner
[not found] ` <5326FD51.7000209-hi6Y0CQ0nG0@public.gmane.org>
2014-03-18 7:10 ` Vyacheslav Dubeyko
2014-03-18 8:38 ` Andreas Rohner
2014-03-16 10:47 ` [PATCH 5/6] nilfs2: add counting of live blocks for blocks that are overwritten Andreas Rohner
[not found] ` <25dd8a8bb6943ffa3e0663848363135585a48109.1394966729.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-03-18 11:50 ` Vyacheslav Dubeyko
2014-03-18 14:02 ` Andreas Rohner
2014-03-16 10:47 ` [PATCH 6/6] nilfs2: add counting of live blocks for deleted files Andreas Rohner
2014-03-16 10:49 ` [PATCH 1/4] nilfs-utils: remove reliance on sui_nblocks to read segment Andreas Rohner
[not found] ` <36b7f57861b69c7fdb9d9e54a21df6f5c7f21061.1394966935.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-03-16 10:49 ` [PATCH 2/4] nilfs-utils: add cost-benefit and greedy policies Andreas Rohner
[not found] ` <cc43be2e6bba5367fd2982dc0df5255b884bdace.1394966935.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-03-16 12:55 ` Ryusuke Konishi
[not found] ` <20140316.215545.291456562.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-03-16 15:50 ` Andreas Rohner
2014-03-16 10:49 ` [PATCH 3/4] nilfs-utils: add support for nilfs_clean_snapshot_flags() Andreas Rohner
2014-03-16 10:49 ` [PATCH 4/4] nilfs-utils: add extra flags to nilfs_vdesc and update sui_nblocks Andreas Rohner
2014-03-16 11:01 ` [PATCH 0/6] nilfs2: implement tracking of live blocks Andreas Rohner
[not found] ` <532584A2.8000004-hi6Y0CQ0nG0@public.gmane.org>
2014-03-16 12:34 ` Vyacheslav Dubeyko
[not found] ` <3EC9549C-84A7-49B5-9BE1-34A7337BFFDC-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2014-03-16 11:36 ` Andreas Rohner