linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] nilfs2: fix issue with broken bmap for the case of block size lesser than 4 KB
@ 2013-04-25 10:28 Vyacheslav Dubeyko
  2013-04-25 16:37 ` Ryusuke Konishi
  0 siblings, 1 reply; 4+ messages in thread
From: Vyacheslav Dubeyko @ 2013-04-25 10:28 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: linux-nilfs, linux-fsdevel, Andrew Morton, Anthony Doggett

Hi Ryusuke,

I think that this patch really needs in your review.

Thanks,
Vyacheslav Dubeyko.
--
From: Vyacheslav Dubeyko <slava@dubeyko.com>
Subject: [PATCH] nilfs2: fix issue with broken bmap for the case of block size lesser than 4 KB

DESCRIPTION:
There are use-cases when NILFS2 file system (formatted with block size lesser than 4 KB) can be remounted in RO mode because of encountering of "broken bmap" issue.

The issue was reported by Anthony Doggett <Anthony2486@interfaces.org.uk>: "The machine I've been trialling nilfs on is running Debian Testing, Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've also reproduced it (identically) with Debian Unstable amd64 and Debian Experimental (using the 3.8-trunk kernel).  The problematic partitions were formatted with "mkfs.nilfs2 -b 1024 -B 8192"."

SYMPTOMS:
(1) System log contains error messages likewise:

    [63102.496756] nilfs_direct_assign: invalid pointer: 0
    [63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)
    [63102.496798]
    [63102.524403] Remounting filesystem read-only

(2) The NILFS2 file system is remounted in RO mode.

REPRODUSING PATH:
(1) Create volume group with name "unencrypted" by means of vgcreate utility.
(2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):

----------------[BEGIN SCRIPT]--------------------
#!/bin/bash

VG=unencrypted
#apt-get install nilfs-tools darcs
lvcreate --size 2G --name ntest $VG
mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
mkdir /var/tmp/n
mkdir /var/tmp/n/ntest
mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
mkdir /var/tmp/n/ntest/thedir
cd /var/tmp/n/ntest/thedir
sleep 2
date
darcs init
sleep 2
dmesg|tail -n 5
date
darcs whatsnew || true
date
sleep 2
dmesg|tail -n 5
----------------[END SCRIPT]--------------------

REPRODUCIBILITY: 100%

INVESTIGATION:
As it was discovered, the issue takes place during segment construction after executing such sequence of user-space operations:

open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
ftruncate(7, 60)

Factually, the error message "NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)" takes place because of trying to get block number for third block of the file with logical offset #3072 bytes. As it is possible to see from above output, the file has 60 bytes of the whole size. So, it is enough one block (1 KB in size) allocation for the whole file. Trying to operate with several blocks instead of one takes place because of discovering several dirty buffers for this file in nilfs_segctor_scan_file() method. The create_empty_buffers() method creates buffers for the whole page size. So, for the case of 1 KB block this method creates 4 empty buffers. As a result, in the case of block size lesser than 4 KB the NILFS2 code tries to operate with all buffers for the page 
 without taking into account really allocated buffers count.

FIX:
The BH_NILFS_Allocated flag is used as a basis for the issue fix. Namely, this flag is a info that this buffer is a really allocated buffer. So, every buffer is marked as BH_NILFS_Allocated in the case of mapping on real block. And only buffers with flag BH_NILFS_Allocated are getting into operations with its.

Reported-by: Anthony Doggett <Anthony2486@interfaces.org.uk>
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
CC: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/btnode.c   |    2 ++
 fs/nilfs2/btree.c    |    2 ++
 fs/nilfs2/gcinode.c  |    1 +
 fs/nilfs2/inode.c    |    2 ++
 fs/nilfs2/mdt.c      |    2 ++
 fs/nilfs2/page.c     |    5 +++++
 fs/nilfs2/page.h     |    1 +
 fs/nilfs2/recovery.c |   12 +++++++++++-
 fs/nilfs2/segbuf.c   |    2 ++
 fs/nilfs2/segment.c  |    5 +++--
 fs/nilfs2/super.c    |    2 ++
 11 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c
index a35ae35..cbc0057 100644
--- a/fs/nilfs2/btnode.c
+++ b/fs/nilfs2/btnode.c
@@ -60,6 +60,7 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr)
 	bh->b_blocknr = blocknr;
 	set_buffer_mapped(bh);
 	set_buffer_uptodate(bh);
+	set_buffer_nilfs_allocated(bh);
 
 	unlock_page(bh->b_page);
 	page_cache_release(bh->b_page);
@@ -124,6 +125,7 @@ int nilfs_btnode_submit_block(struct address_space *btnc, __u64 blocknr,
 	*submit_ptr = pblocknr;
 	err = 0;
 found:
+	set_buffer_nilfs_allocated(bh);
 	*pbh = bh;
 
 out_locked:
diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c
index b2e3ff3..bbdaa1f 100644
--- a/fs/nilfs2/btree.c
+++ b/fs/nilfs2/btree.c
@@ -2068,6 +2068,8 @@ static void nilfs_btree_lookup_dirty_buffers(struct nilfs_bmap *btree,
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			bh = head = page_buffers(pvec.pages[i]);
 			do {
+				if (!buffer_nilfs_allocated(bh))
+					continue;
 				if (buffer_dirty(bh))
 					nilfs_btree_add_dirty_buffer(btree,
 								     lists, bh);
diff --git a/fs/nilfs2/gcinode.c b/fs/nilfs2/gcinode.c
index 57ceaf3..f7d714b 100644
--- a/fs/nilfs2/gcinode.c
+++ b/fs/nilfs2/gcinode.c
@@ -111,6 +111,7 @@ int nilfs_gccache_submit_read_data(struct inode *inode, sector_t blkoff,
 		bh->b_blocknr = vbn;
  out:
 	err = 0;
+	set_buffer_nilfs_allocated(bh);
 	*out_bh = bh;
 
  failed:
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index ba7a1da..8992291 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -93,6 +93,7 @@ int nilfs_get_block(struct inode *inode, sector_t blkoff,
 		map_bh(bh_result, inode->i_sb, blknum);
 		if (ret > 0)
 			bh_result->b_size = (ret << inode->i_blkbits);
+		set_buffer_nilfs_allocated(bh_result);
 		goto out;
 	}
 	/* data block was not found */
@@ -132,6 +133,7 @@ int nilfs_get_block(struct inode *inode, sector_t blkoff,
 		set_buffer_delay(bh_result);
 		map_bh(bh_result, inode->i_sb, 0); /* dbn must be changed
 						      to proper value */
+		set_buffer_nilfs_allocated(bh_result);
 	} else if (ret == -ENOENT) {
 		/* not found is not error (e.g. hole); must return without
 		   the mapped state flag. */
diff --git a/fs/nilfs2/mdt.c b/fs/nilfs2/mdt.c
index c4dcd1d..9156635 100644
--- a/fs/nilfs2/mdt.c
+++ b/fs/nilfs2/mdt.c
@@ -57,6 +57,7 @@ nilfs_mdt_insert_new_block(struct inode *inode, unsigned long block,
 		return ret;
 
 	set_buffer_mapped(bh);
+	set_buffer_nilfs_allocated(bh);
 
 	kaddr = kmap_atomic(bh->b_page);
 	memset(kaddr + bh_offset(bh), 0, 1 << inode->i_blkbits);
@@ -160,6 +161,7 @@ nilfs_mdt_submit_block(struct inode *inode, unsigned long blkoff,
 	ret = 0;
  out:
 	get_bh(bh);
+	set_buffer_nilfs_allocated(bh);
 	*out_bh = bh;
 
  failed_bh:
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 4f07e0e..b95540e 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -101,6 +101,7 @@ void nilfs_forget_buffer(struct buffer_head *bh)
 	clear_buffer_uptodate(bh);
 	clear_buffer_mapped(bh);
 	bh->b_blocknr = -1;
+	clear_buffer_nilfs_allocated(bh);
 	ClearPageUptodate(page);
 	ClearPageMappedToDisk(page);
 	unlock_buffer(bh);
@@ -159,8 +160,11 @@ int nilfs_page_buffers_clean(struct page *page)
 
 	bh = head = page_buffers(page);
 	do {
+		if (!buffer_nilfs_allocated(bh))
+			goto next_buffer;
 		if (buffer_dirty(bh))
 			return 0;
+next_buffer:
 		bh = bh->b_this_page;
 	} while (bh != head);
 	return 1;
@@ -435,6 +439,7 @@ void nilfs_clear_dirty_page(struct page *page, bool silent)
 			clear_buffer_nilfs_checked(bh);
 			clear_buffer_nilfs_redirected(bh);
 			clear_buffer_uptodate(bh);
+			clear_buffer_nilfs_allocated(bh);
 			clear_buffer_mapped(bh);
 			unlock_buffer(bh);
 		} while (bh = bh->b_this_page, bh != head);
diff --git a/fs/nilfs2/page.h b/fs/nilfs2/page.h
index ef30c5c..008e234 100644
--- a/fs/nilfs2/page.h
+++ b/fs/nilfs2/page.h
@@ -38,6 +38,7 @@ enum {
 	BH_NILFS_Redirected,
 };
 
+BUFFER_FNS(NILFS_Allocated, nilfs_allocated)
 BUFFER_FNS(NILFS_Node, nilfs_node)		/* nilfs node buffers */
 BUFFER_FNS(NILFS_Volatile, nilfs_volatile)
 BUFFER_FNS(NILFS_Checked, nilfs_checked)	/* buffer is verified */
diff --git a/fs/nilfs2/recovery.c b/fs/nilfs2/recovery.c
index ff00a0b..bac3575 100644
--- a/fs/nilfs2/recovery.c
+++ b/fs/nilfs2/recovery.c
@@ -122,6 +122,7 @@ static int nilfs_compute_checksum(struct the_nilfs *nilfs,
 			bh = __bread(nilfs->ns_bdev, ++start, blocksize);
 			if (!bh)
 				return -EIO;
+			set_buffer_nilfs_allocated(bh);
 			check_bytes -= size;
 			size = min_t(u64, check_bytes, blocksize);
 			crc = crc32_le(crc, bh->b_data, size);
@@ -153,6 +154,7 @@ int nilfs_read_super_root_block(struct the_nilfs *nilfs, sector_t sr_block,
 		ret = NILFS_SEG_FAIL_IO;
 		goto failed;
 	}
+	set_buffer_nilfs_allocated(bh_sr);
 
 	sr = (struct nilfs_super_root *)bh_sr->b_data;
 	if (check) {
@@ -196,8 +198,10 @@ nilfs_read_log_header(struct the_nilfs *nilfs, sector_t start_blocknr,
 	struct buffer_head *bh_sum;
 
 	bh_sum = __bread(nilfs->ns_bdev, start_blocknr, nilfs->ns_blocksize);
-	if (bh_sum)
+	if (bh_sum) {
+		set_buffer_nilfs_allocated(bh_sum);
 		*sum = (struct nilfs_segment_summary *)bh_sum->b_data;
+	}
 	return bh_sum;
 }
 
@@ -266,6 +270,7 @@ static void *nilfs_read_summary_info(struct the_nilfs *nilfs,
 			       nilfs->ns_blocksize);
 		if (unlikely(!*pbh))
 			return NULL;
+		set_buffer_nilfs_allocated(*pbh);
 		*offset = 0;
 	}
 	ptr = (*pbh)->b_data + *offset;
@@ -303,6 +308,8 @@ static void nilfs_skip_summary_info(struct the_nilfs *nilfs,
 		brelse(*pbh);
 		*pbh = __bread(nilfs->ns_bdev, blocknr + bcnt,
 			       nilfs->ns_blocksize);
+		if (*pbh)
+			set_buffer_nilfs_allocated(*pbh);
 	}
 }
 
@@ -333,6 +340,7 @@ static int nilfs_scan_dsync_log(struct the_nilfs *nilfs, sector_t start_blocknr,
 	bh = __bread(nilfs->ns_bdev, start_blocknr, nilfs->ns_blocksize);
 	if (unlikely(!bh))
 		goto out;
+	set_buffer_nilfs_allocated(bh);
 
 	offset = le16_to_cpu(sum->ss_bytes);
 	for (;;) {
@@ -492,6 +500,7 @@ static int nilfs_recovery_copy_block(struct the_nilfs *nilfs,
 	bh_org = __bread(nilfs->ns_bdev, rb->blocknr, nilfs->ns_blocksize);
 	if (unlikely(!bh_org))
 		return -EIO;
+	set_buffer_nilfs_allocated(bh_org);
 
 	kaddr = kmap_atomic(page);
 	memcpy(kaddr + bh_offset(bh_org), bh_org->b_data, bh_org->b_size);
@@ -712,6 +721,7 @@ static void nilfs_finish_roll_forward(struct the_nilfs *nilfs,
 
 	bh = __getblk(nilfs->ns_bdev, ri->ri_lsegs_start, nilfs->ns_blocksize);
 	BUG_ON(!bh);
+	set_buffer_nilfs_allocated(bh);
 	memset(bh->b_data, 0, bh->b_size);
 	set_buffer_dirty(bh);
 	err = sync_dirty_buffer(bh);
diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
index dc9a913..c69a2b3 100644
--- a/fs/nilfs2/segbuf.c
+++ b/fs/nilfs2/segbuf.c
@@ -113,6 +113,7 @@ int nilfs_segbuf_extend_segsum(struct nilfs_segment_buffer *segbuf)
 		       segbuf->sb_pseg_start + segbuf->sb_sum.nsumblk);
 	if (unlikely(!bh))
 		return -ENOMEM;
+	set_buffer_nilfs_allocated(bh);
 
 	nilfs_segbuf_add_segsum_buffer(segbuf, bh);
 	return 0;
@@ -127,6 +128,7 @@ int nilfs_segbuf_extend_payload(struct nilfs_segment_buffer *segbuf,
 		       segbuf->sb_pseg_start + segbuf->sb_sum.nblocks);
 	if (unlikely(!bh))
 		return -ENOMEM;
+	set_buffer_nilfs_allocated(bh);
 
 	nilfs_segbuf_add_payload_buffer(segbuf, bh);
 	*bhp = bh;
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index a5752a58..3d4cdae 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -665,7 +665,7 @@ static size_t nilfs_lookup_dirty_data_buffers(struct inode *inode,
 
 		bh = head = page_buffers(page);
 		do {
-			if (!buffer_dirty(bh))
+			if (!buffer_dirty(bh) || !buffer_nilfs_allocated(bh))
 				continue;
 			get_bh(bh);
 			list_add_tail(&bh->b_assoc_buffers, listp);
@@ -699,7 +699,8 @@ static void nilfs_lookup_dirty_node_buffers(struct inode *inode,
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			bh = head = page_buffers(pvec.pages[i]);
 			do {
-				if (buffer_dirty(bh)) {
+				if (buffer_dirty(bh) &&
+						buffer_nilfs_allocated(bh)) {
 					get_bh(bh);
 					list_add_tail(&bh->b_assoc_buffers,
 						      listp);
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index c7d1f9f..d190494 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -384,6 +384,7 @@ static int nilfs_move_2nd_super(struct super_block *sb, loff_t sb2off)
 		ret = -EIO;
 		goto out;
 	}
+	set_buffer_nilfs_allocated(nsbh);
 	nsbp = (void *)nsbh->b_data + offset;
 	memset(nsbp, 0, nilfs->ns_blocksize);
 
@@ -836,6 +837,7 @@ struct nilfs_super_block *nilfs_read_super_block(struct super_block *sb,
 	*pbh = sb_bread(sb, sb_index);
 	if (!*pbh)
 		return NULL;
+	set_buffer_nilfs_allocated(*pbh);
 	return (struct nilfs_super_block *)((char *)(*pbh)->b_data + offset);
 }
 
-- 
1.7.9.5




^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] nilfs2: fix issue with broken bmap for the case of block size lesser than 4 KB
  2013-04-25 10:28 [PATCH] nilfs2: fix issue with broken bmap for the case of block size lesser than 4 KB Vyacheslav Dubeyko
@ 2013-04-25 16:37 ` Ryusuke Konishi
       [not found]   ` <20130426.013748.27792413.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Ryusuke Konishi @ 2013-04-25 16:37 UTC (permalink / raw)
  To: Vyacheslav Dubeyko
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Anthony Doggett

Hi Vyacheslav,
On Thu, 25 Apr 2013 14:28:37 +0400, Vyacheslav Dubeyko wrote:
> Hi Ryusuke,
> 
> I think that this patch really needs in your review.
> 
> Thanks,
> Vyacheslav Dubeyko.
> --
> From: Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
> Subject: [PATCH] nilfs2: fix issue with broken bmap for the case of block size lesser than 4 KB
> 
> DESCRIPTION:
> There are use-cases when NILFS2 file system (formatted with block size lesser than 4 KB) can be remounted in RO mode because of encountering of "broken bmap" issue.
> 
> The issue was reported by Anthony Doggett <Anthony2486-fDpYTK8McCxDP812hmKXO1pr/1R2p/CL@public.gmane.org>: "The machine I've been trialling nilfs on is running Debian Testing, Linux version 3.2.0-4-686-pae (debian-kernel-0aAXYlwwYIJuHlm7Suoebg@public.gmane.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've also reproduced it (identically) with Debian Unstable amd64 and Debian Experimental (using the 3.8-trunk kernel).  The problematic partitions were formatted with "mkfs.nilfs2 -b 1024 -B 8192"."
> 
> SYMPTOMS:
> (1) System log contains error messages likewise:
> 
>     [63102.496756] nilfs_direct_assign: invalid pointer: 0
>     [63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)
>     [63102.496798]
>     [63102.524403] Remounting filesystem read-only
> 
> (2) The NILFS2 file system is remounted in RO mode.
> 
> REPRODUSING PATH:
> (1) Create volume group with name "unencrypted" by means of vgcreate utility.
> (2) Run script (prepared by Anthony Doggett <Anthony2486-fDpYTK8McCxDP812hmKXO1pr/1R2p/CL@public.gmane.org>):
> 
> ----------------[BEGIN SCRIPT]--------------------
> #!/bin/bash
> 
> VG=unencrypted
> #apt-get install nilfs-tools darcs
> lvcreate --size 2G --name ntest $VG
> mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
> mkdir /var/tmp/n
> mkdir /var/tmp/n/ntest
> mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
> mkdir /var/tmp/n/ntest/thedir
> cd /var/tmp/n/ntest/thedir
> sleep 2
> date
> darcs init
> sleep 2
> dmesg|tail -n 5
> date
> darcs whatsnew || true
> date
> sleep 2
> dmesg|tail -n 5
> ----------------[END SCRIPT]--------------------
> 
> REPRODUCIBILITY: 100%
> 
> INVESTIGATION:
> As it was discovered, the issue takes place during segment construction after executing such sequence of user-space operations:
> 
> open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7
> fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> ftruncate(7, 60)
> 
> Factually, the error message "NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)" takes place because of trying to get block number for third block of the file with logical offset #3072 bytes. As it is possible to see from above output, the file has 60 bytes of the whole size. So, it is enough one block (1 KB in size) allocation for the whole file. Trying to operate with several blocks instead of one takes place because of discovering several dirty buffers for this file in nilfs_segctor_scan_file() method. The create_empty_buffers() method creates buffers for the whole page size. So, for the case of 1 KB block this method creates 4 empty buffers. As a result, in the case of block size lesser than 4 KB the NILFS2 code tries to operate with all buffers for the pag
 e without taking into account really allocated buffers count.
> 
> FIX:
> The BH_NILFS_Allocated flag is used as a basis for the issue fix. Namely, this flag is a info that this buffer is a really allocated buffer. So, every buffer is marked as BH_NILFS_Allocated in the case of mapping on real block. And only buffers with flag BH_NILFS_Allocated are getting into operations with its.
> 
> Reported-by: Anthony Doggett <Anthony2486-fDpYTK8McCxDP812hmKXO1pr/1R2p/CL@public.gmane.org>
> Signed-off-by: Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
> CC: Ryusuke Konishi <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>

Adding BH_NILFS_Allocated flag looks overkill for this issue.

Why not modify btree or direct mapping code so that the buffer dirty
flag is propery set or cleared also for 1 KB block?  Basically, the
block mapping code of NILFS is designed to be able to handle 1 KB
block.

If this issue occurs after file size extension with ftruncate(), I
guess handling of hole blocks created by the extension seems to have
problem.

I will look into the issue too.


Regards,
Ryusuke Konishi


> ---
>  fs/nilfs2/btnode.c   |    2 ++
>  fs/nilfs2/btree.c    |    2 ++
>  fs/nilfs2/gcinode.c  |    1 +
>  fs/nilfs2/inode.c    |    2 ++
>  fs/nilfs2/mdt.c      |    2 ++
>  fs/nilfs2/page.c     |    5 +++++
>  fs/nilfs2/page.h     |    1 +
>  fs/nilfs2/recovery.c |   12 +++++++++++-
>  fs/nilfs2/segbuf.c   |    2 ++
>  fs/nilfs2/segment.c  |    5 +++--
>  fs/nilfs2/super.c    |    2 ++
>  11 files changed, 33 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c
> index a35ae35..cbc0057 100644
> --- a/fs/nilfs2/btnode.c
> +++ b/fs/nilfs2/btnode.c
> @@ -60,6 +60,7 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr)
>  	bh->b_blocknr = blocknr;
>  	set_buffer_mapped(bh);
>  	set_buffer_uptodate(bh);
> +	set_buffer_nilfs_allocated(bh);
>  
>  	unlock_page(bh->b_page);
>  	page_cache_release(bh->b_page);
> @@ -124,6 +125,7 @@ int nilfs_btnode_submit_block(struct address_space *btnc, __u64 blocknr,
>  	*submit_ptr = pblocknr;
>  	err = 0;
>  found:
> +	set_buffer_nilfs_allocated(bh);
>  	*pbh = bh;
>  
>  out_locked:
> diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c
> index b2e3ff3..bbdaa1f 100644
> --- a/fs/nilfs2/btree.c
> +++ b/fs/nilfs2/btree.c
> @@ -2068,6 +2068,8 @@ static void nilfs_btree_lookup_dirty_buffers(struct nilfs_bmap *btree,
>  		for (i = 0; i < pagevec_count(&pvec); i++) {
>  			bh = head = page_buffers(pvec.pages[i]);
>  			do {
> +				if (!buffer_nilfs_allocated(bh))
> +					continue;
>  				if (buffer_dirty(bh))
>  					nilfs_btree_add_dirty_buffer(btree,
>  								     lists, bh);
> diff --git a/fs/nilfs2/gcinode.c b/fs/nilfs2/gcinode.c
> index 57ceaf3..f7d714b 100644
> --- a/fs/nilfs2/gcinode.c
> +++ b/fs/nilfs2/gcinode.c
> @@ -111,6 +111,7 @@ int nilfs_gccache_submit_read_data(struct inode *inode, sector_t blkoff,
>  		bh->b_blocknr = vbn;
>   out:
>  	err = 0;
> +	set_buffer_nilfs_allocated(bh);
>  	*out_bh = bh;
>  
>   failed:
> diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
> index ba7a1da..8992291 100644
> --- a/fs/nilfs2/inode.c
> +++ b/fs/nilfs2/inode.c
> @@ -93,6 +93,7 @@ int nilfs_get_block(struct inode *inode, sector_t blkoff,
>  		map_bh(bh_result, inode->i_sb, blknum);
>  		if (ret > 0)
>  			bh_result->b_size = (ret << inode->i_blkbits);
> +		set_buffer_nilfs_allocated(bh_result);
>  		goto out;
>  	}
>  	/* data block was not found */
> @@ -132,6 +133,7 @@ int nilfs_get_block(struct inode *inode, sector_t blkoff,
>  		set_buffer_delay(bh_result);
>  		map_bh(bh_result, inode->i_sb, 0); /* dbn must be changed
>  						      to proper value */
> +		set_buffer_nilfs_allocated(bh_result);
>  	} else if (ret == -ENOENT) {
>  		/* not found is not error (e.g. hole); must return without
>  		   the mapped state flag. */
> diff --git a/fs/nilfs2/mdt.c b/fs/nilfs2/mdt.c
> index c4dcd1d..9156635 100644
> --- a/fs/nilfs2/mdt.c
> +++ b/fs/nilfs2/mdt.c
> @@ -57,6 +57,7 @@ nilfs_mdt_insert_new_block(struct inode *inode, unsigned long block,
>  		return ret;
>  
>  	set_buffer_mapped(bh);
> +	set_buffer_nilfs_allocated(bh);
>  
>  	kaddr = kmap_atomic(bh->b_page);
>  	memset(kaddr + bh_offset(bh), 0, 1 << inode->i_blkbits);
> @@ -160,6 +161,7 @@ nilfs_mdt_submit_block(struct inode *inode, unsigned long blkoff,
>  	ret = 0;
>   out:
>  	get_bh(bh);
> +	set_buffer_nilfs_allocated(bh);
>  	*out_bh = bh;
>  
>   failed_bh:
> diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
> index 4f07e0e..b95540e 100644
> --- a/fs/nilfs2/page.c
> +++ b/fs/nilfs2/page.c
> @@ -101,6 +101,7 @@ void nilfs_forget_buffer(struct buffer_head *bh)
>  	clear_buffer_uptodate(bh);
>  	clear_buffer_mapped(bh);
>  	bh->b_blocknr = -1;
> +	clear_buffer_nilfs_allocated(bh);
>  	ClearPageUptodate(page);
>  	ClearPageMappedToDisk(page);
>  	unlock_buffer(bh);
> @@ -159,8 +160,11 @@ int nilfs_page_buffers_clean(struct page *page)
>  
>  	bh = head = page_buffers(page);
>  	do {
> +		if (!buffer_nilfs_allocated(bh))
> +			goto next_buffer;
>  		if (buffer_dirty(bh))
>  			return 0;
> +next_buffer:
>  		bh = bh->b_this_page;
>  	} while (bh != head);
>  	return 1;
> @@ -435,6 +439,7 @@ void nilfs_clear_dirty_page(struct page *page, bool silent)
>  			clear_buffer_nilfs_checked(bh);
>  			clear_buffer_nilfs_redirected(bh);
>  			clear_buffer_uptodate(bh);
> +			clear_buffer_nilfs_allocated(bh);
>  			clear_buffer_mapped(bh);
>  			unlock_buffer(bh);
>  		} while (bh = bh->b_this_page, bh != head);
> diff --git a/fs/nilfs2/page.h b/fs/nilfs2/page.h
> index ef30c5c..008e234 100644
> --- a/fs/nilfs2/page.h
> +++ b/fs/nilfs2/page.h
> @@ -38,6 +38,7 @@ enum {
>  	BH_NILFS_Redirected,
>  };
>  
> +BUFFER_FNS(NILFS_Allocated, nilfs_allocated)
>  BUFFER_FNS(NILFS_Node, nilfs_node)		/* nilfs node buffers */
>  BUFFER_FNS(NILFS_Volatile, nilfs_volatile)
>  BUFFER_FNS(NILFS_Checked, nilfs_checked)	/* buffer is verified */
> diff --git a/fs/nilfs2/recovery.c b/fs/nilfs2/recovery.c
> index ff00a0b..bac3575 100644
> --- a/fs/nilfs2/recovery.c
> +++ b/fs/nilfs2/recovery.c
> @@ -122,6 +122,7 @@ static int nilfs_compute_checksum(struct the_nilfs *nilfs,
>  			bh = __bread(nilfs->ns_bdev, ++start, blocksize);
>  			if (!bh)
>  				return -EIO;
> +			set_buffer_nilfs_allocated(bh);
>  			check_bytes -= size;
>  			size = min_t(u64, check_bytes, blocksize);
>  			crc = crc32_le(crc, bh->b_data, size);
> @@ -153,6 +154,7 @@ int nilfs_read_super_root_block(struct the_nilfs *nilfs, sector_t sr_block,
>  		ret = NILFS_SEG_FAIL_IO;
>  		goto failed;
>  	}
> +	set_buffer_nilfs_allocated(bh_sr);
>  
>  	sr = (struct nilfs_super_root *)bh_sr->b_data;
>  	if (check) {
> @@ -196,8 +198,10 @@ nilfs_read_log_header(struct the_nilfs *nilfs, sector_t start_blocknr,
>  	struct buffer_head *bh_sum;
>  
>  	bh_sum = __bread(nilfs->ns_bdev, start_blocknr, nilfs->ns_blocksize);
> -	if (bh_sum)
> +	if (bh_sum) {
> +		set_buffer_nilfs_allocated(bh_sum);
>  		*sum = (struct nilfs_segment_summary *)bh_sum->b_data;
> +	}
>  	return bh_sum;
>  }
>  
> @@ -266,6 +270,7 @@ static void *nilfs_read_summary_info(struct the_nilfs *nilfs,
>  			       nilfs->ns_blocksize);
>  		if (unlikely(!*pbh))
>  			return NULL;
> +		set_buffer_nilfs_allocated(*pbh);
>  		*offset = 0;
>  	}
>  	ptr = (*pbh)->b_data + *offset;
> @@ -303,6 +308,8 @@ static void nilfs_skip_summary_info(struct the_nilfs *nilfs,
>  		brelse(*pbh);
>  		*pbh = __bread(nilfs->ns_bdev, blocknr + bcnt,
>  			       nilfs->ns_blocksize);
> +		if (*pbh)
> +			set_buffer_nilfs_allocated(*pbh);
>  	}
>  }
>  
> @@ -333,6 +340,7 @@ static int nilfs_scan_dsync_log(struct the_nilfs *nilfs, sector_t start_blocknr,
>  	bh = __bread(nilfs->ns_bdev, start_blocknr, nilfs->ns_blocksize);
>  	if (unlikely(!bh))
>  		goto out;
> +	set_buffer_nilfs_allocated(bh);
>  
>  	offset = le16_to_cpu(sum->ss_bytes);
>  	for (;;) {
> @@ -492,6 +500,7 @@ static int nilfs_recovery_copy_block(struct the_nilfs *nilfs,
>  	bh_org = __bread(nilfs->ns_bdev, rb->blocknr, nilfs->ns_blocksize);
>  	if (unlikely(!bh_org))
>  		return -EIO;
> +	set_buffer_nilfs_allocated(bh_org);
>  
>  	kaddr = kmap_atomic(page);
>  	memcpy(kaddr + bh_offset(bh_org), bh_org->b_data, bh_org->b_size);
> @@ -712,6 +721,7 @@ static void nilfs_finish_roll_forward(struct the_nilfs *nilfs,
>  
>  	bh = __getblk(nilfs->ns_bdev, ri->ri_lsegs_start, nilfs->ns_blocksize);
>  	BUG_ON(!bh);
> +	set_buffer_nilfs_allocated(bh);
>  	memset(bh->b_data, 0, bh->b_size);
>  	set_buffer_dirty(bh);
>  	err = sync_dirty_buffer(bh);
> diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
> index dc9a913..c69a2b3 100644
> --- a/fs/nilfs2/segbuf.c
> +++ b/fs/nilfs2/segbuf.c
> @@ -113,6 +113,7 @@ int nilfs_segbuf_extend_segsum(struct nilfs_segment_buffer *segbuf)
>  		       segbuf->sb_pseg_start + segbuf->sb_sum.nsumblk);
>  	if (unlikely(!bh))
>  		return -ENOMEM;
> +	set_buffer_nilfs_allocated(bh);
>  
>  	nilfs_segbuf_add_segsum_buffer(segbuf, bh);
>  	return 0;
> @@ -127,6 +128,7 @@ int nilfs_segbuf_extend_payload(struct nilfs_segment_buffer *segbuf,
>  		       segbuf->sb_pseg_start + segbuf->sb_sum.nblocks);
>  	if (unlikely(!bh))
>  		return -ENOMEM;
> +	set_buffer_nilfs_allocated(bh);
>  
>  	nilfs_segbuf_add_payload_buffer(segbuf, bh);
>  	*bhp = bh;
> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> index a5752a58..3d4cdae 100644
> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -665,7 +665,7 @@ static size_t nilfs_lookup_dirty_data_buffers(struct inode *inode,
>  
>  		bh = head = page_buffers(page);
>  		do {
> -			if (!buffer_dirty(bh))
> +			if (!buffer_dirty(bh) || !buffer_nilfs_allocated(bh))
>  				continue;
>  			get_bh(bh);
>  			list_add_tail(&bh->b_assoc_buffers, listp);
> @@ -699,7 +699,8 @@ static void nilfs_lookup_dirty_node_buffers(struct inode *inode,
>  		for (i = 0; i < pagevec_count(&pvec); i++) {
>  			bh = head = page_buffers(pvec.pages[i]);
>  			do {
> -				if (buffer_dirty(bh)) {
> +				if (buffer_dirty(bh) &&
> +						buffer_nilfs_allocated(bh)) {
>  					get_bh(bh);
>  					list_add_tail(&bh->b_assoc_buffers,
>  						      listp);
> diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
> index c7d1f9f..d190494 100644
> --- a/fs/nilfs2/super.c
> +++ b/fs/nilfs2/super.c
> @@ -384,6 +384,7 @@ static int nilfs_move_2nd_super(struct super_block *sb, loff_t sb2off)
>  		ret = -EIO;
>  		goto out;
>  	}
> +	set_buffer_nilfs_allocated(nsbh);
>  	nsbp = (void *)nsbh->b_data + offset;
>  	memset(nsbp, 0, nilfs->ns_blocksize);
>  
> @@ -836,6 +837,7 @@ struct nilfs_super_block *nilfs_read_super_block(struct super_block *sb,
>  	*pbh = sb_bread(sb, sb_index);
>  	if (!*pbh)
>  		return NULL;
> +	set_buffer_nilfs_allocated(*pbh);
>  	return (struct nilfs_super_block *)((char *)(*pbh)->b_data + offset);
>  }
>  
> -- 
> 1.7.9.5
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] nilfs2: fix issue with broken bmap for the case of block size lesser than 4 KB
       [not found]   ` <20130426.013748.27792413.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2013-04-29  9:24     ` Vyacheslav Dubeyko
  2013-04-30 18:11       ` Ryusuke Konishi
  0 siblings, 1 reply; 4+ messages in thread
From: Vyacheslav Dubeyko @ 2013-04-29  9:24 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Anthony Doggett

Hi Ryusuke,

On Fri, 2013-04-26 at 01:37 +0900, Ryusuke Konishi wrote:

[snip]
> 
> Adding BH_NILFS_Allocated flag looks overkill for this issue.
> 
> Why not modify btree or direct mapping code so that the buffer dirty
> flag is propery set or cleared also for 1 KB block?  Basically, the
> block mapping code of NILFS is designed to be able to handle 1 KB
> block.
> 

My idea is simple. As I can see, usual way of processing page's buffers
in NILFS2 driver is likewise:

struct buffer_head *bh, *head;
bh = head = page_buffers(page);
do {

	/* Do something */

} while (bh = bh->b_this_page, bh != head);

And, as I understand, the essence of the issue is in processing of spare
buffers that were allocated during create_empty_buffers() call but these
buffers shouldn't be used yet. So, from my viewpoint, the reasonable way
of the issue fix is to ignore such spare buffers. Because operations
with spare buffers can be resulted in presence of very sophisticated
bugs and performance degradation. As a result, I suggested to use
BH_NILFS_Allocated flag in this patch.

> If this issue occurs after file size extension with ftruncate(), I
> guess handling of hole blocks created by the extension seems to have
> problem.
> 

The nature of this issue is more complex, as I described in patch. The
ftruncate call is simply beginning of the issue. The peculiarity of this
call for this issue consists in growing file from zero bytes to 60
bytes. So, for 1 KB block size we should use only one block (hole) but,
as a result, we have 4 allocated buffers (1 for hole block and 3 spare
buffers).

[  223.543981] nilfs_truncate:715: blkoff 1, inode->i_size 60
[  223.544068] nilfs_get_block:89: inode->i_ino 21, ii->i_bmap->b_last_allocated_ptr 0
[  223.544079] nilfs_truncate:729: inode->i_ino 21
[  223.544086] Pid: 2424, comm: darcs Tainted: G          I  3.8.0-rc1 #147
[  223.544089] Call Trace:
[  223.544098]  [<ffffffff812e0c77>] nilfs_truncate+0x117/0x130
[  223.544105]  [<ffffffff811516de>] ? truncate_pagecache+0x5e/0x70
[  223.544111]  [<ffffffff812e0d11>] nilfs_setattr+0x81/0xb0
[  223.544119]  [<ffffffff817285b1>] ? mutex_lock_nested+0x2a1/0x370
[  223.544126]  [<ffffffff811bf60b>] notify_change+0x1cb/0x390
[  223.544132]  [<ffffffff811a1290>] do_truncate+0x60/0xa0
[  223.544138]  [<ffffffff811a1644>] sys_ftruncate+0x114/0x180
[  223.544146]  [<ffffffff8139fcae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  223.544152]  [<ffffffff81735119>] system_call_fastpath+0x16/0x1b

Then, it is possible to identify trying to write in file mapping:

[  489.785334] nilfs_page_mkwrite:146: inode->i_ino 21
[  489.785336] Pid: 2447, comm: darcs Tainted: G          I  3.8.0-rc1 #149
[  489.785338] Call Trace:
[  489.785342]  [<ffffffff812e1a55>] nilfs_page_mkwrite+0x315/0x390
[  489.785347]  [<ffffffff81169562>] __do_fault+0xe2/0x4c0
[  489.785351]  [<ffffffff8116c260>] handle_pte_fault+0x90/0x8e0
[  489.785355]  [<ffffffff8101dcc9>] ? sched_clock+0x9/0x10
[  489.785359]  [<ffffffff81098985>] ? sched_clock_local+0x25/0x90
[  489.785363]  [<ffffffff81730644>] ? __do_page_fault+0x114/0x5b0
[  489.785367]  [<ffffffff8116dc85>] handle_mm_fault+0x235/0x330
[  489.785371]  [<ffffffff817306cc>] __do_page_fault+0x19c/0x5b0
[  489.785375]  [<ffffffff81098b18>] ? sched_clock_cpu+0xa8/0x110
[  489.785378]  [<ffffffff810bd9fd>] ? trace_hardirqs_off+0xd/0x10
[  489.785382]  [<ffffffff81098bef>] ? local_clock+0x6f/0x80
[  489.785386]  [<ffffffff810be345>] ? lock_release_holdtime.part.25+0x15/0x1a0
[  489.785390]  [<ffffffff811c1183>] ? __close_fd+0x83/0xb0
[  489.785394]  [<ffffffff8139fe9d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[  489.785398]  [<ffffffff81730aee>] do_page_fault+0xe/0x10
[  489.785402]  [<ffffffff8172cc48>] page_fault+0x28/0x30

It is possible to see that after segment construction we have such state
of bmap (namely, it was set two block numbers):

[  490.177923] nilfs_direct_set_ptr: inode: 21
[  490.177925] NILFS2: 00 00 00 00 00 00 00 00 01 14 00 00 00 00 00 00  ................
[  490.177927] NILFS2: 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  490.177929] NILFS2: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  490.177931] NILFS2: 00 00 00 00 00 00 00 00

And, finally, we have issue:

[  490.179790] nilfs_direct_assign: inode: 21
[  490.179800] nilfs_direct_assign: ptr: 5121
[  490.179802] nilfs_dat_prepare_entry:57 req->pr_entry_nr 5121
[  490.179805] nilfs_palloc_get_entry_block:294 nr 5121, create 0
[  490.179815] nilfs_direct_assign: inode: 21
[  490.179828] nilfs_direct_assign: ptr: 10
[  490.179830] nilfs_dat_prepare_entry:57 req->pr_entry_nr 10
[  490.179832] nilfs_palloc_get_entry_block:294 nr 10, create 0
[  490.179843] nilfs_direct_assign: inode: 21
[  490.179857] nilfs_direct_assign: ptr: 0
[  490.179859] nilfs_direct_assign: invalid pointer: 0
[  490.179862] NILFS error (device dm-0): nilfs_bmap_assign: broken bmap (inode number=21)

With the best regards,
Vyacheslav Dubeyko.

> I will look into the issue too.
> 
> 
> Regards,
> Ryusuke Konishi
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] nilfs2: fix issue with broken bmap for the case of block size lesser than 4 KB
  2013-04-29  9:24     ` Vyacheslav Dubeyko
@ 2013-04-30 18:11       ` Ryusuke Konishi
  0 siblings, 0 replies; 4+ messages in thread
From: Ryusuke Konishi @ 2013-04-30 18:11 UTC (permalink / raw)
  To: Vyacheslav Dubeyko
  Cc: linux-nilfs, linux-fsdevel, Andrew Morton, Anthony Doggett

Hi Vyacheslav,
On Mon, 29 Apr 2013 13:24:53 +0400, Vyacheslav Dubeyko wrote:
> Hi Ryusuke,
> 
> On Fri, 2013-04-26 at 01:37 +0900, Ryusuke Konishi wrote:
> 
> [snip]
>> 
>> Adding BH_NILFS_Allocated flag looks overkill for this issue.
>> 
>> Why not modify btree or direct mapping code so that the buffer dirty
>> flag is propery set or cleared also for 1 KB block?  Basically, the
>> block mapping code of NILFS is designed to be able to handle 1 KB
>> block.
>> 
> 
> My idea is simple. As I can see, usual way of processing page's buffers
> in NILFS2 driver is likewise:
> 
> struct buffer_head *bh, *head;
> bh = head = page_buffers(page);
> do {
> 
> 	/* Do something */
> 
> } while (bh = bh->b_this_page, bh != head);
> 
> And, as I understand, the essence of the issue is in processing of spare
> buffers that were allocated during create_empty_buffers() call but these
> buffers shouldn't be used yet. So, from my viewpoint, the reasonable way
> of the issue fix is to ignore such spare buffers. Because operations
> with spare buffers can be resulted in presence of very sophisticated
> bugs and performance degradation. As a result, I suggested to use
> BH_NILFS_Allocated flag in this patch.

That is the point of difference.  I think we can handle this kind of
issue by managing buffer dirty state strictly since the segment
constructor of nilfs only picks up dirty buffers for write through
nilfs_lookup_dirty_data_buffers function.  Spare buffers are simply
ignored if they don't have dirty flag.

I don't think we need the BH_NILFS_Allocated flag.

>> If this issue occurs after file size extension with ftruncate(), I
>> guess handling of hole blocks created by the extension seems to have
>> problem.
>> 
> 
> The nature of this issue is more complex, as I described in patch. The
> ftruncate call is simply beginning of the issue. The peculiarity of this
> call for this issue consists in growing file from zero bytes to 60
> bytes. So, for 1 KB block size we should use only one block (hole) but,
> as a result, we have 4 allocated buffers (1 for hole block and 3 spare
> buffers).
> 
> [  223.543981] nilfs_truncate:715: blkoff 1, inode->i_size 60
> [  223.544068] nilfs_get_block:89: inode->i_ino 21, ii->i_bmap->b_last_allocated_ptr 0
> [  223.544079] nilfs_truncate:729: inode->i_ino 21
> [  223.544086] Pid: 2424, comm: darcs Tainted: G          I  3.8.0-rc1 #147
> [  223.544089] Call Trace:
> [  223.544098]  [<ffffffff812e0c77>] nilfs_truncate+0x117/0x130
> [  223.544105]  [<ffffffff811516de>] ? truncate_pagecache+0x5e/0x70
> [  223.544111]  [<ffffffff812e0d11>] nilfs_setattr+0x81/0xb0
> [  223.544119]  [<ffffffff817285b1>] ? mutex_lock_nested+0x2a1/0x370
> [  223.544126]  [<ffffffff811bf60b>] notify_change+0x1cb/0x390
> [  223.544132]  [<ffffffff811a1290>] do_truncate+0x60/0xa0
> [  223.544138]  [<ffffffff811a1644>] sys_ftruncate+0x114/0x180
> [  223.544146]  [<ffffffff8139fcae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [  223.544152]  [<ffffffff81735119>] system_call_fastpath+0x16/0x1b
> 
> Then, it is possible to identify trying to write in file mapping:
> 
> [  489.785334] nilfs_page_mkwrite:146: inode->i_ino 21
> [  489.785336] Pid: 2447, comm: darcs Tainted: G          I  3.8.0-rc1 #149
> [  489.785338] Call Trace:
> [  489.785342]  [<ffffffff812e1a55>] nilfs_page_mkwrite+0x315/0x390
> [  489.785347]  [<ffffffff81169562>] __do_fault+0xe2/0x4c0
> [  489.785351]  [<ffffffff8116c260>] handle_pte_fault+0x90/0x8e0
> [  489.785355]  [<ffffffff8101dcc9>] ? sched_clock+0x9/0x10
> [  489.785359]  [<ffffffff81098985>] ? sched_clock_local+0x25/0x90
> [  489.785363]  [<ffffffff81730644>] ? __do_page_fault+0x114/0x5b0
> [  489.785367]  [<ffffffff8116dc85>] handle_mm_fault+0x235/0x330
> [  489.785371]  [<ffffffff817306cc>] __do_page_fault+0x19c/0x5b0
> [  489.785375]  [<ffffffff81098b18>] ? sched_clock_cpu+0xa8/0x110
> [  489.785378]  [<ffffffff810bd9fd>] ? trace_hardirqs_off+0xd/0x10
> [  489.785382]  [<ffffffff81098bef>] ? local_clock+0x6f/0x80
> [  489.785386]  [<ffffffff810be345>] ? lock_release_holdtime.part.25+0x15/0x1a0
> [  489.785390]  [<ffffffff811c1183>] ? __close_fd+0x83/0xb0
> [  489.785394]  [<ffffffff8139fe9d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> [  489.785398]  [<ffffffff81730aee>] do_page_fault+0xe/0x10
> [  489.785402]  [<ffffffff8172cc48>] page_fault+0x28/0x30

Thank you for narrowing down the issue.

So, the root cause is in the use of __block_page_mkwrite() in the
nilfs_page_mkwrite function.

Even if __block_page_mkwrite() only fills hole blocks inside EOF, it
calls set_page_dirty(), and this leads to call nilfs_set_page_dirty()
which marks all buffers on the page dirty through
__set_page_dirty_buffers().  Thus, inconsistent dirty state is
created.

There seems to be two problems here.  The first problem is that the
use of __block_page_mkwrite() in nilfs_page_mkwrite() breaks
consistency of the buffer dirty state if buffers are not fullly filled
due to EOF.  The second problem is that the number of newly dirtied
buffers, which will be passed to nilfs_set_file_dirty function, goes
wrong in that case.

I will later post a bug fix for these issues.


With regards,
Ryusuke Konishi


> It is possible to see that after segment construction we have such state
> of bmap (namely, it was set two block numbers):
> 
> [  490.177923] nilfs_direct_set_ptr: inode: 21
> [  490.177925] NILFS2: 00 00 00 00 00 00 00 00 01 14 00 00 00 00 00 00  ................
> [  490.177927] NILFS2: 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [  490.177929] NILFS2: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [  490.177931] NILFS2: 00 00 00 00 00 00 00 00
> 
> And, finally, we have issue:
> 
> [  490.179790] nilfs_direct_assign: inode: 21
> [  490.179800] nilfs_direct_assign: ptr: 5121
> [  490.179802] nilfs_dat_prepare_entry:57 req->pr_entry_nr 5121
> [  490.179805] nilfs_palloc_get_entry_block:294 nr 5121, create 0
> [  490.179815] nilfs_direct_assign: inode: 21
> [  490.179828] nilfs_direct_assign: ptr: 10
> [  490.179830] nilfs_dat_prepare_entry:57 req->pr_entry_nr 10
> [  490.179832] nilfs_palloc_get_entry_block:294 nr 10, create 0
> [  490.179843] nilfs_direct_assign: inode: 21
> [  490.179857] nilfs_direct_assign: ptr: 0
> [  490.179859] nilfs_direct_assign: invalid pointer: 0
> [  490.179862] NILFS error (device dm-0): nilfs_bmap_assign: broken bmap (inode number=21)
> 
> With the best regards,
> Vyacheslav Dubeyko.
> 
>> I will look into the issue too.
>> 
>> 
>> Regards,
>> Ryusuke Konishi
>> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2013-04-30 18:11 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-25 10:28 [PATCH] nilfs2: fix issue with broken bmap for the case of block size lesser than 4 KB Vyacheslav Dubeyko
2013-04-25 16:37 ` Ryusuke Konishi
     [not found]   ` <20130426.013748.27792413.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2013-04-29  9:24     ` Vyacheslav Dubeyko
2013-04-30 18:11       ` Ryusuke Konishi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).