linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 3/6] ext4: Use ext4_get_block_write() for DAX
       [not found] <1435934443-17090-1-git-send-email-matthew.r.wilcox@intel.com>
@ 2015-07-03 14:40 ` Matthew Wilcox
  2015-07-03 18:30   ` Theodore Ts'o
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2015-07-03 14:40 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, Alexander Viro
  Cc: Matthew Wilcox, Theodore Ts'o, Andreas Dilger, linux-ext4

From: Matthew Wilcox <willy@linux.intel.com>

DAX relies on the get_block function either zeroing newly allocated blocks
before they're findable by subsequent calls to get_block, or marking newly
allocated blocks as unwritten.  ext4_get_block() cannot create unwritten
extents, but ext4_get_block_write() can.

Reported-by: Andy Rudoff <andy.rudoff@intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 fs/ext4/file.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index bc313ac..e5bdcb7 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -195,7 +195,7 @@ out:
 static void ext4_end_io_unwritten(struct buffer_head *bh, int uptodate)
 {
 	struct inode *inode = bh->b_assoc_map->host;
-	/* XXX: breaks on 32-bit > 16GB. Is that even supported? */
+	/* XXX: breaks on 32-bit > 16TB. Is that even supported? */
 	loff_t offset = (loff_t)(uintptr_t)bh->b_private << inode->i_blkbits;
 	int err;
 	if (!uptodate)
@@ -206,13 +206,13 @@ static void ext4_end_io_unwritten(struct buffer_head *bh, int uptodate)
 
 static int ext4_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
-	return dax_fault(vma, vmf, ext4_get_block, ext4_end_io_unwritten);
-					/* Is this the right get_block? */
+	return dax_fault(vma, vmf, ext4_get_block_write, ext4_end_io_unwritten);
 }
 
 static int ext4_dax_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
-	return dax_mkwrite(vma, vmf, ext4_get_block, ext4_end_io_unwritten);
+	return dax_mkwrite(vma, vmf, ext4_get_block_write,
+				ext4_end_io_unwritten);
 }
 
 static const struct vm_operations_struct ext4_dax_vm_ops = {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 3/6] ext4: Use ext4_get_block_write() for DAX
  2015-07-03 14:40 ` [PATCH v2 3/6] ext4: Use ext4_get_block_write() for DAX Matthew Wilcox
@ 2015-07-03 18:30   ` Theodore Ts'o
  2015-07-03 18:48     ` Matthew Wilcox
  0 siblings, 1 reply; 5+ messages in thread
From: Theodore Ts'o @ 2015-07-03 18:30 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-fsdevel, linux-kernel, Alexander Viro, Matthew Wilcox,
	Andreas Dilger, linux-ext4

On Fri, Jul 03, 2015 at 10:40:40AM -0400, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
> 
> DAX relies on the get_block function either zeroing newly allocated blocks
> before they're findable by subsequent calls to get_block, or marking newly
> allocated blocks as unwritten.  ext4_get_block() cannot create unwritten
> extents, but ext4_get_block_write() can.

To be clear, this patch has no prerequistes or dependencies, right?
That is, it would be fine if I take this through the ext4 git tree?
Or is there a reason or a preference for carrying this patch
somewhere else?

Also, is there a way I can test the DAX functionality in ext4 using
KVM?  If so, can you give me a cheat sheet about how I can do that?

Thanks,

							- Ted

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 3/6] ext4: Use ext4_get_block_write() for DAX
  2015-07-03 18:30   ` Theodore Ts'o
@ 2015-07-03 18:48     ` Matthew Wilcox
  2015-07-03 19:07       ` Theodore Ts'o
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2015-07-03 18:48 UTC (permalink / raw)
  To: Theodore Ts'o, Matthew Wilcox, linux-fsdevel, linux-kernel,
	Alexander Viro, Andreas Dilger, linux-ext4

On Fri, Jul 03, 2015 at 02:30:27PM -0400, Theodore Ts'o wrote:
> On Fri, Jul 03, 2015 at 10:40:40AM -0400, Matthew Wilcox wrote:
> > From: Matthew Wilcox <willy@linux.intel.com>
> > 
> > DAX relies on the get_block function either zeroing newly allocated blocks
> > before they're findable by subsequent calls to get_block, or marking newly
> > allocated blocks as unwritten.  ext4_get_block() cannot create unwritten
> > extents, but ext4_get_block_write() can.
> 
> To be clear, this patch has no prerequistes or dependencies, right?
> That is, it would be fine if I take this through the ext4 git tree?
> Or is there a reason or a preference for carrying this patch
> somewhere else?

Right, no dependencies or prerequisites, completely independent of all
the other patches.

> Also, is there a way I can test the DAX functionality in ext4 using
> KVM?  If so, can you give me a cheat sheet about how I can do that?

I don't use KVM, but I can tell you what I do ... (additional explanation
added, not for Ted's benefit, but because people less familiar with
Linux than Ted is may happen upon this email for their own purposes).

In /etc/default/grub, I have this line:

GRUB_CMDLINE_LINUX="memmap=4G!4G"

        memmap=nn[KMG]!ss[KMG]
                        [KNL,X86] Mark specific memory as protected.
                        Region of memory to be used, from ss to ss+nn.
                        The memory region may be marked as e820 type 12 (0xc)
                        and is NVDIMM or ADR memory.


In my kernel config, I have:

CONFIG_X86_PMEM_LEGACY=y
CONFIG_LIBNVDIMM=y
CONFIG_BLK_DEV_PMEM=m

At boot, I "modprobe pmem".  On the desktop-class system I'm using as
my development machine, the BIOS doesn't clear RAM between boots (only
power cycles), so the partition table and ext4 filesystem stays good,
and all I have to do is:

mount -odax /dev/pmem0p1 /mnt/ram0/

Also my xfstests local.config:

TEST_DEV=/dev/pmem0p1
TEST_DIR=/mnt/ram0
SCRATCH_DEV=/dev/pmem0p2
SCRATCH_MNT=/mnt/ram1
TEST_FS_MOUNT_OPTS="-o dax"
EXT_MOUNT_OPTIONS="-o dax"
MKFS_OPTIONS="-b4096"

Hope I didn't forget anything.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 3/6] ext4: Use ext4_get_block_write() for DAX
  2015-07-03 18:48     ` Matthew Wilcox
@ 2015-07-03 19:07       ` Theodore Ts'o
  2015-07-05 13:29         ` Boaz Harrosh
  0 siblings, 1 reply; 5+ messages in thread
From: Theodore Ts'o @ 2015-07-03 19:07 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Matthew Wilcox, linux-fsdevel, linux-kernel, Alexander Viro,
	Andreas Dilger, linux-ext4

On Fri, Jul 03, 2015 at 02:48:24PM -0400, Matthew Wilcox wrote:
> 
> At boot, I "modprobe pmem".

Is there a reason why it's important to build and load pmem as a
module?  If I use CONFIG_BLK_DEV_PMEM=y (which is more convenient
given how I launch my KVM test appliance), should I expect any
problems?

I assume that this won't detect any bugs caused by missing CLFLUSH
instructions, but I assume that when using NVM as a block device, this
isn't much of an issue, as long as we don't care about torn writes?
(How using NVM with metdata checksums, or any checksums for that
matter, seems to be an interesting question --- how do we recover from
a checksum failure after a power failure?)

					- Ted

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 3/6] ext4: Use ext4_get_block_write() for DAX
  2015-07-03 19:07       ` Theodore Ts'o
@ 2015-07-05 13:29         ` Boaz Harrosh
  0 siblings, 0 replies; 5+ messages in thread
From: Boaz Harrosh @ 2015-07-05 13:29 UTC (permalink / raw)
  To: Theodore Ts'o, Matthew Wilcox, Matthew Wilcox, linux-fsdevel,
	linux-kernel, Alexander Viro, Andreas Dilger, linux-ext4

On 07/03/2015 10:07 PM, Theodore Ts'o wrote:
> On Fri, Jul 03, 2015 at 02:48:24PM -0400, Matthew Wilcox wrote:
>>
>> At boot, I "modprobe pmem".
> 
> Is there a reason why it's important to build and load pmem as a
> module?  If I use CONFIG_BLK_DEV_PMEM=y (which is more convenient
> given how I launch my KVM test appliance), should I expect any
> problems?
> 

This (=y) should work fine. We use it a lot. (with KVM even boot an
image with -dax, with a trick)

Note that DAX need not be tested with pmem only, you can always use brd
at any given point without any reboot.

One more trick for xfstest I use:
	memmap=2G!4G,2G!6G

And have two pmem0/1 and don't need to bother with any fdisk. Do need
to mkfs every boot though.

BTW: with kvm a reboot with above memmap will give you back the exact
same memory. halt and "virsh start" is a different story.

> I assume that this won't detect any bugs caused by missing CLFLUSH
> instructions, but I assume that when using NVM as a block device, this
> isn't much of an issue, as long as we don't care about torn writes?
> (How using NVM with metdata checksums, or any checksums for that
> matter, seems to be an interesting question --- how do we recover from
> a checksum failure after a power failure?)
> 

Currently pmem maps a very-(very) slow ioremap_nocache. So any Kernel
memory access should be pmem persistent. For a real world faster ioremap,
there are few major pieces still missing in the stack to make it
persistent. Note the even today with  ioremap_nocache, any application
mmap (like git) is not persistent.

> 					- Ted

Cheers
Boaz


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-07-05 13:29 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1435934443-17090-1-git-send-email-matthew.r.wilcox@intel.com>
2015-07-03 14:40 ` [PATCH v2 3/6] ext4: Use ext4_get_block_write() for DAX Matthew Wilcox
2015-07-03 18:30   ` Theodore Ts'o
2015-07-03 18:48     ` Matthew Wilcox
2015-07-03 19:07       ` Theodore Ts'o
2015-07-05 13:29         ` Boaz Harrosh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).