* [PATCH 0/3] add fallback reason strings to DAX PMD path
@ 2017-05-23 21:25 Ross Zwisler
2017-05-23 21:25 ` [PATCH 1/3] dax: add fallback reason to dax_iomap_pmd_fault() Ross Zwisler
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Ross Zwisler @ 2017-05-23 21:25 UTC (permalink / raw)
To: Andrew Morton, linux-kernel
Cc: Ross Zwisler, Darrick J. Wong, Alexander Viro, Christoph Hellwig,
Dan Williams, Ingo Molnar, Jan Kara, Matthew Wilcox,
Steven Rostedt, linux-fsdevel, linux-nvdimm
One of the primary motivations for adding tracepoints to the DAX PMD path
was to allow the user to diagnose whether their system was actually using
PMDs, and if not to help them understand why. For me at least this has
worked okay in some situations, but many times I find myself adding more
debugging to diagnose fallback reasons that aren't immediately obvious, or
situations where the current tracepoints are simply insufficient because
they don't give you enough information.
This series adds short fallback reason strings to the tracepoints in the
PMD path with the intention of giving the user better information about why
their system is falling back to PTEs.
So, for example, the recent case on my system was that I forgot to move my
namespace from "raw" mode to "memory" mode, which resulted in this output:
big-1046 [000] .... 103.930950: dax_pmd_fault: dev 259:0 ino 0xc shared
WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start 0x10200000
vm_end 0x10700000 pgoff 0x305 max_pgoff 0x1400
big-1046 [000] .... 103.931220: dax_pmd_insert_mapping_fallback: dev
259:0 ino 0xc shared write address 0x10505000 length 0x200000 pfn 0x24a200
DEV radix_entry 0x0
big-1046 [000] .... 103.931222: dax_pmd_fault_done: dev 259:0 ino 0xc
shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start
0x10200000 vm_end 0x10700000 pgoff 0x305 max_pgoff 0x1400 FALLBACK
The issue is that the PFN_MAP flag isn't set because we're lacking struct
page for our PMEM namespace. It's not immediately obvious why this
fallback happened, and actually you can't even diagnose it because we mask
off the pfn_t flags before printing the PFN. :(
This new output:
big-1011 [000] .... 36.164708: dax_pmd_fault: dev 259:0 ino 0xc shared
WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start 0x10200000
vm_end 0x10700000 pgoff 0x305 max_pgoff 0x1400
big-1011 [000] .... 36.165521: dax_pmd_insert_mapping_fallback: dev
259:0 ino 0xc shared write address 0x10505000 length 0x200000 pfn
0x2000000000249200 DEV radix_entry 0x0 pfn_t not devmap
big-1011 [000] .... 36.165524: dax_pmd_fault_done: dev 259:0 ino 0xc
shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start
0x10200000 vm_end 0x10700000 pgoff 0x305 max_pgoff 0x1400 FALLBACK
adds the "pfn_t not devmap" string to the second line, telling the user
exactly what's going on. I also stopped masking off the pfn_t flags so the
output was more complete.
My only concern is that somehow adding strings like this to tracepoint
output, brief and useful though they may be, is somehow breaking what
tracepoints are supposed to be doing. If anyone feels strongly about this
I guess I can just keep these changes locally and try to keep enhancing the
existing output without adding strings.
Ross Zwisler (3):
dax: add fallback reason to dax_iomap_pmd_fault()
dax: add fallback reason to dax_pmd_insert_mapping()
dax: add fallback reason to dax_pmd_load_hole()
fs/dax.c | 76 +++++++++++++++++++++++++++++++------------
include/trace/events/fs_dax.h | 50 +++++++++++++++++-----------
2 files changed, 86 insertions(+), 40 deletions(-)
--
2.9.4
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/3] dax: add fallback reason to dax_iomap_pmd_fault()
2017-05-23 21:25 [PATCH 0/3] add fallback reason strings to DAX PMD path Ross Zwisler
@ 2017-05-23 21:25 ` Ross Zwisler
2017-05-23 21:25 ` [PATCH 2/3] dax: add fallback reason to dax_pmd_insert_mapping() Ross Zwisler
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Ross Zwisler @ 2017-05-23 21:25 UTC (permalink / raw)
To: Andrew Morton, linux-kernel
Cc: Ross Zwisler, Darrick J. Wong, Alexander Viro, Christoph Hellwig,
Dan Williams, Ingo Molnar, Jan Kara, Matthew Wilcox,
Steven Rostedt, linux-fsdevel, linux-nvdimm
Currently the tracepoints in dax_iomap_pmd_fault() provide the user with
enough information to diagnose some but not all of the reasons for falling
back to PTEs. Enhance the tracepoints in this function to explicitly tell
the user why the fallback happened. This adds information for previously
undiagnosable failures such as radix tree collisions, and it also makes all
the fallback reasons much more obvious.
Here is an example of this new tracepoint output, where the page fault is
happening in a VMA that is less than 2 MiB in size:
small-1018 [004] .... 77.657433: dax_pmd_fault: dev 259:0 ino 0xc
shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start
0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400
small-1018 [004] .... 77.657436: dax_pmd_fault_done: dev 259:0 ino 0xc
shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start
0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400 FALLBACK beyond
vma
The "beyond vma" text at the end is the new bit, telling us that our PMD
fault would have faulted in addresses beyond the current bounds of the VMA.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
fs/dax.c | 33 ++++++++++++++++++++++++---------
include/trace/events/fs_dax.h | 15 +++++++++------
2 files changed, 33 insertions(+), 15 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index c22eaf1..35b9f86 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1353,6 +1353,7 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
struct inode *inode = mapping->host;
int result = VM_FAULT_FALLBACK;
struct iomap iomap = { 0 };
+ char *fallback_reason = "";
pgoff_t max_pgoff, pgoff;
void *entry;
loff_t pos;
@@ -1366,17 +1367,22 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
pgoff = linear_page_index(vma, pmd_addr);
max_pgoff = (i_size_read(inode) - 1) >> PAGE_SHIFT;
- trace_dax_pmd_fault(inode, vmf, max_pgoff, 0);
+ trace_dax_pmd_fault(inode, vmf, max_pgoff, 0, "");
/* Fall back to PTEs if we're going to COW */
- if (write && !(vma->vm_flags & VM_SHARED))
+ if (write && !(vma->vm_flags & VM_SHARED)) {
+ fallback_reason = "copy on write";
goto fallback;
+ }
/* If the PMD would extend outside the VMA */
- if (pmd_addr < vma->vm_start)
+ if (pmd_addr < vma->vm_start) {
+ fallback_reason = "before vma";
goto fallback;
- if ((pmd_addr + PMD_SIZE) > vma->vm_end)
+ } else if ((pmd_addr + PMD_SIZE) > vma->vm_end) {
+ fallback_reason = "beyond vma";
goto fallback;
+ }
if (pgoff > max_pgoff) {
result = VM_FAULT_SIGBUS;
@@ -1384,8 +1390,10 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
}
/* If the PMD would extend beyond the file size */
- if ((pgoff | PG_PMD_COLOUR) > max_pgoff)
+ if ((pgoff | PG_PMD_COLOUR) > max_pgoff) {
+ fallback_reason = "beyond file";
goto fallback;
+ }
/*
* grab_mapping_entry() will make sure we get a 2M empty entry, a DAX
@@ -1394,8 +1402,10 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
* back to 4k entries.
*/
entry = grab_mapping_entry(mapping, pgoff, RADIX_DAX_PMD);
- if (IS_ERR(entry))
+ if (IS_ERR(entry)) {
+ fallback_reason = "entry lock";
goto fallback;
+ }
/*
* Note that we don't use iomap_apply here. We aren't doing I/O, only
@@ -1404,11 +1414,15 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
*/
pos = (loff_t)pgoff << PAGE_SHIFT;
error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap);
- if (error)
+ if (error) {
+ fallback_reason = "iomap begin";
goto unlock_entry;
+ }
- if (iomap.offset + iomap.length < pos + PMD_SIZE)
+ if (iomap.offset + iomap.length < pos + PMD_SIZE) {
+ fallback_reason = "beyond iomap";
goto finish_iomap;
+ }
switch (iomap.type) {
case IOMAP_MAPPED:
@@ -1448,7 +1462,8 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf,
count_vm_event(THP_FAULT_FALLBACK);
}
out:
- trace_dax_pmd_fault_done(inode, vmf, max_pgoff, result);
+ trace_dax_pmd_fault_done(inode, vmf, max_pgoff, result,
+ fallback_reason);
return result;
}
#else
diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h
index 08bb3ed..fd12f8c 100644
--- a/include/trace/events/fs_dax.h
+++ b/include/trace/events/fs_dax.h
@@ -8,8 +8,8 @@
DECLARE_EVENT_CLASS(dax_pmd_fault_class,
TP_PROTO(struct inode *inode, struct vm_fault *vmf,
- pgoff_t max_pgoff, int result),
- TP_ARGS(inode, vmf, max_pgoff, result),
+ pgoff_t max_pgoff, int result, char *fallback_reason),
+ TP_ARGS(inode, vmf, max_pgoff, result, fallback_reason),
TP_STRUCT__entry(
__field(unsigned long, ino)
__field(unsigned long, vm_start)
@@ -18,6 +18,7 @@ DECLARE_EVENT_CLASS(dax_pmd_fault_class,
__field(unsigned long, address)
__field(pgoff_t, pgoff)
__field(pgoff_t, max_pgoff)
+ __field(char *, fallback_reason)
__field(dev_t, dev)
__field(unsigned int, flags)
__field(int, result)
@@ -33,9 +34,10 @@ DECLARE_EVENT_CLASS(dax_pmd_fault_class,
__entry->pgoff = vmf->pgoff;
__entry->max_pgoff = max_pgoff;
__entry->result = result;
+ __entry->fallback_reason = fallback_reason;
),
TP_printk("dev %d:%d ino %#lx %s %s address %#lx vm_start "
- "%#lx vm_end %#lx pgoff %#lx max_pgoff %#lx %s",
+ "%#lx vm_end %#lx pgoff %#lx max_pgoff %#lx %s %s",
MAJOR(__entry->dev),
MINOR(__entry->dev),
__entry->ino,
@@ -46,15 +48,16 @@ DECLARE_EVENT_CLASS(dax_pmd_fault_class,
__entry->vm_end,
__entry->pgoff,
__entry->max_pgoff,
- __print_flags(__entry->result, "|", VM_FAULT_RESULT_TRACE)
+ __print_flags(__entry->result, "|", VM_FAULT_RESULT_TRACE),
+ __entry->fallback_reason
)
)
#define DEFINE_PMD_FAULT_EVENT(name) \
DEFINE_EVENT(dax_pmd_fault_class, name, \
TP_PROTO(struct inode *inode, struct vm_fault *vmf, \
- pgoff_t max_pgoff, int result), \
- TP_ARGS(inode, vmf, max_pgoff, result))
+ pgoff_t max_pgoff, int result, char *fallback_reason), \
+ TP_ARGS(inode, vmf, max_pgoff, result, fallback_reason))
DEFINE_PMD_FAULT_EVENT(dax_pmd_fault);
DEFINE_PMD_FAULT_EVENT(dax_pmd_fault_done);
--
2.9.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/3] dax: add fallback reason to dax_pmd_insert_mapping()
2017-05-23 21:25 [PATCH 0/3] add fallback reason strings to DAX PMD path Ross Zwisler
2017-05-23 21:25 ` [PATCH 1/3] dax: add fallback reason to dax_iomap_pmd_fault() Ross Zwisler
@ 2017-05-23 21:25 ` Ross Zwisler
2017-05-23 21:26 ` [PATCH 3/3] dax: add fallback reason to dax_pmd_load_hole() Ross Zwisler
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Ross Zwisler @ 2017-05-23 21:25 UTC (permalink / raw)
To: Andrew Morton, linux-kernel
Cc: Ross Zwisler, Darrick J. Wong, Alexander Viro, Christoph Hellwig,
Dan Williams, Ingo Molnar, Jan Kara, Matthew Wilcox,
Steven Rostedt, linux-fsdevel, linux-nvdimm
Currently the tracepoints in dax_pmd_insert_mapping() provide the user with
enough information to diagnose some but not all of the reasons for falling
back to PTEs. Enhance the tracepoints in this function to explicitly tell
the user why the fallback happened. This adds information for previously
undiagnosable failures such as dax_direct_access() failures, and it also
makes all the fallback reasons much more obvious.
Here is an example of this new tracepoint output where the page fault is
happening on a device that is in "raw" mode, and thus doesn't have the
required struct pages to be able to handle PMD faults:
big-1011 [000] .... 36.164708: dax_pmd_fault: dev 259:0 ino 0xc shared
WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start 0x10200000
vm_end 0x10700000 pgoff 0x305 max_pgoff 0x1400
big-1011 [000] .... 36.165521: dax_pmd_insert_mapping_fallback: dev
259:0 ino 0xc shared write address 0x10505000 length 0x200000 pfn
0x2000000000249200 DEV radix_entry 0x0 pfn_t not devmap
big-1011 [000] .... 36.165524: dax_pmd_fault_done: dev 259:0 ino 0xc
shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start
0x10200000 vm_end 0x10700000 pgoff 0x305 max_pgoff 0x1400 FALLBACK
The "pfn_t not devmap" text at the end of the second line is the new bit,
telling us that our PFN didn't have both the PFN_DEV and PFN_MAP flags set.
This patch also stops masking off the pfn_t flags from the tracepoint
output, since they are useful when diagnosing some fallbacks.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
fs/dax.c | 28 ++++++++++++++++++++--------
include/trace/events/fs_dax.h | 19 ++++++++++++-------
2 files changed, 32 insertions(+), 15 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 35b9f86..531d235 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1262,43 +1262,55 @@ static int dax_pmd_insert_mapping(struct vm_fault *vmf, struct iomap *iomap,
struct block_device *bdev = iomap->bdev;
struct inode *inode = mapping->host;
const size_t size = PMD_SIZE;
+ char *fallback_reason = "";
void *ret = NULL, *kaddr;
long length = 0;
pgoff_t pgoff;
pfn_t pfn;
int id;
- if (bdev_dax_pgoff(bdev, sector, size, &pgoff) != 0)
+ if (bdev_dax_pgoff(bdev, sector, size, &pgoff) != 0) {
+ fallback_reason = "bad pgoff";
goto fallback;
+ }
id = dax_read_lock();
length = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size), &kaddr, &pfn);
- if (length < 0)
+ if (length < 0) {
+ fallback_reason = "direct access";
goto unlock_fallback;
+ }
length = PFN_PHYS(length);
- if (length < size)
+ if (length < size) {
+ fallback_reason = "bad length";
goto unlock_fallback;
- if (pfn_t_to_pfn(pfn) & PG_PMD_COLOUR)
+ } else if (pfn_t_to_pfn(pfn) & PG_PMD_COLOUR) {
+ fallback_reason = "pfn unaligned";
goto unlock_fallback;
- if (!pfn_t_devmap(pfn))
+ } else if (!pfn_t_devmap(pfn)) {
+ fallback_reason = "pfn_t not devmap";
goto unlock_fallback;
+ }
dax_read_unlock(id);
ret = dax_insert_mapping_entry(mapping, vmf, *entryp, sector,
RADIX_DAX_PMD);
- if (IS_ERR(ret))
+ if (IS_ERR(ret)) {
+ fallback_reason = "insert mapping";
goto fallback;
+ }
*entryp = ret;
- trace_dax_pmd_insert_mapping(inode, vmf, length, pfn, ret);
+ trace_dax_pmd_insert_mapping(inode, vmf, length, pfn, ret, "");
return vmf_insert_pfn_pmd(vmf->vma, vmf->address, vmf->pmd,
pfn, vmf->flags & FAULT_FLAG_WRITE);
unlock_fallback:
dax_read_unlock(id);
fallback:
- trace_dax_pmd_insert_mapping_fallback(inode, vmf, length, pfn, ret);
+ trace_dax_pmd_insert_mapping_fallback(inode, vmf, length, pfn, ret,
+ fallback_reason);
return VM_FAULT_FALLBACK;
}
diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h
index fd12f8c..2263029 100644
--- a/include/trace/events/fs_dax.h
+++ b/include/trace/events/fs_dax.h
@@ -106,8 +106,9 @@ DEFINE_PMD_LOAD_HOLE_EVENT(dax_pmd_load_hole_fallback);
DECLARE_EVENT_CLASS(dax_pmd_insert_mapping_class,
TP_PROTO(struct inode *inode, struct vm_fault *vmf,
- long length, pfn_t pfn, void *radix_entry),
- TP_ARGS(inode, vmf, length, pfn, radix_entry),
+ long length, pfn_t pfn, void *radix_entry,
+ char *fallback_reason),
+ TP_ARGS(inode, vmf, length, pfn, radix_entry, fallback_reason),
TP_STRUCT__entry(
__field(unsigned long, ino)
__field(unsigned long, vm_flags)
@@ -115,6 +116,7 @@ DECLARE_EVENT_CLASS(dax_pmd_insert_mapping_class,
__field(long, length)
__field(u64, pfn_val)
__field(void *, radix_entry)
+ __field(char *, fallback_reason)
__field(dev_t, dev)
__field(int, write)
),
@@ -127,9 +129,10 @@ DECLARE_EVENT_CLASS(dax_pmd_insert_mapping_class,
__entry->length = length;
__entry->pfn_val = pfn.val;
__entry->radix_entry = radix_entry;
+ __entry->fallback_reason = fallback_reason;
),
TP_printk("dev %d:%d ino %#lx %s %s address %#lx length %#lx "
- "pfn %#llx %s radix_entry %#lx",
+ "pfn %#llx %s radix_entry %#lx %s",
MAJOR(__entry->dev),
MINOR(__entry->dev),
__entry->ino,
@@ -137,18 +140,20 @@ DECLARE_EVENT_CLASS(dax_pmd_insert_mapping_class,
__entry->write ? "write" : "read",
__entry->address,
__entry->length,
- __entry->pfn_val & ~PFN_FLAGS_MASK,
+ __entry->pfn_val,
__print_flags_u64(__entry->pfn_val & PFN_FLAGS_MASK, "|",
PFN_FLAGS_TRACE),
- (unsigned long)__entry->radix_entry
+ (unsigned long)__entry->radix_entry,
+ __entry->fallback_reason
)
)
#define DEFINE_PMD_INSERT_MAPPING_EVENT(name) \
DEFINE_EVENT(dax_pmd_insert_mapping_class, name, \
TP_PROTO(struct inode *inode, struct vm_fault *vmf, \
- long length, pfn_t pfn, void *radix_entry), \
- TP_ARGS(inode, vmf, length, pfn, radix_entry))
+ long length, pfn_t pfn, void *radix_entry, \
+ char *fallback_reason), \
+ TP_ARGS(inode, vmf, length, pfn, radix_entry, fallback_reason))
DEFINE_PMD_INSERT_MAPPING_EVENT(dax_pmd_insert_mapping);
DEFINE_PMD_INSERT_MAPPING_EVENT(dax_pmd_insert_mapping_fallback);
--
2.9.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 3/3] dax: add fallback reason to dax_pmd_load_hole()
2017-05-23 21:25 [PATCH 0/3] add fallback reason strings to DAX PMD path Ross Zwisler
2017-05-23 21:25 ` [PATCH 1/3] dax: add fallback reason to dax_iomap_pmd_fault() Ross Zwisler
2017-05-23 21:25 ` [PATCH 2/3] dax: add fallback reason to dax_pmd_insert_mapping() Ross Zwisler
@ 2017-05-23 21:26 ` Ross Zwisler
2017-05-24 5:22 ` [PATCH 0/3] add fallback reason strings to DAX PMD path Masayoshi Mizuma
2017-05-24 9:03 ` Jan Kara
4 siblings, 0 replies; 6+ messages in thread
From: Ross Zwisler @ 2017-05-23 21:26 UTC (permalink / raw)
To: Andrew Morton, linux-kernel
Cc: Ross Zwisler, Darrick J. Wong, Alexander Viro, Christoph Hellwig,
Dan Williams, Ingo Molnar, Jan Kara, Matthew Wilcox,
Steven Rostedt, linux-fsdevel, linux-nvdimm
Currently the tracepoints in dax_pmd_load_hole() provide the user with
enough information to diagnose some but not all of the reasons for falling
back to PTEs. Enhance the tracepoints in this function to explicitly tell
the user why the fallback happened. This adds information for previously
undiagnosable failures such as PMD collisions, and it also makes all the
fallback reasons much more obvious.
Here is an example of this new tracepoint output where the page fault found
that another PMD had been inserted into our page tables:
read_big-1138 [004] .... 202.093800: dax_pmd_fault: dev 259:0 ino 0xc
shared ALLOW_RETRY|KILLABLE|USER address 0x10400000 vm_start 0x10200000
vm_end 0x10600000 pgoff 0x200 max_pgoff 0x1400
read_big-1138 [004] .... 202.094002: dax_pmd_load_hole_fallback: dev
259:0 ino 0xc shared address 0x10400000 zero_page ffffea0007758000
radix_entry 0x1e pmd collision
read_big-1138 [004] .... 202.094004: dax_pmd_fault_done: dev 259:0 ino
0xc shared ALLOW_RETRY|KILLABLE|USER address 0x10400000 vm_start 0x10200000
vm_end 0x10600000 pgoff 0x200 max_pgoff 0x1400 FALLBACK
The "pmd collision" text at the end of the second line is the new bit,
telling us why dax_pmd_load_hole() failed.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
fs/dax.c | 15 +++++++++++----
include/trace/events/fs_dax.h | 16 ++++++++++------
2 files changed, 21 insertions(+), 10 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 531d235..45e22cd 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1320,6 +1320,7 @@ static int dax_pmd_load_hole(struct vm_fault *vmf, struct iomap *iomap,
struct address_space *mapping = vmf->vma->vm_file->f_mapping;
unsigned long pmd_addr = vmf->address & PMD_MASK;
struct inode *inode = mapping->host;
+ char *fallback_reason = "";
struct page *zero_page;
void *ret = NULL;
spinlock_t *ptl;
@@ -1327,17 +1328,22 @@ static int dax_pmd_load_hole(struct vm_fault *vmf, struct iomap *iomap,
zero_page = mm_get_huge_zero_page(vmf->vma->vm_mm);
- if (unlikely(!zero_page))
+ if (unlikely(!zero_page)) {
+ fallback_reason = "no zero page";
goto fallback;
+ }
ret = dax_insert_mapping_entry(mapping, vmf, *entryp, 0,
RADIX_DAX_PMD | RADIX_DAX_HZP);
- if (IS_ERR(ret))
+ if (IS_ERR(ret)) {
+ fallback_reason = "insert mapping";
goto fallback;
+ }
*entryp = ret;
ptl = pmd_lock(vmf->vma->vm_mm, vmf->pmd);
if (!pmd_none(*(vmf->pmd))) {
+ fallback_reason = "pmd collision";
spin_unlock(ptl);
goto fallback;
}
@@ -1346,11 +1352,12 @@ static int dax_pmd_load_hole(struct vm_fault *vmf, struct iomap *iomap,
pmd_entry = pmd_mkhuge(pmd_entry);
set_pmd_at(vmf->vma->vm_mm, pmd_addr, vmf->pmd, pmd_entry);
spin_unlock(ptl);
- trace_dax_pmd_load_hole(inode, vmf, zero_page, ret);
+ trace_dax_pmd_load_hole(inode, vmf, zero_page, ret, "");
return VM_FAULT_NOPAGE;
fallback:
- trace_dax_pmd_load_hole_fallback(inode, vmf, zero_page, ret);
+ trace_dax_pmd_load_hole_fallback(inode, vmf, zero_page, ret,
+ fallback_reason);
return VM_FAULT_FALLBACK;
}
diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h
index 2263029..e8d7494 100644
--- a/include/trace/events/fs_dax.h
+++ b/include/trace/events/fs_dax.h
@@ -65,14 +65,15 @@ DEFINE_PMD_FAULT_EVENT(dax_pmd_fault_done);
DECLARE_EVENT_CLASS(dax_pmd_load_hole_class,
TP_PROTO(struct inode *inode, struct vm_fault *vmf,
struct page *zero_page,
- void *radix_entry),
- TP_ARGS(inode, vmf, zero_page, radix_entry),
+ void *radix_entry, char *fallback_reason),
+ TP_ARGS(inode, vmf, zero_page, radix_entry, fallback_reason),
TP_STRUCT__entry(
__field(unsigned long, ino)
__field(unsigned long, vm_flags)
__field(unsigned long, address)
__field(struct page *, zero_page)
__field(void *, radix_entry)
+ __field(char *, fallback_reason)
__field(dev_t, dev)
),
TP_fast_assign(
@@ -82,24 +83,27 @@ DECLARE_EVENT_CLASS(dax_pmd_load_hole_class,
__entry->address = vmf->address;
__entry->zero_page = zero_page;
__entry->radix_entry = radix_entry;
+ __entry->fallback_reason = fallback_reason;
),
TP_printk("dev %d:%d ino %#lx %s address %#lx zero_page %p "
- "radix_entry %#lx",
+ "radix_entry %#lx %s",
MAJOR(__entry->dev),
MINOR(__entry->dev),
__entry->ino,
__entry->vm_flags & VM_SHARED ? "shared" : "private",
__entry->address,
__entry->zero_page,
- (unsigned long)__entry->radix_entry
+ (unsigned long)__entry->radix_entry,
+ __entry->fallback_reason
)
)
#define DEFINE_PMD_LOAD_HOLE_EVENT(name) \
DEFINE_EVENT(dax_pmd_load_hole_class, name, \
TP_PROTO(struct inode *inode, struct vm_fault *vmf, \
- struct page *zero_page, void *radix_entry), \
- TP_ARGS(inode, vmf, zero_page, radix_entry))
+ struct page *zero_page, void *radix_entry, \
+ char *fallback_reason), \
+ TP_ARGS(inode, vmf, zero_page, radix_entry, fallback_reason))
DEFINE_PMD_LOAD_HOLE_EVENT(dax_pmd_load_hole);
DEFINE_PMD_LOAD_HOLE_EVENT(dax_pmd_load_hole_fallback);
--
2.9.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 0/3] add fallback reason strings to DAX PMD path
2017-05-23 21:25 [PATCH 0/3] add fallback reason strings to DAX PMD path Ross Zwisler
` (2 preceding siblings ...)
2017-05-23 21:26 ` [PATCH 3/3] dax: add fallback reason to dax_pmd_load_hole() Ross Zwisler
@ 2017-05-24 5:22 ` Masayoshi Mizuma
2017-05-24 9:03 ` Jan Kara
4 siblings, 0 replies; 6+ messages in thread
From: Masayoshi Mizuma @ 2017-05-24 5:22 UTC (permalink / raw)
To: Ross Zwisler
Cc: Andrew Morton, linux-kernel, Jan Kara, Darrick J. Wong,
Matthew Wilcox, Steven Rostedt, linux-nvdimm, Ingo Molnar,
Alexander Viro, linux-fsdevel, Christoph Hellwig
Hi Ross,
On Tue, 23 May 2017 15:25:57 -0600 Ross Zwisler wrote:
> One of the primary motivations for adding tracepoints to the DAX PMD path
> was to allow the user to diagnose whether their system was actually using
> PMDs, and if not to help them understand why. For me at least this has
> worked okay in some situations, but many times I find myself adding more
> debugging to diagnose fallback reasons that aren't immediately obvious, or
> situations where the current tracepoints are simply insufficient because
> they don't give you enough information.
>
> This series adds short fallback reason strings to the tracepoints in the
> PMD path with the intention of giving the user better information about why
> their system is falling back to PTEs.
>
> So, for example, the recent case on my system was that I forgot to move my
> namespace from "raw" mode to "memory" mode, which resulted in this output:
>
> big-1046 [000] .... 103.930950: dax_pmd_fault: dev 259:0 ino 0xc shared
> WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start 0x10200000
> vm_end 0x10700000 pgoff 0x305 max_pgoff 0x1400
>
> big-1046 [000] .... 103.931220: dax_pmd_insert_mapping_fallback: dev
> 259:0 ino 0xc shared write address 0x10505000 length 0x200000 pfn 0x24a200
> DEV radix_entry 0x0
>
> big-1046 [000] .... 103.931222: dax_pmd_fault_done: dev 259:0 ino 0xc
> shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start
> 0x10200000 vm_end 0x10700000 pgoff 0x305 max_pgoff 0x1400 FALLBACK
>
> The issue is that the PFN_MAP flag isn't set because we're lacking struct
> page for our PMEM namespace. It's not immediately obvious why this
> fallback happened, and actually you can't even diagnose it because we mask
> off the pfn_t flags before printing the PFN. :(
>
> This new output:
>
> big-1011 [000] .... 36.164708: dax_pmd_fault: dev 259:0 ino 0xc shared
> WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start 0x10200000
> vm_end 0x10700000 pgoff 0x305 max_pgoff 0x1400
>
> big-1011 [000] .... 36.165521: dax_pmd_insert_mapping_fallback: dev
> 259:0 ino 0xc shared write address 0x10505000 length 0x200000 pfn
> 0x2000000000249200 DEV radix_entry 0x0 pfn_t not devmap
>
> big-1011 [000] .... 36.165524: dax_pmd_fault_done: dev 259:0 ino 0xc
> shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start
> 0x10200000 vm_end 0x10700000 pgoff 0x305 max_pgoff 0x1400 FALLBACK
>
> adds the "pfn_t not devmap" string to the second line, telling the user
> exactly what's going on. I also stopped masking off the pfn_t flags so the
> output was more complete.
I think this idea is good. How about adding a suffix to be easy for grep?
For example the suffix, DAX_FAULT_.
And, to simplify the code, how about moving the string of fault reason
to include/trace/events/fs_dax.h, like as include/trace/events/kvm.h?
Regards,
Masayoshi Mizuma
>
> My only concern is that somehow adding strings like this to tracepoint
> output, brief and useful though they may be, is somehow breaking what
> tracepoints are supposed to be doing. If anyone feels strongly about this
> I guess I can just keep these changes locally and try to keep enhancing the
> existing output without adding strings.
>
> Ross Zwisler (3):
> dax: add fallback reason to dax_iomap_pmd_fault()
> dax: add fallback reason to dax_pmd_insert_mapping()
> dax: add fallback reason to dax_pmd_load_hole()
>
> fs/dax.c | 76 +++++++++++++++++++++++++++++++------------
> include/trace/events/fs_dax.h | 50 +++++++++++++++++-----------
> 2 files changed, 86 insertions(+), 40 deletions(-)
>
> --
> 2.9.4
>
> _______________________________________________
> Linux-nvdimm mailing list
> Linux-nvdimm@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 0/3] add fallback reason strings to DAX PMD path
2017-05-23 21:25 [PATCH 0/3] add fallback reason strings to DAX PMD path Ross Zwisler
` (3 preceding siblings ...)
2017-05-24 5:22 ` [PATCH 0/3] add fallback reason strings to DAX PMD path Masayoshi Mizuma
@ 2017-05-24 9:03 ` Jan Kara
4 siblings, 0 replies; 6+ messages in thread
From: Jan Kara @ 2017-05-24 9:03 UTC (permalink / raw)
To: Ross Zwisler
Cc: Andrew Morton, linux-kernel, Darrick J. Wong, Alexander Viro,
Christoph Hellwig, Dan Williams, Ingo Molnar, Jan Kara,
Matthew Wilcox, Steven Rostedt, linux-fsdevel, linux-nvdimm
On Tue 23-05-17 15:25:57, Ross Zwisler wrote:
> One of the primary motivations for adding tracepoints to the DAX PMD path
> was to allow the user to diagnose whether their system was actually using
> PMDs, and if not to help them understand why. For me at least this has
> worked okay in some situations, but many times I find myself adding more
> debugging to diagnose fallback reasons that aren't immediately obvious, or
> situations where the current tracepoints are simply insufficient because
> they don't give you enough information.
>
> This series adds short fallback reason strings to the tracepoints in the
> PMD path with the intention of giving the user better information about why
> their system is falling back to PTEs.
So I don't like adding strings into tracepoints. It just seems too close to
debug printk. After all if you need detailed information for debugging some
particular bug, you can always add kprobe tracepoint at particular
instruction in the function.
Probably my objection boils down to the fact that I don't think tracepoints
in DAX code are for a normal user. They expose lot of details about the
implementation and once users start depending on a particular tracepoint,
it becomes part of the kernel API and cannot be changed which will be
very inconvenient for DAX implementation. I think these tracepoints are for
kernel developers to be able to diagnose what's going on and I'm willing to
impose some thinking burden on those ;).
Also looking at the reasons you've added most of them can already be
detected from the information output by the tracepoint so it seems mostly
like a duplicit information. So my opinion is: If you need more information
like whether pfn is from devmap, make that visible in the tracepoint (like
add pfn flags to the output). Sure the reason for fallback isn't
immediately visible but if you really wonder, you go through conditions in
the code, verify them against the data from the tracepoint and you should
be able to find the reason... If there are some common fallback paths that
you'd think are worth detecting and it's not easy to add information to the
final tracepoint, then we can add a specific tracepoint to that branch.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-05-24 9:03 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-05-23 21:25 [PATCH 0/3] add fallback reason strings to DAX PMD path Ross Zwisler
2017-05-23 21:25 ` [PATCH 1/3] dax: add fallback reason to dax_iomap_pmd_fault() Ross Zwisler
2017-05-23 21:25 ` [PATCH 2/3] dax: add fallback reason to dax_pmd_insert_mapping() Ross Zwisler
2017-05-23 21:26 ` [PATCH 3/3] dax: add fallback reason to dax_pmd_load_hole() Ross Zwisler
2017-05-24 5:22 ` [PATCH 0/3] add fallback reason strings to DAX PMD path Masayoshi Mizuma
2017-05-24 9:03 ` Jan Kara
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).