* [PATCH 1/2] xfs: reset xfs_inode struct on reclaim in debug mode
@ 2018-04-05 12:11 Brian Foster
2018-04-05 12:11 ` [PATCH 2/2] xfs: replace filestream item xfs_inode reference with xfs_mount Brian Foster
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Brian Foster @ 2018-04-05 12:11 UTC (permalink / raw)
To: linux-xfs
A test case to reproduce a filestream/MRU use-after-free of a
reclaimed inode requires bits (e.g., ip->i_mount) of the inode to be
reset/reused once the inode memory is freed. This normally only
occurs when a new page is cycled into the zone, however.
Perform the "one-time" inode init immediately prior to freeing
inodes when in DEBUG mode. This will zero the inode, init the low
level structures (locks, lists, etc.) and otherwise ensure each
inode is in a purely uninitialized state while sitting in the zone
as free memory.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
I'll post a test that depends on this once this is worked out... one
concern this raised is if we consider any future bugs in the inode
initialization code (suppose we initialize some field once that should
be initialized on every allocation, for example), this code has the
potential to suppress such problems in debug mode. So an alternative to
this approach is to perhaps tie this to an errortag and let the
associated xfstests test enable it appropriately. Thoughts or
preferences?
Brian
fs/xfs/xfs_icache.c | 5 ++++-
fs/xfs/xfs_super.c | 2 +-
fs/xfs/xfs_super.h | 1 +
3 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 9a18f69f6e96..86dc4c8a4e1d 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -111,7 +111,10 @@ xfs_inode_free_callback(
xfs_inode_item_destroy(ip);
ip->i_itemp = NULL;
}
-
+#ifdef DEBUG
+ /* facilitate catching use-after-free problems */
+ xfs_fs_inode_init_once(ip);
+#endif
kmem_zone_free(xfs_inode_zone, ip);
}
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 612c1d5348b3..29b1be5dfebf 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1030,7 +1030,7 @@ xfs_fs_dirty_inode(
* fields in the xfs inode that left in the initialise state
* when freeing the inode.
*/
-STATIC void
+void
xfs_fs_inode_init_once(
void *inode)
{
diff --git a/fs/xfs/xfs_super.h b/fs/xfs/xfs_super.h
index 8cee8e8050e3..aae8a778f378 100644
--- a/fs/xfs/xfs_super.h
+++ b/fs/xfs/xfs_super.h
@@ -70,6 +70,7 @@ struct block_device;
extern void xfs_quiesce_attr(struct xfs_mount *mp);
extern void xfs_flush_inodes(struct xfs_mount *mp);
+extern void xfs_fs_inode_init_once(void *);
extern void xfs_blkdev_issue_flush(struct xfs_buftarg *);
extern xfs_agnumber_t xfs_set_inode_alloc(struct xfs_mount *,
xfs_agnumber_t agcount);
--
2.13.6
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH 2/2] xfs: replace filestream item xfs_inode reference with xfs_mount 2018-04-05 12:11 [PATCH 1/2] xfs: reset xfs_inode struct on reclaim in debug mode Brian Foster @ 2018-04-05 12:11 ` Brian Foster 2018-04-05 17:18 ` Christoph Hellwig 2018-04-05 16:51 ` [PATCH 1/2] xfs: reset xfs_inode struct on reclaim in debug mode Darrick J. Wong 2018-04-05 21:14 ` Dave Chinner 2 siblings, 1 reply; 13+ messages in thread From: Brian Foster @ 2018-04-05 12:11 UTC (permalink / raw) To: linux-xfs The filestreams allocator stores an xfs_fstrm_item structure in the MRU to cache inode number to agno mappings for a particular length of time. Each xfs_fstrm_item contains the internal MRU structure, an inode pointer and agno value. The inode pointer stored in the xfs_fstrm_item is not referenced, however, which means the inode itself can be removed and reclaimed before the MRU item is freed. If this occurs, xfs_fstrm_free_func() can access freed or unrelated memory through xfs_fstrm_item->ip and crash. The obvious solution is to grab an inode reference for xfs_fstrm_item. The filestream mechanism only actually uses the inode pointer as a means to access the xfs_mount, however. Rather than add unnecessary complexity, simplify the implementation to store an xfs_mount pointer instead of the inode. This also requires updates to the tracepoint class to provide the associated data via parameters rather than the inode and a minor hack to peek at the MRU key to establish the inode number at free time. Signed-off-by: Brian Foster <bfoster@redhat.com> --- fs/xfs/xfs_filestream.c | 12 ++++++------ fs/xfs/xfs_trace.h | 14 +++++++------- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c index 043ca3808ea2..585f4f25fbe5 100644 --- a/fs/xfs/xfs_filestream.c +++ b/fs/xfs/xfs_filestream.c @@ -34,7 +34,7 @@ struct xfs_fstrm_item { struct xfs_mru_cache_elem mru; - struct xfs_inode *ip; + struct xfs_mount *mp; xfs_agnumber_t ag; /* AG in use for this directory */ }; @@ -127,9 +127,9 @@ xfs_fstrm_free_func( struct xfs_fstrm_item *item = container_of(mru, struct xfs_fstrm_item, mru); - xfs_filestream_put_ag(item->ip->i_mount, item->ag); + xfs_filestream_put_ag(item->mp, item->ag); - trace_xfs_filestream_free(item->ip, item->ag); + trace_xfs_filestream_free(item->mp, mru->key, item->ag); kmem_free(item); } @@ -165,7 +165,7 @@ xfs_filestream_pick_ag( trylock = XFS_ALLOC_FLAG_TRYLOCK; for (nscan = 0; 1; nscan++) { - trace_xfs_filestream_scan(ip, ag); + trace_xfs_filestream_scan(mp, ip->i_ino, ag); pag = xfs_perag_get(mp, ag); @@ -265,7 +265,7 @@ xfs_filestream_pick_ag( goto out_put_ag; item->ag = *agp; - item->ip = ip; + item->mp = mp; err = xfs_mru_cache_insert(mp->m_filestream, ip->i_ino, &item->mru); if (err) { @@ -333,7 +333,7 @@ xfs_filestream_lookup_ag( ag = container_of(mru, struct xfs_fstrm_item, mru)->ag; xfs_mru_cache_done(mp->m_filestream); - trace_xfs_filestream_lookup(ip, ag); + trace_xfs_filestream_lookup(mp, ip->i_ino, ag); goto out; } diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index a982c0b623d0..8955254b900e 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -506,8 +506,8 @@ DEFINE_BUF_ITEM_EVENT(xfs_trans_bhold_release); DEFINE_BUF_ITEM_EVENT(xfs_trans_binval); DECLARE_EVENT_CLASS(xfs_filestream_class, - TP_PROTO(struct xfs_inode *ip, xfs_agnumber_t agno), - TP_ARGS(ip, agno), + TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino, xfs_agnumber_t agno), + TP_ARGS(mp, ino, agno), TP_STRUCT__entry( __field(dev_t, dev) __field(xfs_ino_t, ino) @@ -515,10 +515,10 @@ DECLARE_EVENT_CLASS(xfs_filestream_class, __field(int, streams) ), TP_fast_assign( - __entry->dev = VFS_I(ip)->i_sb->s_dev; - __entry->ino = ip->i_ino; + __entry->dev = mp->m_super->s_dev; + __entry->ino = ino; __entry->agno = agno; - __entry->streams = xfs_filestream_peek_ag(ip->i_mount, agno); + __entry->streams = xfs_filestream_peek_ag(mp, agno); ), TP_printk("dev %d:%d ino 0x%llx agno %u streams %d", MAJOR(__entry->dev), MINOR(__entry->dev), @@ -528,8 +528,8 @@ DECLARE_EVENT_CLASS(xfs_filestream_class, ) #define DEFINE_FILESTREAM_EVENT(name) \ DEFINE_EVENT(xfs_filestream_class, name, \ - TP_PROTO(struct xfs_inode *ip, xfs_agnumber_t agno), \ - TP_ARGS(ip, agno)) + TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino, xfs_agnumber_t agno), \ + TP_ARGS(mp, ino, agno)) DEFINE_FILESTREAM_EVENT(xfs_filestream_free); DEFINE_FILESTREAM_EVENT(xfs_filestream_lookup); DEFINE_FILESTREAM_EVENT(xfs_filestream_scan); -- 2.13.6 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] xfs: replace filestream item xfs_inode reference with xfs_mount 2018-04-05 12:11 ` [PATCH 2/2] xfs: replace filestream item xfs_inode reference with xfs_mount Brian Foster @ 2018-04-05 17:18 ` Christoph Hellwig 2018-04-05 18:13 ` Brian Foster 0 siblings, 1 reply; 13+ messages in thread From: Christoph Hellwig @ 2018-04-05 17:18 UTC (permalink / raw) To: Brian Foster; +Cc: linux-xfs On Thu, Apr 05, 2018 at 08:11:47AM -0400, Brian Foster wrote: > The obvious solution is to grab an inode reference for > xfs_fstrm_item. The filestream mechanism only actually uses the > inode pointer as a means to access the xfs_mount, however. Rather > than add unnecessary complexity, simplify the implementation to > store an xfs_mount pointer instead of the inode. This also requires > updates to the tracepoint class to provide the associated data via > parameters rather than the inode and a minor hack to peek at the MRU > key to establish the inode number at free time. How about not replacing it all and just providing the context from the mru cache? Something like the untested patch below: diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c index 043ca3808ea2..5131a6e25fc9 100644 --- a/fs/xfs/xfs_filestream.c +++ b/fs/xfs/xfs_filestream.c @@ -34,7 +34,6 @@ struct xfs_fstrm_item { struct xfs_mru_cache_elem mru; - struct xfs_inode *ip; xfs_agnumber_t ag; /* AG in use for this directory */ }; @@ -122,14 +121,15 @@ xfs_filestream_put_ag( static void xfs_fstrm_free_func( + void *data, struct xfs_mru_cache_elem *mru) { + struct xfs_mount *mp = data; struct xfs_fstrm_item *item = container_of(mru, struct xfs_fstrm_item, mru); - xfs_filestream_put_ag(item->ip->i_mount, item->ag); - - trace_xfs_filestream_free(item->ip, item->ag); + xfs_filestream_put_ag(mp, item->ag); + trace_xfs_filestream_free(mp, mru->key, item->ag); kmem_free(item); } @@ -165,7 +165,7 @@ xfs_filestream_pick_ag( trylock = XFS_ALLOC_FLAG_TRYLOCK; for (nscan = 0; 1; nscan++) { - trace_xfs_filestream_scan(ip, ag); + trace_xfs_filestream_scan(mp, ip->i_ino, ag); pag = xfs_perag_get(mp, ag); @@ -265,7 +265,6 @@ xfs_filestream_pick_ag( goto out_put_ag; item->ag = *agp; - item->ip = ip; err = xfs_mru_cache_insert(mp->m_filestream, ip->i_ino, &item->mru); if (err) { @@ -333,7 +332,7 @@ xfs_filestream_lookup_ag( ag = container_of(mru, struct xfs_fstrm_item, mru)->ag; xfs_mru_cache_done(mp->m_filestream); - trace_xfs_filestream_lookup(ip, ag); + trace_xfs_filestream_lookup(mp, ip->i_ino, ag); goto out; } @@ -399,7 +398,7 @@ xfs_filestream_new_ag( * Only free the item here so we skip over the old AG earlier. */ if (mru) - xfs_fstrm_free_func(mru); + xfs_fstrm_free_func(mp, mru); IRELE(pip); exit: @@ -426,8 +425,8 @@ xfs_filestream_mount( * timer tunable to within about 10 percent. This requires at least 10 * groups. */ - return xfs_mru_cache_create(&mp->m_filestream, xfs_fstrm_centisecs * 10, - 10, xfs_fstrm_free_func); + return xfs_mru_cache_create(&mp->m_filestream, mp, + xfs_fstrm_centisecs * 10, 10, xfs_fstrm_free_func); } void diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c index f8a674d7f092..70eea7ae2876 100644 --- a/fs/xfs/xfs_mru_cache.c +++ b/fs/xfs/xfs_mru_cache.c @@ -112,6 +112,7 @@ struct xfs_mru_cache { xfs_mru_cache_free_func_t free_func; /* Function pointer for freeing. */ struct delayed_work work; /* Workqueue data for reaping. */ unsigned int queued; /* work has been queued */ + void *data; }; static struct workqueue_struct *xfs_mru_reap_wq; @@ -259,7 +260,7 @@ _xfs_mru_cache_clear_reap_list( list_for_each_entry_safe(elem, next, &tmp, list_node) { list_del_init(&elem->list_node); - mru->free_func(elem); + mru->free_func(mru->data, elem); } spin_lock(&mru->lock); @@ -326,6 +327,7 @@ xfs_mru_cache_uninit(void) int xfs_mru_cache_create( struct xfs_mru_cache **mrup, + void *data, unsigned int lifetime_ms, unsigned int grp_count, xfs_mru_cache_free_func_t free_func) @@ -369,7 +371,7 @@ xfs_mru_cache_create( mru->grp_time = grp_time; mru->free_func = free_func; - + mru->data = data; *mrup = mru; exit: @@ -492,7 +494,7 @@ xfs_mru_cache_delete( elem = xfs_mru_cache_remove(mru, key); if (elem) - mru->free_func(elem); + mru->free_func(mru->data, elem); } /* diff --git a/fs/xfs/xfs_mru_cache.h b/fs/xfs/xfs_mru_cache.h index fb5245ba5ff7..b3f3fbdfcc47 100644 --- a/fs/xfs/xfs_mru_cache.h +++ b/fs/xfs/xfs_mru_cache.h @@ -26,13 +26,13 @@ struct xfs_mru_cache_elem { }; /* Function pointer type for callback to free a client's data pointer. */ -typedef void (*xfs_mru_cache_free_func_t)(struct xfs_mru_cache_elem *elem); +typedef void (*xfs_mru_cache_free_func_t)(void *, struct xfs_mru_cache_elem *); int xfs_mru_cache_init(void); void xfs_mru_cache_uninit(void); -int xfs_mru_cache_create(struct xfs_mru_cache **mrup, unsigned int lifetime_ms, - unsigned int grp_count, - xfs_mru_cache_free_func_t free_func); +int xfs_mru_cache_create(struct xfs_mru_cache **mrup, void *data, + unsigned int lifetime_ms, unsigned int grp_count, + xfs_mru_cache_free_func_t free_func); void xfs_mru_cache_destroy(struct xfs_mru_cache *mru); int xfs_mru_cache_insert(struct xfs_mru_cache *mru, unsigned long key, struct xfs_mru_cache_elem *elem); diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 945de08af7ba..19f4e30265c2 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -506,8 +506,8 @@ DEFINE_BUF_ITEM_EVENT(xfs_trans_bhold_release); DEFINE_BUF_ITEM_EVENT(xfs_trans_binval); DECLARE_EVENT_CLASS(xfs_filestream_class, - TP_PROTO(struct xfs_inode *ip, xfs_agnumber_t agno), - TP_ARGS(ip, agno), + TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino, xfs_agnumber_t agno), + TP_ARGS(mp, ino, agno), TP_STRUCT__entry( __field(dev_t, dev) __field(xfs_ino_t, ino) @@ -515,10 +515,10 @@ DECLARE_EVENT_CLASS(xfs_filestream_class, __field(int, streams) ), TP_fast_assign( - __entry->dev = VFS_I(ip)->i_sb->s_dev; - __entry->ino = ip->i_ino; + __entry->dev = mp->m_super->s_dev; + __entry->ino = ino; __entry->agno = agno; - __entry->streams = xfs_filestream_peek_ag(ip->i_mount, agno); + __entry->streams = xfs_filestream_peek_ag(mp, agno); ), TP_printk("dev %d:%d ino 0x%llx agno %u streams %d", MAJOR(__entry->dev), MINOR(__entry->dev), @@ -528,8 +528,8 @@ DECLARE_EVENT_CLASS(xfs_filestream_class, ) #define DEFINE_FILESTREAM_EVENT(name) \ DEFINE_EVENT(xfs_filestream_class, name, \ - TP_PROTO(struct xfs_inode *ip, xfs_agnumber_t agno), \ - TP_ARGS(ip, agno)) + TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino, xfs_agnumber_t agno), \ + TP_ARGS(mp, ino, agno)) DEFINE_FILESTREAM_EVENT(xfs_filestream_free); DEFINE_FILESTREAM_EVENT(xfs_filestream_lookup); DEFINE_FILESTREAM_EVENT(xfs_filestream_scan); ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] xfs: replace filestream item xfs_inode reference with xfs_mount 2018-04-05 17:18 ` Christoph Hellwig @ 2018-04-05 18:13 ` Brian Foster 2018-04-06 6:59 ` Christoph Hellwig 0 siblings, 1 reply; 13+ messages in thread From: Brian Foster @ 2018-04-05 18:13 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-xfs On Thu, Apr 05, 2018 at 10:18:42AM -0700, Christoph Hellwig wrote: > On Thu, Apr 05, 2018 at 08:11:47AM -0400, Brian Foster wrote: > > The obvious solution is to grab an inode reference for > > xfs_fstrm_item. The filestream mechanism only actually uses the > > inode pointer as a means to access the xfs_mount, however. Rather > > than add unnecessary complexity, simplify the implementation to > > store an xfs_mount pointer instead of the inode. This also requires > > updates to the tracepoint class to provide the associated data via > > parameters rather than the inode and a minor hack to peek at the MRU > > key to establish the inode number at free time. > > How about not replacing it all and just providing the context from > the mru cache? Something like the untested patch below: > I actually considered something like this briefly but it initially looked more involved than what you've come up with here. I think that I just didn't consider storing the mp directly in the xfs_mru_cache so it would be available to the reaper context. With that, this looks nicer to me. I'll give it a whirl.. thanks! Brian > diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c > index 043ca3808ea2..5131a6e25fc9 100644 > --- a/fs/xfs/xfs_filestream.c > +++ b/fs/xfs/xfs_filestream.c > @@ -34,7 +34,6 @@ > > struct xfs_fstrm_item { > struct xfs_mru_cache_elem mru; > - struct xfs_inode *ip; > xfs_agnumber_t ag; /* AG in use for this directory */ > }; > > @@ -122,14 +121,15 @@ xfs_filestream_put_ag( > > static void > xfs_fstrm_free_func( > + void *data, > struct xfs_mru_cache_elem *mru) > { > + struct xfs_mount *mp = data; > struct xfs_fstrm_item *item = > container_of(mru, struct xfs_fstrm_item, mru); > > - xfs_filestream_put_ag(item->ip->i_mount, item->ag); > - > - trace_xfs_filestream_free(item->ip, item->ag); > + xfs_filestream_put_ag(mp, item->ag); > + trace_xfs_filestream_free(mp, mru->key, item->ag); > > kmem_free(item); > } > @@ -165,7 +165,7 @@ xfs_filestream_pick_ag( > trylock = XFS_ALLOC_FLAG_TRYLOCK; > > for (nscan = 0; 1; nscan++) { > - trace_xfs_filestream_scan(ip, ag); > + trace_xfs_filestream_scan(mp, ip->i_ino, ag); > > pag = xfs_perag_get(mp, ag); > > @@ -265,7 +265,6 @@ xfs_filestream_pick_ag( > goto out_put_ag; > > item->ag = *agp; > - item->ip = ip; > > err = xfs_mru_cache_insert(mp->m_filestream, ip->i_ino, &item->mru); > if (err) { > @@ -333,7 +332,7 @@ xfs_filestream_lookup_ag( > ag = container_of(mru, struct xfs_fstrm_item, mru)->ag; > xfs_mru_cache_done(mp->m_filestream); > > - trace_xfs_filestream_lookup(ip, ag); > + trace_xfs_filestream_lookup(mp, ip->i_ino, ag); > goto out; > } > > @@ -399,7 +398,7 @@ xfs_filestream_new_ag( > * Only free the item here so we skip over the old AG earlier. > */ > if (mru) > - xfs_fstrm_free_func(mru); > + xfs_fstrm_free_func(mp, mru); > > IRELE(pip); > exit: > @@ -426,8 +425,8 @@ xfs_filestream_mount( > * timer tunable to within about 10 percent. This requires at least 10 > * groups. > */ > - return xfs_mru_cache_create(&mp->m_filestream, xfs_fstrm_centisecs * 10, > - 10, xfs_fstrm_free_func); > + return xfs_mru_cache_create(&mp->m_filestream, mp, > + xfs_fstrm_centisecs * 10, 10, xfs_fstrm_free_func); > } > > void > diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c > index f8a674d7f092..70eea7ae2876 100644 > --- a/fs/xfs/xfs_mru_cache.c > +++ b/fs/xfs/xfs_mru_cache.c > @@ -112,6 +112,7 @@ struct xfs_mru_cache { > xfs_mru_cache_free_func_t free_func; /* Function pointer for freeing. */ > struct delayed_work work; /* Workqueue data for reaping. */ > unsigned int queued; /* work has been queued */ > + void *data; > }; > > static struct workqueue_struct *xfs_mru_reap_wq; > @@ -259,7 +260,7 @@ _xfs_mru_cache_clear_reap_list( > > list_for_each_entry_safe(elem, next, &tmp, list_node) { > list_del_init(&elem->list_node); > - mru->free_func(elem); > + mru->free_func(mru->data, elem); > } > > spin_lock(&mru->lock); > @@ -326,6 +327,7 @@ xfs_mru_cache_uninit(void) > int > xfs_mru_cache_create( > struct xfs_mru_cache **mrup, > + void *data, > unsigned int lifetime_ms, > unsigned int grp_count, > xfs_mru_cache_free_func_t free_func) > @@ -369,7 +371,7 @@ xfs_mru_cache_create( > > mru->grp_time = grp_time; > mru->free_func = free_func; > - > + mru->data = data; > *mrup = mru; > > exit: > @@ -492,7 +494,7 @@ xfs_mru_cache_delete( > > elem = xfs_mru_cache_remove(mru, key); > if (elem) > - mru->free_func(elem); > + mru->free_func(mru->data, elem); > } > > /* > diff --git a/fs/xfs/xfs_mru_cache.h b/fs/xfs/xfs_mru_cache.h > index fb5245ba5ff7..b3f3fbdfcc47 100644 > --- a/fs/xfs/xfs_mru_cache.h > +++ b/fs/xfs/xfs_mru_cache.h > @@ -26,13 +26,13 @@ struct xfs_mru_cache_elem { > }; > > /* Function pointer type for callback to free a client's data pointer. */ > -typedef void (*xfs_mru_cache_free_func_t)(struct xfs_mru_cache_elem *elem); > +typedef void (*xfs_mru_cache_free_func_t)(void *, struct xfs_mru_cache_elem *); > > int xfs_mru_cache_init(void); > void xfs_mru_cache_uninit(void); > -int xfs_mru_cache_create(struct xfs_mru_cache **mrup, unsigned int lifetime_ms, > - unsigned int grp_count, > - xfs_mru_cache_free_func_t free_func); > +int xfs_mru_cache_create(struct xfs_mru_cache **mrup, void *data, > + unsigned int lifetime_ms, unsigned int grp_count, > + xfs_mru_cache_free_func_t free_func); > void xfs_mru_cache_destroy(struct xfs_mru_cache *mru); > int xfs_mru_cache_insert(struct xfs_mru_cache *mru, unsigned long key, > struct xfs_mru_cache_elem *elem); > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h > index 945de08af7ba..19f4e30265c2 100644 > --- a/fs/xfs/xfs_trace.h > +++ b/fs/xfs/xfs_trace.h > @@ -506,8 +506,8 @@ DEFINE_BUF_ITEM_EVENT(xfs_trans_bhold_release); > DEFINE_BUF_ITEM_EVENT(xfs_trans_binval); > > DECLARE_EVENT_CLASS(xfs_filestream_class, > - TP_PROTO(struct xfs_inode *ip, xfs_agnumber_t agno), > - TP_ARGS(ip, agno), > + TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino, xfs_agnumber_t agno), > + TP_ARGS(mp, ino, agno), > TP_STRUCT__entry( > __field(dev_t, dev) > __field(xfs_ino_t, ino) > @@ -515,10 +515,10 @@ DECLARE_EVENT_CLASS(xfs_filestream_class, > __field(int, streams) > ), > TP_fast_assign( > - __entry->dev = VFS_I(ip)->i_sb->s_dev; > - __entry->ino = ip->i_ino; > + __entry->dev = mp->m_super->s_dev; > + __entry->ino = ino; > __entry->agno = agno; > - __entry->streams = xfs_filestream_peek_ag(ip->i_mount, agno); > + __entry->streams = xfs_filestream_peek_ag(mp, agno); > ), > TP_printk("dev %d:%d ino 0x%llx agno %u streams %d", > MAJOR(__entry->dev), MINOR(__entry->dev), > @@ -528,8 +528,8 @@ DECLARE_EVENT_CLASS(xfs_filestream_class, > ) > #define DEFINE_FILESTREAM_EVENT(name) \ > DEFINE_EVENT(xfs_filestream_class, name, \ > - TP_PROTO(struct xfs_inode *ip, xfs_agnumber_t agno), \ > - TP_ARGS(ip, agno)) > + TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino, xfs_agnumber_t agno), \ > + TP_ARGS(mp, ino, agno)) > DEFINE_FILESTREAM_EVENT(xfs_filestream_free); > DEFINE_FILESTREAM_EVENT(xfs_filestream_lookup); > DEFINE_FILESTREAM_EVENT(xfs_filestream_scan); > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] xfs: replace filestream item xfs_inode reference with xfs_mount 2018-04-05 18:13 ` Brian Foster @ 2018-04-06 6:59 ` Christoph Hellwig 2018-04-06 13:47 ` Brian Foster 0 siblings, 1 reply; 13+ messages in thread From: Christoph Hellwig @ 2018-04-06 6:59 UTC (permalink / raw) To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs On Thu, Apr 05, 2018 at 02:13:01PM -0400, Brian Foster wrote: > On Thu, Apr 05, 2018 at 10:18:42AM -0700, Christoph Hellwig wrote: > > On Thu, Apr 05, 2018 at 08:11:47AM -0400, Brian Foster wrote: > > > The obvious solution is to grab an inode reference for > > > xfs_fstrm_item. The filestream mechanism only actually uses the > > > inode pointer as a means to access the xfs_mount, however. Rather > > > than add unnecessary complexity, simplify the implementation to > > > store an xfs_mount pointer instead of the inode. This also requires > > > updates to the tracepoint class to provide the associated data via > > > parameters rather than the inode and a minor hack to peek at the MRU > > > key to establish the inode number at free time. > > > > How about not replacing it all and just providing the context from > > the mru cache? Something like the untested patch below: > > > > I actually considered something like this briefly but it initially > looked more involved than what you've come up with here. I think that I > just didn't consider storing the mp directly in the xfs_mru_cache so it > would be available to the reaper context. With that, this looks nicer to > me. I'll give it a whirl.. thanks! FYI, here is a version with a proper changelog: --- >From ef8248a6cf39f2fbf69b2590c4cc66c9a2ba57fd Mon Sep 17 00:00:00 2001 From: Christoph Hellwig <hch@lst.de> Date: Thu, 5 Apr 2018 19:17:51 +0200 Subject: xfs: remove filestream item xfs_inode reference The filestreams allocator stores an xfs_fstrm_item structure in the MRU to cache inode number to agno mappings for a particular length of time. Each xfs_fstrm_item contains the internal MRU structure, an inode pointer and agno value. The inode pointer stored in the xfs_fstrm_item is not referenced, however, which means the inode itself can be removed and reclaimed before the MRU item is freed. If this occurs, xfs_fstrm_free_func() can access freed or unrelated memory through xfs_fstrm_item->ip and crash. The obvious solution is to grab an inode reference for xfs_fstrm_item. The filestream mechanism only actually uses the inode pointer as a means to access the xfs_mount, however. Rather than add unnecessary complexity, simplify the implementation to store an xfs_mount pointer in struct xfs_mru_cache, and pass it to the free callback. This also requires updates to the tracepoint class to provide the associated data via parameters rather than the inode and a minor hack to peek at the MRU key to establish the inode number at free time. Based on debugging work and an earlier patch from Brian Foster, who also wrote most of this changelog. Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> --- fs/xfs/xfs_filestream.c | 19 +++++++++---------- fs/xfs/xfs_mru_cache.c | 8 +++++--- fs/xfs/xfs_mru_cache.h | 8 ++++---- fs/xfs/xfs_trace.h | 14 +++++++------- 4 files changed, 25 insertions(+), 24 deletions(-) diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c index 043ca3808ea2..5131a6e25fc9 100644 --- a/fs/xfs/xfs_filestream.c +++ b/fs/xfs/xfs_filestream.c @@ -34,7 +34,6 @@ struct xfs_fstrm_item { struct xfs_mru_cache_elem mru; - struct xfs_inode *ip; xfs_agnumber_t ag; /* AG in use for this directory */ }; @@ -122,14 +121,15 @@ xfs_filestream_put_ag( static void xfs_fstrm_free_func( + void *data, struct xfs_mru_cache_elem *mru) { + struct xfs_mount *mp = data; struct xfs_fstrm_item *item = container_of(mru, struct xfs_fstrm_item, mru); - xfs_filestream_put_ag(item->ip->i_mount, item->ag); - - trace_xfs_filestream_free(item->ip, item->ag); + xfs_filestream_put_ag(mp, item->ag); + trace_xfs_filestream_free(mp, mru->key, item->ag); kmem_free(item); } @@ -165,7 +165,7 @@ xfs_filestream_pick_ag( trylock = XFS_ALLOC_FLAG_TRYLOCK; for (nscan = 0; 1; nscan++) { - trace_xfs_filestream_scan(ip, ag); + trace_xfs_filestream_scan(mp, ip->i_ino, ag); pag = xfs_perag_get(mp, ag); @@ -265,7 +265,6 @@ xfs_filestream_pick_ag( goto out_put_ag; item->ag = *agp; - item->ip = ip; err = xfs_mru_cache_insert(mp->m_filestream, ip->i_ino, &item->mru); if (err) { @@ -333,7 +332,7 @@ xfs_filestream_lookup_ag( ag = container_of(mru, struct xfs_fstrm_item, mru)->ag; xfs_mru_cache_done(mp->m_filestream); - trace_xfs_filestream_lookup(ip, ag); + trace_xfs_filestream_lookup(mp, ip->i_ino, ag); goto out; } @@ -399,7 +398,7 @@ xfs_filestream_new_ag( * Only free the item here so we skip over the old AG earlier. */ if (mru) - xfs_fstrm_free_func(mru); + xfs_fstrm_free_func(mp, mru); IRELE(pip); exit: @@ -426,8 +425,8 @@ xfs_filestream_mount( * timer tunable to within about 10 percent. This requires at least 10 * groups. */ - return xfs_mru_cache_create(&mp->m_filestream, xfs_fstrm_centisecs * 10, - 10, xfs_fstrm_free_func); + return xfs_mru_cache_create(&mp->m_filestream, mp, + xfs_fstrm_centisecs * 10, 10, xfs_fstrm_free_func); } void diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c index f8a674d7f092..70eea7ae2876 100644 --- a/fs/xfs/xfs_mru_cache.c +++ b/fs/xfs/xfs_mru_cache.c @@ -112,6 +112,7 @@ struct xfs_mru_cache { xfs_mru_cache_free_func_t free_func; /* Function pointer for freeing. */ struct delayed_work work; /* Workqueue data for reaping. */ unsigned int queued; /* work has been queued */ + void *data; }; static struct workqueue_struct *xfs_mru_reap_wq; @@ -259,7 +260,7 @@ _xfs_mru_cache_clear_reap_list( list_for_each_entry_safe(elem, next, &tmp, list_node) { list_del_init(&elem->list_node); - mru->free_func(elem); + mru->free_func(mru->data, elem); } spin_lock(&mru->lock); @@ -326,6 +327,7 @@ xfs_mru_cache_uninit(void) int xfs_mru_cache_create( struct xfs_mru_cache **mrup, + void *data, unsigned int lifetime_ms, unsigned int grp_count, xfs_mru_cache_free_func_t free_func) @@ -369,7 +371,7 @@ xfs_mru_cache_create( mru->grp_time = grp_time; mru->free_func = free_func; - + mru->data = data; *mrup = mru; exit: @@ -492,7 +494,7 @@ xfs_mru_cache_delete( elem = xfs_mru_cache_remove(mru, key); if (elem) - mru->free_func(elem); + mru->free_func(mru->data, elem); } /* diff --git a/fs/xfs/xfs_mru_cache.h b/fs/xfs/xfs_mru_cache.h index fb5245ba5ff7..b3f3fbdfcc47 100644 --- a/fs/xfs/xfs_mru_cache.h +++ b/fs/xfs/xfs_mru_cache.h @@ -26,13 +26,13 @@ struct xfs_mru_cache_elem { }; /* Function pointer type for callback to free a client's data pointer. */ -typedef void (*xfs_mru_cache_free_func_t)(struct xfs_mru_cache_elem *elem); +typedef void (*xfs_mru_cache_free_func_t)(void *, struct xfs_mru_cache_elem *); int xfs_mru_cache_init(void); void xfs_mru_cache_uninit(void); -int xfs_mru_cache_create(struct xfs_mru_cache **mrup, unsigned int lifetime_ms, - unsigned int grp_count, - xfs_mru_cache_free_func_t free_func); +int xfs_mru_cache_create(struct xfs_mru_cache **mrup, void *data, + unsigned int lifetime_ms, unsigned int grp_count, + xfs_mru_cache_free_func_t free_func); void xfs_mru_cache_destroy(struct xfs_mru_cache *mru); int xfs_mru_cache_insert(struct xfs_mru_cache *mru, unsigned long key, struct xfs_mru_cache_elem *elem); diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 945de08af7ba..19f4e30265c2 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -506,8 +506,8 @@ DEFINE_BUF_ITEM_EVENT(xfs_trans_bhold_release); DEFINE_BUF_ITEM_EVENT(xfs_trans_binval); DECLARE_EVENT_CLASS(xfs_filestream_class, - TP_PROTO(struct xfs_inode *ip, xfs_agnumber_t agno), - TP_ARGS(ip, agno), + TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino, xfs_agnumber_t agno), + TP_ARGS(mp, ino, agno), TP_STRUCT__entry( __field(dev_t, dev) __field(xfs_ino_t, ino) @@ -515,10 +515,10 @@ DECLARE_EVENT_CLASS(xfs_filestream_class, __field(int, streams) ), TP_fast_assign( - __entry->dev = VFS_I(ip)->i_sb->s_dev; - __entry->ino = ip->i_ino; + __entry->dev = mp->m_super->s_dev; + __entry->ino = ino; __entry->agno = agno; - __entry->streams = xfs_filestream_peek_ag(ip->i_mount, agno); + __entry->streams = xfs_filestream_peek_ag(mp, agno); ), TP_printk("dev %d:%d ino 0x%llx agno %u streams %d", MAJOR(__entry->dev), MINOR(__entry->dev), @@ -528,8 +528,8 @@ DECLARE_EVENT_CLASS(xfs_filestream_class, ) #define DEFINE_FILESTREAM_EVENT(name) \ DEFINE_EVENT(xfs_filestream_class, name, \ - TP_PROTO(struct xfs_inode *ip, xfs_agnumber_t agno), \ - TP_ARGS(ip, agno)) + TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino, xfs_agnumber_t agno), \ + TP_ARGS(mp, ino, agno)) DEFINE_FILESTREAM_EVENT(xfs_filestream_free); DEFINE_FILESTREAM_EVENT(xfs_filestream_lookup); DEFINE_FILESTREAM_EVENT(xfs_filestream_scan); -- 2.16.3 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] xfs: replace filestream item xfs_inode reference with xfs_mount 2018-04-06 6:59 ` Christoph Hellwig @ 2018-04-06 13:47 ` Brian Foster 0 siblings, 0 replies; 13+ messages in thread From: Brian Foster @ 2018-04-06 13:47 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-xfs On Thu, Apr 05, 2018 at 11:59:44PM -0700, Christoph Hellwig wrote: > On Thu, Apr 05, 2018 at 02:13:01PM -0400, Brian Foster wrote: > > On Thu, Apr 05, 2018 at 10:18:42AM -0700, Christoph Hellwig wrote: > > > On Thu, Apr 05, 2018 at 08:11:47AM -0400, Brian Foster wrote: > > > > The obvious solution is to grab an inode reference for > > > > xfs_fstrm_item. The filestream mechanism only actually uses the > > > > inode pointer as a means to access the xfs_mount, however. Rather > > > > than add unnecessary complexity, simplify the implementation to > > > > store an xfs_mount pointer instead of the inode. This also requires > > > > updates to the tracepoint class to provide the associated data via > > > > parameters rather than the inode and a minor hack to peek at the MRU > > > > key to establish the inode number at free time. > > > > > > How about not replacing it all and just providing the context from > > > the mru cache? Something like the untested patch below: > > > > > > > I actually considered something like this briefly but it initially > > looked more involved than what you've come up with here. I think that I > > just didn't consider storing the mp directly in the xfs_mru_cache so it > > would be available to the reaper context. With that, this looks nicer to > > me. I'll give it a whirl.. thanks! > > FYI, here is a version with a proper changelog: > > --- > From ef8248a6cf39f2fbf69b2590c4cc66c9a2ba57fd Mon Sep 17 00:00:00 2001 > From: Christoph Hellwig <hch@lst.de> > Date: Thu, 5 Apr 2018 19:17:51 +0200 > Subject: xfs: remove filestream item xfs_inode reference > > The filestreams allocator stores an xfs_fstrm_item structure in the MRU to > cache inode number to agno mappings for a particular length of time. Each > xfs_fstrm_item contains the internal MRU structure, an inode pointer and > agno value. > > The inode pointer stored in the xfs_fstrm_item is not referenced, however, > which means the inode itself can be removed and reclaimed before the MRU > item is freed. If this occurs, xfs_fstrm_free_func() can access freed or > unrelated memory through xfs_fstrm_item->ip and crash. > > The obvious solution is to grab an inode reference for xfs_fstrm_item. > The filestream mechanism only actually uses the inode pointer as a means > to access the xfs_mount, however. Rather than add unnecessary > complexity, simplify the implementation to store an xfs_mount pointer in > struct xfs_mru_cache, and pass it to the free callback. This also > requires updates to the tracepoint class to provide the associated data > via parameters rather than the inode and a minor hack to peek at the MRU > key to establish the inode number at free time. > > Based on debugging work and an earlier patch from Brian Foster, who > also wrote most of this changelog. > > Reported-by: Brian Foster <bfoster@redhat.com> > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- Looks good to me and survives my test: Reviewed-by: Brian Foster <bfoster@redhat.com> Care to send this as an independent patch (unless Darrick just wants to pick it up from here)? Brian > fs/xfs/xfs_filestream.c | 19 +++++++++---------- > fs/xfs/xfs_mru_cache.c | 8 +++++--- > fs/xfs/xfs_mru_cache.h | 8 ++++---- > fs/xfs/xfs_trace.h | 14 +++++++------- > 4 files changed, 25 insertions(+), 24 deletions(-) > > diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c > index 043ca3808ea2..5131a6e25fc9 100644 > --- a/fs/xfs/xfs_filestream.c > +++ b/fs/xfs/xfs_filestream.c > @@ -34,7 +34,6 @@ > > struct xfs_fstrm_item { > struct xfs_mru_cache_elem mru; > - struct xfs_inode *ip; > xfs_agnumber_t ag; /* AG in use for this directory */ > }; > > @@ -122,14 +121,15 @@ xfs_filestream_put_ag( > > static void > xfs_fstrm_free_func( > + void *data, > struct xfs_mru_cache_elem *mru) > { > + struct xfs_mount *mp = data; > struct xfs_fstrm_item *item = > container_of(mru, struct xfs_fstrm_item, mru); > > - xfs_filestream_put_ag(item->ip->i_mount, item->ag); > - > - trace_xfs_filestream_free(item->ip, item->ag); > + xfs_filestream_put_ag(mp, item->ag); > + trace_xfs_filestream_free(mp, mru->key, item->ag); > > kmem_free(item); > } > @@ -165,7 +165,7 @@ xfs_filestream_pick_ag( > trylock = XFS_ALLOC_FLAG_TRYLOCK; > > for (nscan = 0; 1; nscan++) { > - trace_xfs_filestream_scan(ip, ag); > + trace_xfs_filestream_scan(mp, ip->i_ino, ag); > > pag = xfs_perag_get(mp, ag); > > @@ -265,7 +265,6 @@ xfs_filestream_pick_ag( > goto out_put_ag; > > item->ag = *agp; > - item->ip = ip; > > err = xfs_mru_cache_insert(mp->m_filestream, ip->i_ino, &item->mru); > if (err) { > @@ -333,7 +332,7 @@ xfs_filestream_lookup_ag( > ag = container_of(mru, struct xfs_fstrm_item, mru)->ag; > xfs_mru_cache_done(mp->m_filestream); > > - trace_xfs_filestream_lookup(ip, ag); > + trace_xfs_filestream_lookup(mp, ip->i_ino, ag); > goto out; > } > > @@ -399,7 +398,7 @@ xfs_filestream_new_ag( > * Only free the item here so we skip over the old AG earlier. > */ > if (mru) > - xfs_fstrm_free_func(mru); > + xfs_fstrm_free_func(mp, mru); > > IRELE(pip); > exit: > @@ -426,8 +425,8 @@ xfs_filestream_mount( > * timer tunable to within about 10 percent. This requires at least 10 > * groups. > */ > - return xfs_mru_cache_create(&mp->m_filestream, xfs_fstrm_centisecs * 10, > - 10, xfs_fstrm_free_func); > + return xfs_mru_cache_create(&mp->m_filestream, mp, > + xfs_fstrm_centisecs * 10, 10, xfs_fstrm_free_func); > } > > void > diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c > index f8a674d7f092..70eea7ae2876 100644 > --- a/fs/xfs/xfs_mru_cache.c > +++ b/fs/xfs/xfs_mru_cache.c > @@ -112,6 +112,7 @@ struct xfs_mru_cache { > xfs_mru_cache_free_func_t free_func; /* Function pointer for freeing. */ > struct delayed_work work; /* Workqueue data for reaping. */ > unsigned int queued; /* work has been queued */ > + void *data; > }; > > static struct workqueue_struct *xfs_mru_reap_wq; > @@ -259,7 +260,7 @@ _xfs_mru_cache_clear_reap_list( > > list_for_each_entry_safe(elem, next, &tmp, list_node) { > list_del_init(&elem->list_node); > - mru->free_func(elem); > + mru->free_func(mru->data, elem); > } > > spin_lock(&mru->lock); > @@ -326,6 +327,7 @@ xfs_mru_cache_uninit(void) > int > xfs_mru_cache_create( > struct xfs_mru_cache **mrup, > + void *data, > unsigned int lifetime_ms, > unsigned int grp_count, > xfs_mru_cache_free_func_t free_func) > @@ -369,7 +371,7 @@ xfs_mru_cache_create( > > mru->grp_time = grp_time; > mru->free_func = free_func; > - > + mru->data = data; > *mrup = mru; > > exit: > @@ -492,7 +494,7 @@ xfs_mru_cache_delete( > > elem = xfs_mru_cache_remove(mru, key); > if (elem) > - mru->free_func(elem); > + mru->free_func(mru->data, elem); > } > > /* > diff --git a/fs/xfs/xfs_mru_cache.h b/fs/xfs/xfs_mru_cache.h > index fb5245ba5ff7..b3f3fbdfcc47 100644 > --- a/fs/xfs/xfs_mru_cache.h > +++ b/fs/xfs/xfs_mru_cache.h > @@ -26,13 +26,13 @@ struct xfs_mru_cache_elem { > }; > > /* Function pointer type for callback to free a client's data pointer. */ > -typedef void (*xfs_mru_cache_free_func_t)(struct xfs_mru_cache_elem *elem); > +typedef void (*xfs_mru_cache_free_func_t)(void *, struct xfs_mru_cache_elem *); > > int xfs_mru_cache_init(void); > void xfs_mru_cache_uninit(void); > -int xfs_mru_cache_create(struct xfs_mru_cache **mrup, unsigned int lifetime_ms, > - unsigned int grp_count, > - xfs_mru_cache_free_func_t free_func); > +int xfs_mru_cache_create(struct xfs_mru_cache **mrup, void *data, > + unsigned int lifetime_ms, unsigned int grp_count, > + xfs_mru_cache_free_func_t free_func); > void xfs_mru_cache_destroy(struct xfs_mru_cache *mru); > int xfs_mru_cache_insert(struct xfs_mru_cache *mru, unsigned long key, > struct xfs_mru_cache_elem *elem); > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h > index 945de08af7ba..19f4e30265c2 100644 > --- a/fs/xfs/xfs_trace.h > +++ b/fs/xfs/xfs_trace.h > @@ -506,8 +506,8 @@ DEFINE_BUF_ITEM_EVENT(xfs_trans_bhold_release); > DEFINE_BUF_ITEM_EVENT(xfs_trans_binval); > > DECLARE_EVENT_CLASS(xfs_filestream_class, > - TP_PROTO(struct xfs_inode *ip, xfs_agnumber_t agno), > - TP_ARGS(ip, agno), > + TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino, xfs_agnumber_t agno), > + TP_ARGS(mp, ino, agno), > TP_STRUCT__entry( > __field(dev_t, dev) > __field(xfs_ino_t, ino) > @@ -515,10 +515,10 @@ DECLARE_EVENT_CLASS(xfs_filestream_class, > __field(int, streams) > ), > TP_fast_assign( > - __entry->dev = VFS_I(ip)->i_sb->s_dev; > - __entry->ino = ip->i_ino; > + __entry->dev = mp->m_super->s_dev; > + __entry->ino = ino; > __entry->agno = agno; > - __entry->streams = xfs_filestream_peek_ag(ip->i_mount, agno); > + __entry->streams = xfs_filestream_peek_ag(mp, agno); > ), > TP_printk("dev %d:%d ino 0x%llx agno %u streams %d", > MAJOR(__entry->dev), MINOR(__entry->dev), > @@ -528,8 +528,8 @@ DECLARE_EVENT_CLASS(xfs_filestream_class, > ) > #define DEFINE_FILESTREAM_EVENT(name) \ > DEFINE_EVENT(xfs_filestream_class, name, \ > - TP_PROTO(struct xfs_inode *ip, xfs_agnumber_t agno), \ > - TP_ARGS(ip, agno)) > + TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino, xfs_agnumber_t agno), \ > + TP_ARGS(mp, ino, agno)) > DEFINE_FILESTREAM_EVENT(xfs_filestream_free); > DEFINE_FILESTREAM_EVENT(xfs_filestream_lookup); > DEFINE_FILESTREAM_EVENT(xfs_filestream_scan); > -- > 2.16.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] xfs: reset xfs_inode struct on reclaim in debug mode 2018-04-05 12:11 [PATCH 1/2] xfs: reset xfs_inode struct on reclaim in debug mode Brian Foster 2018-04-05 12:11 ` [PATCH 2/2] xfs: replace filestream item xfs_inode reference with xfs_mount Brian Foster @ 2018-04-05 16:51 ` Darrick J. Wong 2018-04-05 18:12 ` Brian Foster 2018-04-05 21:14 ` Dave Chinner 2 siblings, 1 reply; 13+ messages in thread From: Darrick J. Wong @ 2018-04-05 16:51 UTC (permalink / raw) To: Brian Foster; +Cc: linux-xfs On Thu, Apr 05, 2018 at 08:11:46AM -0400, Brian Foster wrote: > A test case to reproduce a filestream/MRU use-after-free of a > reclaimed inode requires bits (e.g., ip->i_mount) of the inode to be > reset/reused once the inode memory is freed. This normally only > occurs when a new page is cycled into the zone, however. > > Perform the "one-time" inode init immediately prior to freeing > inodes when in DEBUG mode. This will zero the inode, init the low > level structures (locks, lists, etc.) and otherwise ensure each > inode is in a purely uninitialized state while sitting in the zone > as free memory. > > Signed-off-by: Brian Foster <bfoster@redhat.com> > --- > > I'll post a test that depends on this once this is worked out... one > concern this raised is if we consider any future bugs in the inode > initialization code (suppose we initialize some field once that should > be initialized on every allocation, for example), this code has the > potential to suppress such problems in debug mode. So an alternative to > this approach is to perhaps tie this to an errortag and let the > associated xfstests test enable it appropriately. Thoughts or > preferences? How about memset()ing the entire inode with a known poison value in xfs_inode_free_callback and calling _init_once in xfs_inode_alloc instead? That way it'll be obvious that someone touched a poisoned (free) inode. --D > Brian > > fs/xfs/xfs_icache.c | 5 ++++- > fs/xfs/xfs_super.c | 2 +- > fs/xfs/xfs_super.h | 1 + > 3 files changed, 6 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c > index 9a18f69f6e96..86dc4c8a4e1d 100644 > --- a/fs/xfs/xfs_icache.c > +++ b/fs/xfs/xfs_icache.c > @@ -111,7 +111,10 @@ xfs_inode_free_callback( > xfs_inode_item_destroy(ip); > ip->i_itemp = NULL; > } > - > +#ifdef DEBUG > + /* facilitate catching use-after-free problems */ > + xfs_fs_inode_init_once(ip); > +#endif > kmem_zone_free(xfs_inode_zone, ip); > } > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > index 612c1d5348b3..29b1be5dfebf 100644 > --- a/fs/xfs/xfs_super.c > +++ b/fs/xfs/xfs_super.c > @@ -1030,7 +1030,7 @@ xfs_fs_dirty_inode( > * fields in the xfs inode that left in the initialise state > * when freeing the inode. > */ > -STATIC void > +void > xfs_fs_inode_init_once( > void *inode) > { > diff --git a/fs/xfs/xfs_super.h b/fs/xfs/xfs_super.h > index 8cee8e8050e3..aae8a778f378 100644 > --- a/fs/xfs/xfs_super.h > +++ b/fs/xfs/xfs_super.h > @@ -70,6 +70,7 @@ struct block_device; > > extern void xfs_quiesce_attr(struct xfs_mount *mp); > extern void xfs_flush_inodes(struct xfs_mount *mp); > +extern void xfs_fs_inode_init_once(void *); > extern void xfs_blkdev_issue_flush(struct xfs_buftarg *); > extern xfs_agnumber_t xfs_set_inode_alloc(struct xfs_mount *, > xfs_agnumber_t agcount); > -- > 2.13.6 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] xfs: reset xfs_inode struct on reclaim in debug mode 2018-04-05 16:51 ` [PATCH 1/2] xfs: reset xfs_inode struct on reclaim in debug mode Darrick J. Wong @ 2018-04-05 18:12 ` Brian Foster 2018-04-06 16:16 ` Darrick J. Wong 0 siblings, 1 reply; 13+ messages in thread From: Brian Foster @ 2018-04-05 18:12 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs On Thu, Apr 05, 2018 at 09:51:42AM -0700, Darrick J. Wong wrote: > On Thu, Apr 05, 2018 at 08:11:46AM -0400, Brian Foster wrote: > > A test case to reproduce a filestream/MRU use-after-free of a > > reclaimed inode requires bits (e.g., ip->i_mount) of the inode to be > > reset/reused once the inode memory is freed. This normally only > > occurs when a new page is cycled into the zone, however. > > > > Perform the "one-time" inode init immediately prior to freeing > > inodes when in DEBUG mode. This will zero the inode, init the low > > level structures (locks, lists, etc.) and otherwise ensure each > > inode is in a purely uninitialized state while sitting in the zone > > as free memory. > > > > Signed-off-by: Brian Foster <bfoster@redhat.com> > > --- > > > > I'll post a test that depends on this once this is worked out... one > > concern this raised is if we consider any future bugs in the inode > > initialization code (suppose we initialize some field once that should > > be initialized on every allocation, for example), this code has the > > potential to suppress such problems in debug mode. So an alternative to > > this approach is to perhaps tie this to an errortag and let the > > associated xfstests test enable it appropriately. Thoughts or > > preferences? > > How about memset()ing the entire inode with a known poison value in > xfs_inode_free_callback and calling _init_once in xfs_inode_alloc > instead? That way it'll be obvious that someone touched a poisoned > (free) inode. > Ok... but note that doesn't address the concern above because we still effectively call _init_once() for every allocation. Essentially this means that if somebody screws up the idempotent nature of the init_once() fields or adds a new xfs_inode field and doesn't initialize it properly, the DEBUG mode kernel could suppress the problem by reinvoking the ctor for each allocation (where a !DEBUG kernel wouldn't). That's not a critical problem, but a bit of an annoying tradeoff since IMO a DEBUG kernel should be more likely to find such problems rather than hide them. But if nobody objects to that tradeoff, I'm fine with doing a memset() -> init_once() cycle as such instead of what this patch is doing. That is probably a more robust form of use-after-free detection after all. The minor tradeoff with the post-alloc init_once() approach is that we'd also potentially suppress failures of the kmem_cache code to call the ctor, but I suppose that's bound to fail spectacularly if that was ever a problem. Brian > --D > > > Brian > > > > fs/xfs/xfs_icache.c | 5 ++++- > > fs/xfs/xfs_super.c | 2 +- > > fs/xfs/xfs_super.h | 1 + > > 3 files changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c > > index 9a18f69f6e96..86dc4c8a4e1d 100644 > > --- a/fs/xfs/xfs_icache.c > > +++ b/fs/xfs/xfs_icache.c > > @@ -111,7 +111,10 @@ xfs_inode_free_callback( > > xfs_inode_item_destroy(ip); > > ip->i_itemp = NULL; > > } > > - > > +#ifdef DEBUG > > + /* facilitate catching use-after-free problems */ > > + xfs_fs_inode_init_once(ip); > > +#endif > > kmem_zone_free(xfs_inode_zone, ip); > > } > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > index 612c1d5348b3..29b1be5dfebf 100644 > > --- a/fs/xfs/xfs_super.c > > +++ b/fs/xfs/xfs_super.c > > @@ -1030,7 +1030,7 @@ xfs_fs_dirty_inode( > > * fields in the xfs inode that left in the initialise state > > * when freeing the inode. > > */ > > -STATIC void > > +void > > xfs_fs_inode_init_once( > > void *inode) > > { > > diff --git a/fs/xfs/xfs_super.h b/fs/xfs/xfs_super.h > > index 8cee8e8050e3..aae8a778f378 100644 > > --- a/fs/xfs/xfs_super.h > > +++ b/fs/xfs/xfs_super.h > > @@ -70,6 +70,7 @@ struct block_device; > > > > extern void xfs_quiesce_attr(struct xfs_mount *mp); > > extern void xfs_flush_inodes(struct xfs_mount *mp); > > +extern void xfs_fs_inode_init_once(void *); > > extern void xfs_blkdev_issue_flush(struct xfs_buftarg *); > > extern xfs_agnumber_t xfs_set_inode_alloc(struct xfs_mount *, > > xfs_agnumber_t agcount); > > -- > > 2.13.6 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] xfs: reset xfs_inode struct on reclaim in debug mode 2018-04-05 18:12 ` Brian Foster @ 2018-04-06 16:16 ` Darrick J. Wong 0 siblings, 0 replies; 13+ messages in thread From: Darrick J. Wong @ 2018-04-06 16:16 UTC (permalink / raw) To: Brian Foster; +Cc: linux-xfs On Thu, Apr 05, 2018 at 02:12:38PM -0400, Brian Foster wrote: > On Thu, Apr 05, 2018 at 09:51:42AM -0700, Darrick J. Wong wrote: > > On Thu, Apr 05, 2018 at 08:11:46AM -0400, Brian Foster wrote: > > > A test case to reproduce a filestream/MRU use-after-free of a > > > reclaimed inode requires bits (e.g., ip->i_mount) of the inode to be > > > reset/reused once the inode memory is freed. This normally only > > > occurs when a new page is cycled into the zone, however. > > > > > > Perform the "one-time" inode init immediately prior to freeing > > > inodes when in DEBUG mode. This will zero the inode, init the low > > > level structures (locks, lists, etc.) and otherwise ensure each > > > inode is in a purely uninitialized state while sitting in the zone > > > as free memory. > > > > > > Signed-off-by: Brian Foster <bfoster@redhat.com> > > > --- > > > > > > I'll post a test that depends on this once this is worked out... one > > > concern this raised is if we consider any future bugs in the inode > > > initialization code (suppose we initialize some field once that should > > > be initialized on every allocation, for example), this code has the > > > potential to suppress such problems in debug mode. So an alternative to > > > this approach is to perhaps tie this to an errortag and let the > > > associated xfstests test enable it appropriately. Thoughts or > > > preferences? > > > > How about memset()ing the entire inode with a known poison value in > > xfs_inode_free_callback and calling _init_once in xfs_inode_alloc > > instead? That way it'll be obvious that someone touched a poisoned > > (free) inode. > > > > Ok... but note that doesn't address the concern above because we still > effectively call _init_once() for every allocation. Essentially this > means that if somebody screws up the idempotent nature of the > init_once() fields or adds a new xfs_inode field and doesn't initialize > it properly, the DEBUG mode kernel could suppress the problem by > reinvoking the ctor for each allocation (where a !DEBUG kernel > wouldn't). That's not a critical problem, but a bit of an annoying > tradeoff since IMO a DEBUG kernel should be more likely to find such > problems rather than hide them. Or just add a new errortag to turn this behavior on? I think that'd be useful at least for regression testing without requiring everyone to test with KASAN kernels... > But if nobody objects to that tradeoff, I'm fine with doing a memset() > -> init_once() cycle as such instead of what this patch is doing. That > is probably a more robust form of use-after-free detection after all. > The minor tradeoff with the post-alloc init_once() approach is that we'd > also potentially suppress failures of the kmem_cache code to call the > ctor, but I suppose that's bound to fail spectacularly if that was ever > a problem. Probably. :) --D > > Brian > > > --D > > > > > Brian > > > > > > fs/xfs/xfs_icache.c | 5 ++++- > > > fs/xfs/xfs_super.c | 2 +- > > > fs/xfs/xfs_super.h | 1 + > > > 3 files changed, 6 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c > > > index 9a18f69f6e96..86dc4c8a4e1d 100644 > > > --- a/fs/xfs/xfs_icache.c > > > +++ b/fs/xfs/xfs_icache.c > > > @@ -111,7 +111,10 @@ xfs_inode_free_callback( > > > xfs_inode_item_destroy(ip); > > > ip->i_itemp = NULL; > > > } > > > - > > > +#ifdef DEBUG > > > + /* facilitate catching use-after-free problems */ > > > + xfs_fs_inode_init_once(ip); > > > +#endif > > > kmem_zone_free(xfs_inode_zone, ip); > > > } > > > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > > index 612c1d5348b3..29b1be5dfebf 100644 > > > --- a/fs/xfs/xfs_super.c > > > +++ b/fs/xfs/xfs_super.c > > > @@ -1030,7 +1030,7 @@ xfs_fs_dirty_inode( > > > * fields in the xfs inode that left in the initialise state > > > * when freeing the inode. > > > */ > > > -STATIC void > > > +void > > > xfs_fs_inode_init_once( > > > void *inode) > > > { > > > diff --git a/fs/xfs/xfs_super.h b/fs/xfs/xfs_super.h > > > index 8cee8e8050e3..aae8a778f378 100644 > > > --- a/fs/xfs/xfs_super.h > > > +++ b/fs/xfs/xfs_super.h > > > @@ -70,6 +70,7 @@ struct block_device; > > > > > > extern void xfs_quiesce_attr(struct xfs_mount *mp); > > > extern void xfs_flush_inodes(struct xfs_mount *mp); > > > +extern void xfs_fs_inode_init_once(void *); > > > extern void xfs_blkdev_issue_flush(struct xfs_buftarg *); > > > extern xfs_agnumber_t xfs_set_inode_alloc(struct xfs_mount *, > > > xfs_agnumber_t agcount); > > > -- > > > 2.13.6 > > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] xfs: reset xfs_inode struct on reclaim in debug mode 2018-04-05 12:11 [PATCH 1/2] xfs: reset xfs_inode struct on reclaim in debug mode Brian Foster 2018-04-05 12:11 ` [PATCH 2/2] xfs: replace filestream item xfs_inode reference with xfs_mount Brian Foster 2018-04-05 16:51 ` [PATCH 1/2] xfs: reset xfs_inode struct on reclaim in debug mode Darrick J. Wong @ 2018-04-05 21:14 ` Dave Chinner 2018-04-06 13:28 ` Brian Foster 2 siblings, 1 reply; 13+ messages in thread From: Dave Chinner @ 2018-04-05 21:14 UTC (permalink / raw) To: Brian Foster; +Cc: linux-xfs On Thu, Apr 05, 2018 at 08:11:46AM -0400, Brian Foster wrote: > A test case to reproduce a filestream/MRU use-after-free of a > reclaimed inode requires bits (e.g., ip->i_mount) of the inode to be > reset/reused once the inode memory is freed. This normally only > occurs when a new page is cycled into the zone, however. > > Perform the "one-time" inode init immediately prior to freeing > inodes when in DEBUG mode. This will zero the inode, init the low > level structures (locks, lists, etc.) and otherwise ensure each > inode is in a purely uninitialized state while sitting in the zone > as free memory. Does KASAN catch this use-after-free? i.e. Given that people regularly run fstests with KASAN enabled, do we need to change the code for the test to trigger detection? Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] xfs: reset xfs_inode struct on reclaim in debug mode 2018-04-05 21:14 ` Dave Chinner @ 2018-04-06 13:28 ` Brian Foster 2018-04-06 16:13 ` Darrick J. Wong 0 siblings, 1 reply; 13+ messages in thread From: Brian Foster @ 2018-04-06 13:28 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Fri, Apr 06, 2018 at 07:14:37AM +1000, Dave Chinner wrote: > On Thu, Apr 05, 2018 at 08:11:46AM -0400, Brian Foster wrote: > > A test case to reproduce a filestream/MRU use-after-free of a > > reclaimed inode requires bits (e.g., ip->i_mount) of the inode to be > > reset/reused once the inode memory is freed. This normally only > > occurs when a new page is cycled into the zone, however. > > > > Perform the "one-time" inode init immediately prior to freeing > > inodes when in DEBUG mode. This will zero the inode, init the low > > level structures (locks, lists, etc.) and otherwise ensure each > > inode is in a purely uninitialized state while sitting in the zone > > as free memory. > > Does KASAN catch this use-after-free? i.e. Given that people > regularly run fstests with KASAN enabled, do we need to change the > code for the test to trigger detection? > I had tried with page poisoning without much luck, I suspect because my test doesn't involve cycling pages in and out of the kmem cache. I hadn't tried with KASAN though. I just gave it a try with my current test and it looks like it detected it[1]. I guess KASAN is doing some kind of higher level/out-of-band tracking of slab/cache objects that allows this to work. So perhaps we don't need this patch after all... I'll just document in the test that KASAN may be necessary to reproduce the problem. Brian [1] KASAN splat: [ 66.389454] BUG: KASAN: use-after-free in xfs_fstrm_free_func+0x33/0x1b0 [xfs] [ 66.390432] Read of size 8 at addr ffff8800b02dde40 by task rmdir/1939 [ 66.391268] [ 66.391470] CPU: 2 PID: 1939 Comm: rmdir Not tainted 4.16.0-rc5 #111 [ 66.392228] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 66.392925] Call Trace: [ 66.393243] dump_stack+0x85/0xc0 [ 66.393683] print_address_description+0x65/0x270 [ 66.394190] kasan_report+0x253/0x380 [ 66.394681] ? xfs_fstrm_free_func+0x33/0x1b0 [xfs] [ 66.395352] xfs_fstrm_free_func+0x33/0x1b0 [xfs] [ 66.395987] xfs_remove+0x60a/0x670 [xfs] [ 66.396567] ? xfs_iunpin_wait+0x30/0x30 [xfs] [ 66.397077] ? lock_acquire+0xc0/0x220 [ 66.397463] ? d_walk+0x1b3/0x580 [ 66.397803] ? do_raw_spin_unlock+0x8c/0x120 [ 66.398353] xfs_vn_unlink+0xac/0x130 [xfs] [ 66.398915] ? xfs_vn_rename+0x230/0x230 [xfs] [ 66.399369] vfs_rmdir+0x12a/0x1e0 [ 66.399736] do_rmdir+0x258/0x2c0 [ 66.400159] ? user_path_create+0x30/0x30 [ 66.400622] ? up_read+0x17/0x30 [ 66.400941] ? __do_page_fault+0x64a/0x710 [ 66.401385] ? mark_held_locks+0x1b/0xa0 [ 66.401766] ? mark_held_locks+0x1b/0xa0 [ 66.402181] ? do_syscall_64+0x39/0x310 [ 66.402592] ? SyS_mkdir+0x170/0x170 [ 66.402938] do_syscall_64+0xf1/0x310 [ 66.403352] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 66.403920] RIP: 0033:0x7fad1a8f4ec7 [ 66.404327] RSP: 002b:00007fffb10c8ca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000054 [ 66.405173] RAX: ffffffffffffffda RBX: 00007fffb10cab1b RCX: 00007fad1a8f4ec7 [ 66.405864] RDX: 00007fad1abc4000 RSI: 0000000000000001 RDI: 00007fffb10cab1b [ 66.406592] RBP: 00007fffb10c8dd8 R08: 0000000000000000 R09: 0000000000000000 [ 66.407332] R10: 0000564cf39f0010 R11: 0000000000000246 R12: 0000000000000002 [ 66.408090] R13: 00007fad1abc1344 R14: 0000000000000000 R15: 0000000000000000 [ 66.408819] [ 66.408977] Allocated by task 1929: [ 66.409340] kasan_kmalloc+0xa0/0xd0 [ 66.409732] kmem_cache_alloc+0xe8/0x2e0 [ 66.410250] kmem_zone_alloc+0x5e/0xf0 [xfs] [ 66.410808] xfs_inode_alloc+0x2d/0x2b0 [xfs] [ 66.411358] xfs_iget+0x616/0x1740 [xfs] [ 66.411860] xfs_ialloc+0x18b/0xb10 [xfs] [ 66.412423] xfs_dir_ialloc+0x21e/0x3f0 [xfs] [ 66.413032] xfs_create+0x7b4/0xb60 [xfs] [ 66.413608] xfs_generic_create+0x303/0x3f0 [xfs] [ 66.414127] vfs_mkdir+0x1d2/0x2e0 [ 66.414465] SyS_mkdir+0xda/0x170 [ 66.414790] do_syscall_64+0xf1/0x310 [ 66.415141] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 66.415636] [ 66.415790] Freed by task 0: [ 66.416070] __kasan_slab_free+0x136/0x180 [ 66.416467] kmem_cache_free+0xc9/0x340 [ 66.416839] rcu_process_callbacks+0x2ef/0x9c0 [ 66.417262] __do_softirq+0x11a/0x5b9 [ 66.417654] [ 66.417878] The buggy address belongs to the object at ffff8800b02dde40 [ 66.417878] which belongs to the cache xfs_inode of size 1696 [ 66.419219] The buggy address is located 0 bytes inside of [ 66.419219] 1696-byte region [ffff8800b02dde40, ffff8800b02de4e0) [ 66.420437] The buggy address belongs to the page: [ 66.420948] page:ffffea0002c0b600 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0 [ 66.422024] flags: 0xfffffc0008100(slab|head) [ 66.422466] raw: 000fffffc0008100 0000000000000000 0000000000000000 0000000180110011 [ 66.423223] raw: dead000000000100 dead000000000200 ffff8800c5207480 0000000000000000 [ 66.424013] page dumped because: kasan: bad access detected [ 66.424602] [ 66.424895] Memory state around the buggy address: [ 66.425369] ffff8800b02ddd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 66.426070] ffff8800b02ddd80: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc [ 66.426815] >ffff8800b02dde00: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb [ 66.427632] ^ [ 66.428213] ffff8800b02dde80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 66.428989] ffff8800b02ddf00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 66.429750] ================================================================== ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] xfs: reset xfs_inode struct on reclaim in debug mode 2018-04-06 13:28 ` Brian Foster @ 2018-04-06 16:13 ` Darrick J. Wong 2018-04-06 17:01 ` Brian Foster 0 siblings, 1 reply; 13+ messages in thread From: Darrick J. Wong @ 2018-04-06 16:13 UTC (permalink / raw) To: Brian Foster; +Cc: Dave Chinner, linux-xfs On Fri, Apr 06, 2018 at 09:28:07AM -0400, Brian Foster wrote: > On Fri, Apr 06, 2018 at 07:14:37AM +1000, Dave Chinner wrote: > > On Thu, Apr 05, 2018 at 08:11:46AM -0400, Brian Foster wrote: > > > A test case to reproduce a filestream/MRU use-after-free of a > > > reclaimed inode requires bits (e.g., ip->i_mount) of the inode to be > > > reset/reused once the inode memory is freed. This normally only > > > occurs when a new page is cycled into the zone, however. > > > > > > Perform the "one-time" inode init immediately prior to freeing > > > inodes when in DEBUG mode. This will zero the inode, init the low > > > level structures (locks, lists, etc.) and otherwise ensure each > > > inode is in a purely uninitialized state while sitting in the zone > > > as free memory. > > > > Does KASAN catch this use-after-free? i.e. Given that people > > regularly run fstests with KASAN enabled, do we need to change the > > code for the test to trigger detection? > > > > I had tried with page poisoning without much luck, I suspect because my > test doesn't involve cycling pages in and out of the kmem cache. I > hadn't tried with KASAN though. I just gave it a try with my current > test and it looks like it detected it[1]. I guess KASAN is doing some > kind of higher level/out-of-band tracking of slab/cache objects that > allows this to work. > > So perhaps we don't need this patch after all... I'll just document in > the test that KASAN may be necessary to reproduce the problem. > > Brian > > [1] KASAN splat: > > [ 66.389454] BUG: KASAN: use-after-free in xfs_fstrm_free_func+0x33/0x1b0 [xfs] > [ 66.390432] Read of size 8 at addr ffff8800b02dde40 by task rmdir/1939 > [ 66.391268] > [ 66.391470] CPU: 2 PID: 1939 Comm: rmdir Not tainted 4.16.0-rc5 #111 > [ 66.392228] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Yay! This'll make for a nice juicy commit log. :) Ok, so at this point this series consists of Christoph's replacement for patch 2, and a revision of this patch to memzero/init_once? Hm, I wonder if there's a way to tell from sysfs if kasan is enabled. --D > [ 66.392925] Call Trace: > [ 66.393243] dump_stack+0x85/0xc0 > [ 66.393683] print_address_description+0x65/0x270 > [ 66.394190] kasan_report+0x253/0x380 > [ 66.394681] ? xfs_fstrm_free_func+0x33/0x1b0 [xfs] > [ 66.395352] xfs_fstrm_free_func+0x33/0x1b0 [xfs] > [ 66.395987] xfs_remove+0x60a/0x670 [xfs] > [ 66.396567] ? xfs_iunpin_wait+0x30/0x30 [xfs] > [ 66.397077] ? lock_acquire+0xc0/0x220 > [ 66.397463] ? d_walk+0x1b3/0x580 > [ 66.397803] ? do_raw_spin_unlock+0x8c/0x120 > [ 66.398353] xfs_vn_unlink+0xac/0x130 [xfs] > [ 66.398915] ? xfs_vn_rename+0x230/0x230 [xfs] > [ 66.399369] vfs_rmdir+0x12a/0x1e0 > [ 66.399736] do_rmdir+0x258/0x2c0 > [ 66.400159] ? user_path_create+0x30/0x30 > [ 66.400622] ? up_read+0x17/0x30 > [ 66.400941] ? __do_page_fault+0x64a/0x710 > [ 66.401385] ? mark_held_locks+0x1b/0xa0 > [ 66.401766] ? mark_held_locks+0x1b/0xa0 > [ 66.402181] ? do_syscall_64+0x39/0x310 > [ 66.402592] ? SyS_mkdir+0x170/0x170 > [ 66.402938] do_syscall_64+0xf1/0x310 > [ 66.403352] entry_SYSCALL_64_after_hwframe+0x42/0xb7 > [ 66.403920] RIP: 0033:0x7fad1a8f4ec7 > [ 66.404327] RSP: 002b:00007fffb10c8ca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000054 > [ 66.405173] RAX: ffffffffffffffda RBX: 00007fffb10cab1b RCX: 00007fad1a8f4ec7 > [ 66.405864] RDX: 00007fad1abc4000 RSI: 0000000000000001 RDI: 00007fffb10cab1b > [ 66.406592] RBP: 00007fffb10c8dd8 R08: 0000000000000000 R09: 0000000000000000 > [ 66.407332] R10: 0000564cf39f0010 R11: 0000000000000246 R12: 0000000000000002 > [ 66.408090] R13: 00007fad1abc1344 R14: 0000000000000000 R15: 0000000000000000 > [ 66.408819] > [ 66.408977] Allocated by task 1929: > [ 66.409340] kasan_kmalloc+0xa0/0xd0 > [ 66.409732] kmem_cache_alloc+0xe8/0x2e0 > [ 66.410250] kmem_zone_alloc+0x5e/0xf0 [xfs] > [ 66.410808] xfs_inode_alloc+0x2d/0x2b0 [xfs] > [ 66.411358] xfs_iget+0x616/0x1740 [xfs] > [ 66.411860] xfs_ialloc+0x18b/0xb10 [xfs] > [ 66.412423] xfs_dir_ialloc+0x21e/0x3f0 [xfs] > [ 66.413032] xfs_create+0x7b4/0xb60 [xfs] > [ 66.413608] xfs_generic_create+0x303/0x3f0 [xfs] > [ 66.414127] vfs_mkdir+0x1d2/0x2e0 > [ 66.414465] SyS_mkdir+0xda/0x170 > [ 66.414790] do_syscall_64+0xf1/0x310 > [ 66.415141] entry_SYSCALL_64_after_hwframe+0x42/0xb7 > [ 66.415636] > [ 66.415790] Freed by task 0: > [ 66.416070] __kasan_slab_free+0x136/0x180 > [ 66.416467] kmem_cache_free+0xc9/0x340 > [ 66.416839] rcu_process_callbacks+0x2ef/0x9c0 > [ 66.417262] __do_softirq+0x11a/0x5b9 > [ 66.417654] > [ 66.417878] The buggy address belongs to the object at ffff8800b02dde40 > [ 66.417878] which belongs to the cache xfs_inode of size 1696 > [ 66.419219] The buggy address is located 0 bytes inside of > [ 66.419219] 1696-byte region [ffff8800b02dde40, ffff8800b02de4e0) > [ 66.420437] The buggy address belongs to the page: > [ 66.420948] page:ffffea0002c0b600 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0 > [ 66.422024] flags: 0xfffffc0008100(slab|head) > [ 66.422466] raw: 000fffffc0008100 0000000000000000 0000000000000000 0000000180110011 > [ 66.423223] raw: dead000000000100 dead000000000200 ffff8800c5207480 0000000000000000 > [ 66.424013] page dumped because: kasan: bad access detected > [ 66.424602] > [ 66.424895] Memory state around the buggy address: > [ 66.425369] ffff8800b02ddd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > [ 66.426070] ffff8800b02ddd80: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc > [ 66.426815] >ffff8800b02dde00: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb > [ 66.427632] ^ > [ 66.428213] ffff8800b02dde80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > [ 66.428989] ffff8800b02ddf00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > [ 66.429750] ================================================================== > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] xfs: reset xfs_inode struct on reclaim in debug mode 2018-04-06 16:13 ` Darrick J. Wong @ 2018-04-06 17:01 ` Brian Foster 0 siblings, 0 replies; 13+ messages in thread From: Brian Foster @ 2018-04-06 17:01 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs On Fri, Apr 06, 2018 at 09:13:48AM -0700, Darrick J. Wong wrote: > On Fri, Apr 06, 2018 at 09:28:07AM -0400, Brian Foster wrote: > > On Fri, Apr 06, 2018 at 07:14:37AM +1000, Dave Chinner wrote: > > > On Thu, Apr 05, 2018 at 08:11:46AM -0400, Brian Foster wrote: > > > > A test case to reproduce a filestream/MRU use-after-free of a > > > > reclaimed inode requires bits (e.g., ip->i_mount) of the inode to be > > > > reset/reused once the inode memory is freed. This normally only > > > > occurs when a new page is cycled into the zone, however. > > > > > > > > Perform the "one-time" inode init immediately prior to freeing > > > > inodes when in DEBUG mode. This will zero the inode, init the low > > > > level structures (locks, lists, etc.) and otherwise ensure each > > > > inode is in a purely uninitialized state while sitting in the zone > > > > as free memory. > > > > > > Does KASAN catch this use-after-free? i.e. Given that people > > > regularly run fstests with KASAN enabled, do we need to change the > > > code for the test to trigger detection? > > > > > > > I had tried with page poisoning without much luck, I suspect because my > > test doesn't involve cycling pages in and out of the kmem cache. I > > hadn't tried with KASAN though. I just gave it a try with my current > > test and it looks like it detected it[1]. I guess KASAN is doing some > > kind of higher level/out-of-band tracking of slab/cache objects that > > allows this to work. > > > > So perhaps we don't need this patch after all... I'll just document in > > the test that KASAN may be necessary to reproduce the problem. > > > > Brian > > > > [1] KASAN splat: > > > > [ 66.389454] BUG: KASAN: use-after-free in xfs_fstrm_free_func+0x33/0x1b0 [xfs] > > [ 66.390432] Read of size 8 at addr ffff8800b02dde40 by task rmdir/1939 > > [ 66.391268] > > [ 66.391470] CPU: 2 PID: 1939 Comm: rmdir Not tainted 4.16.0-rc5 #111 > > [ 66.392228] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > > Yay! This'll make for a nice juicy commit log. :) > > Ok, so at this point this series consists of Christoph's replacement for > patch 2, and a revision of this patch to memzero/init_once? > It's Christoph's variant and just dropping this one atm. KASAN detects the problem with the current test (posted a bit ago), so it probably doesn't make sense to reinvent the wheel. > Hm, I wonder if there's a way to tell from sysfs if kasan is enabled. > Hmm, from fishing around a bit it looks like you could possibly surmise it from /sys/kernel/debug/page_tables/kernel since it shows the KASAN shadow range. I'm not sure if there are other hints or anything more direct though.. Are you hinting at checking for KASAN via the test..? Brian > --D > > > [ 66.392925] Call Trace: > > [ 66.393243] dump_stack+0x85/0xc0 > > [ 66.393683] print_address_description+0x65/0x270 > > [ 66.394190] kasan_report+0x253/0x380 > > [ 66.394681] ? xfs_fstrm_free_func+0x33/0x1b0 [xfs] > > [ 66.395352] xfs_fstrm_free_func+0x33/0x1b0 [xfs] > > [ 66.395987] xfs_remove+0x60a/0x670 [xfs] > > [ 66.396567] ? xfs_iunpin_wait+0x30/0x30 [xfs] > > [ 66.397077] ? lock_acquire+0xc0/0x220 > > [ 66.397463] ? d_walk+0x1b3/0x580 > > [ 66.397803] ? do_raw_spin_unlock+0x8c/0x120 > > [ 66.398353] xfs_vn_unlink+0xac/0x130 [xfs] > > [ 66.398915] ? xfs_vn_rename+0x230/0x230 [xfs] > > [ 66.399369] vfs_rmdir+0x12a/0x1e0 > > [ 66.399736] do_rmdir+0x258/0x2c0 > > [ 66.400159] ? user_path_create+0x30/0x30 > > [ 66.400622] ? up_read+0x17/0x30 > > [ 66.400941] ? __do_page_fault+0x64a/0x710 > > [ 66.401385] ? mark_held_locks+0x1b/0xa0 > > [ 66.401766] ? mark_held_locks+0x1b/0xa0 > > [ 66.402181] ? do_syscall_64+0x39/0x310 > > [ 66.402592] ? SyS_mkdir+0x170/0x170 > > [ 66.402938] do_syscall_64+0xf1/0x310 > > [ 66.403352] entry_SYSCALL_64_after_hwframe+0x42/0xb7 > > [ 66.403920] RIP: 0033:0x7fad1a8f4ec7 > > [ 66.404327] RSP: 002b:00007fffb10c8ca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000054 > > [ 66.405173] RAX: ffffffffffffffda RBX: 00007fffb10cab1b RCX: 00007fad1a8f4ec7 > > [ 66.405864] RDX: 00007fad1abc4000 RSI: 0000000000000001 RDI: 00007fffb10cab1b > > [ 66.406592] RBP: 00007fffb10c8dd8 R08: 0000000000000000 R09: 0000000000000000 > > [ 66.407332] R10: 0000564cf39f0010 R11: 0000000000000246 R12: 0000000000000002 > > [ 66.408090] R13: 00007fad1abc1344 R14: 0000000000000000 R15: 0000000000000000 > > [ 66.408819] > > [ 66.408977] Allocated by task 1929: > > [ 66.409340] kasan_kmalloc+0xa0/0xd0 > > [ 66.409732] kmem_cache_alloc+0xe8/0x2e0 > > [ 66.410250] kmem_zone_alloc+0x5e/0xf0 [xfs] > > [ 66.410808] xfs_inode_alloc+0x2d/0x2b0 [xfs] > > [ 66.411358] xfs_iget+0x616/0x1740 [xfs] > > [ 66.411860] xfs_ialloc+0x18b/0xb10 [xfs] > > [ 66.412423] xfs_dir_ialloc+0x21e/0x3f0 [xfs] > > [ 66.413032] xfs_create+0x7b4/0xb60 [xfs] > > [ 66.413608] xfs_generic_create+0x303/0x3f0 [xfs] > > [ 66.414127] vfs_mkdir+0x1d2/0x2e0 > > [ 66.414465] SyS_mkdir+0xda/0x170 > > [ 66.414790] do_syscall_64+0xf1/0x310 > > [ 66.415141] entry_SYSCALL_64_after_hwframe+0x42/0xb7 > > [ 66.415636] > > [ 66.415790] Freed by task 0: > > [ 66.416070] __kasan_slab_free+0x136/0x180 > > [ 66.416467] kmem_cache_free+0xc9/0x340 > > [ 66.416839] rcu_process_callbacks+0x2ef/0x9c0 > > [ 66.417262] __do_softirq+0x11a/0x5b9 > > [ 66.417654] > > [ 66.417878] The buggy address belongs to the object at ffff8800b02dde40 > > [ 66.417878] which belongs to the cache xfs_inode of size 1696 > > [ 66.419219] The buggy address is located 0 bytes inside of > > [ 66.419219] 1696-byte region [ffff8800b02dde40, ffff8800b02de4e0) > > [ 66.420437] The buggy address belongs to the page: > > [ 66.420948] page:ffffea0002c0b600 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0 > > [ 66.422024] flags: 0xfffffc0008100(slab|head) > > [ 66.422466] raw: 000fffffc0008100 0000000000000000 0000000000000000 0000000180110011 > > [ 66.423223] raw: dead000000000100 dead000000000200 ffff8800c5207480 0000000000000000 > > [ 66.424013] page dumped because: kasan: bad access detected > > [ 66.424602] > > [ 66.424895] Memory state around the buggy address: > > [ 66.425369] ffff8800b02ddd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > [ 66.426070] ffff8800b02ddd80: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc > > [ 66.426815] >ffff8800b02dde00: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb > > [ 66.427632] ^ > > [ 66.428213] ffff8800b02dde80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > [ 66.428989] ffff8800b02ddf00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > [ 66.429750] ================================================================== > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2018-04-06 17:01 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-04-05 12:11 [PATCH 1/2] xfs: reset xfs_inode struct on reclaim in debug mode Brian Foster 2018-04-05 12:11 ` [PATCH 2/2] xfs: replace filestream item xfs_inode reference with xfs_mount Brian Foster 2018-04-05 17:18 ` Christoph Hellwig 2018-04-05 18:13 ` Brian Foster 2018-04-06 6:59 ` Christoph Hellwig 2018-04-06 13:47 ` Brian Foster 2018-04-05 16:51 ` [PATCH 1/2] xfs: reset xfs_inode struct on reclaim in debug mode Darrick J. Wong 2018-04-05 18:12 ` Brian Foster 2018-04-06 16:16 ` Darrick J. Wong 2018-04-05 21:14 ` Dave Chinner 2018-04-06 13:28 ` Brian Foster 2018-04-06 16:13 ` Darrick J. Wong 2018-04-06 17:01 ` Brian Foster
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).