Netdev List


* [PATCH v3 13/31] ufs: Define usercopy region in ufs_inode_cache slab cache
From: Kees Cook @ 2017-09-20 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Evgeniy Dushistov, linux-fsdevel,
	netdev, linux-mm, kernel-hardening
In-Reply-To: <1505940337-79069-1-git-send-email-keescook@chromium.org>

From: David Windsor <dave@nullcore.net>

The ufs symlink pathnames, stored in struct ufs_inode_info.i_u1.i_symlink
and therefore contained in the ufs_inode_cache slab cache, need to be
copied to/from userspace.

cache object allocation:
    fs/ufs/super.c:
        ufs_alloc_inode(...):
            ...
            ei = kmem_cache_alloc(ufs_inode_cachep, GFP_NOFS);
            ...
            return &ei->vfs_inode;

    fs/ufs/ufs.h:
        UFS_I(struct inode *inode):
            return container_of(inode, struct ufs_inode_info, vfs_inode);

    fs/ufs/namei.c:
        ufs_symlink(...):
            ...
            inode->i_link = (char *)UFS_I(inode)->i_u1.i_symlink;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
ufs_inode_cache slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.
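
The containment check described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel implementation: struct demo_inode_info, demo_region and demo_copy_allowed() are hypothetical names, and the real check lives in the slab allocators and the hardened usercopy code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a filesystem inode-info struct. */
struct demo_inode_info {
	unsigned long flags;	/* kernel-internal, must not be copied */
	char symlink[60];	/* the only field allowed to reach userspace */
	void *private_ptr;	/* kernel-internal */
};

/* Same idea as the kernel's sizeof_field(): size of one struct member. */
#define demo_sizeof_field(type, member) (sizeof(((type *)0)->member))

/* Per-cache usercopy window: byte offset and length within each object. */
struct demo_usercopy_region {
	size_t useroffset;
	size_t usersize;
};

static const struct demo_usercopy_region demo_region = {
	.useroffset = offsetof(struct demo_inode_info, symlink),
	.usersize = demo_sizeof_field(struct demo_inode_info, symlink),
};

/*
 * Return true only if the n bytes starting at ptr lie entirely inside
 * the usercopy window of the object that starts at obj.
 */
static bool demo_copy_allowed(const struct demo_usercopy_region *r,
			      const void *obj, const void *ptr, size_t n)
{
	const char *base = obj;
	const char *p = ptr;
	size_t off;

	assert(r != NULL && obj != NULL && ptr != NULL);
	if (p < base)
		return false;
	off = (size_t)(p - base);
	if (off < r->useroffset)
		return false;
	off -= r->useroffset;	/* offset within the window */
	return off <= r->usersize && n <= r->usersize - off;
}
```

With this layout, a copy of the whole symlink field passes, while a copy that starts at the object base or runs past the end of the field is rejected.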

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/ufs/super.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index 6440003f8ddc..62b6a4aad809 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -1466,11 +1466,14 @@ static void init_once(void *foo)
 
 static int __init init_inodecache(void)
 {
-	ufs_inode_cachep = kmem_cache_create("ufs_inode_cache",
-					     sizeof(struct ufs_inode_info),
-					     0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
-					     init_once);
+	ufs_inode_cachep = kmem_cache_create_usercopy("ufs_inode_cache",
+				sizeof(struct ufs_inode_info), 0,
+				(SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
+					SLAB_ACCOUNT),
+				offsetof(struct ufs_inode_info, i_u1.i_symlink),
+				sizeof_field(struct ufs_inode_info,
+					i_u1.i_symlink),
+				init_once);
 	if (ufs_inode_cachep == NULL)
 		return -ENOMEM;
 	return 0;
-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [PATCH v3 12/31] orangefs: Define usercopy region in orangefs_inode_cache slab cache
From: Kees Cook @ 2017-09-20 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Mike Marshall, linux-fsdevel, netdev,
	linux-mm, kernel-hardening
In-Reply-To: <1505940337-79069-1-git-send-email-keescook@chromium.org>

From: David Windsor <dave@nullcore.net>

orangefs symlink pathnames, stored in struct orangefs_inode_s.link_target
and therefore contained in the orangefs_inode_cache, need to be copied
to/from userspace.

cache object allocation:
    fs/orangefs/super.c:
        orangefs_alloc_inode(...):
            ...
            orangefs_inode = kmem_cache_alloc(orangefs_inode_cache, ...);
            ...
            return &orangefs_inode->vfs_inode;

    fs/orangefs/orangefs-utils.c:
        orangefs_inode_getattr(...):
            ...
            inode->i_link = orangefs_inode->link_target;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
orangefs_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/orangefs/super.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index 47f3fb9cbec4..ee7b8bfa47c2 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -624,11 +624,16 @@ void orangefs_kill_sb(struct super_block *sb)
 
 int orangefs_inode_cache_initialize(void)
 {
-	orangefs_inode_cache = kmem_cache_create("orangefs_inode_cache",
-					      sizeof(struct orangefs_inode_s),
-					      0,
-					      ORANGEFS_CACHE_CREATE_FLAGS,
-					      orangefs_inode_cache_ctor);
+	orangefs_inode_cache = kmem_cache_create_usercopy(
+					"orangefs_inode_cache",
+					sizeof(struct orangefs_inode_s),
+					0,
+					ORANGEFS_CACHE_CREATE_FLAGS,
+					offsetof(struct orangefs_inode_s,
+						link_target),
+					sizeof_field(struct orangefs_inode_s,
+						link_target),
+					orangefs_inode_cache_ctor);
 
 	if (!orangefs_inode_cache) {
 		gossip_err("Cannot create orangefs_inode_cache\n");
-- 
2.7.4


* [PATCH v3 11/31] exofs: Define usercopy region in exofs_inode_cache slab cache
From: Kees Cook @ 2017-09-20 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Boaz Harrosh, linux-fsdevel, netdev,
	linux-mm, kernel-hardening
In-Reply-To: <1505940337-79069-1-git-send-email-keescook@chromium.org>

From: David Windsor <dave@nullcore.net>

The exofs short symlink names, stored in struct exofs_i_info.i_data and
therefore contained in the exofs_inode_cache slab cache, need to be copied
to/from userspace.

cache object allocation:
    fs/exofs/super.c:
        exofs_alloc_inode(...):
            ...
            oi = kmem_cache_alloc(exofs_inode_cachep, GFP_KERNEL);
            ...
            return &oi->vfs_inode;

    fs/exofs/namei.c:
        exofs_symlink(...):
            ...
            inode->i_link = (char *)oi->i_data;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
exofs_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Boaz Harrosh <ooo@electrozaur.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/exofs/super.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/exofs/super.c b/fs/exofs/super.c
index 819624cfc8da..e5c532875bb7 100644
--- a/fs/exofs/super.c
+++ b/fs/exofs/super.c
@@ -192,10 +192,13 @@ static void exofs_init_once(void *foo)
  */
 static int init_inodecache(void)
 {
-	exofs_inode_cachep = kmem_cache_create("exofs_inode_cache",
+	exofs_inode_cachep = kmem_cache_create_usercopy("exofs_inode_cache",
 				sizeof(struct exofs_i_info), 0,
 				SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD |
-				SLAB_ACCOUNT, exofs_init_once);
+				SLAB_ACCOUNT,
+				offsetof(struct exofs_i_info, i_data),
+				sizeof_field(struct exofs_i_info, i_data),
+				exofs_init_once);
 	if (exofs_inode_cachep == NULL)
 		return -ENOMEM;
 	return 0;
-- 
2.7.4


* [PATCH v3 10/31] befs: Define usercopy region in befs_inode_cache slab cache
From: Kees Cook @ 2017-09-20 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Luis de Bethencourt, Salah Triki,
	linux-fsdevel, netdev, linux-mm, kernel-hardening
In-Reply-To: <1505940337-79069-1-git-send-email-keescook@chromium.org>

From: David Windsor <dave@nullcore.net>

befs symlink pathnames, stored in struct befs_inode_info.i_data.symlink
and therefore contained in the befs_inode_cache slab cache, need to be
copied to/from userspace.

cache object allocation:
    fs/befs/linuxvfs.c:
        befs_alloc_inode(...):
            ...
            bi = kmem_cache_alloc(befs_inode_cachep, GFP_KERNEL);
            ...
            return &bi->vfs_inode;

        befs_iget(...):
            ...
            strlcpy(befs_ino->i_data.symlink, raw_inode->data.symlink,
                    BEFS_SYMLINK_LEN);
            ...
            inode->i_link = befs_ino->i_data.symlink;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
befs_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Luis de Bethencourt <luisbg@kernel.org>
Cc: Salah Triki <salah.triki@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Luis de Bethencourt <luisbg@kernel.org>
---
 fs/befs/linuxvfs.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index a92355cc453b..e5dcd26003dc 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -444,11 +444,15 @@ static struct inode *befs_iget(struct super_block *sb, unsigned long ino)
 static int __init
 befs_init_inodecache(void)
 {
-	befs_inode_cachep = kmem_cache_create("befs_inode_cache",
-					      sizeof (struct befs_inode_info),
-					      0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
-					      init_once);
+	befs_inode_cachep = kmem_cache_create_usercopy("befs_inode_cache",
+				sizeof(struct befs_inode_info), 0,
+				(SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
+					SLAB_ACCOUNT),
+				offsetof(struct befs_inode_info,
+					i_data.symlink),
+				sizeof_field(struct befs_inode_info,
+					i_data.symlink),
+				init_once);
 	if (befs_inode_cachep == NULL)
 		return -ENOMEM;
 
-- 
2.7.4


* [PATCH v3 09/31] jfs: Define usercopy region in jfs_ip slab cache
From: Kees Cook @ 2017-09-20 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Kleikamp, David Windsor, Kees Cook, kernel-hardening, netdev,
	jfs-discussion, linux-mm, linux-fsdevel
In-Reply-To: <1505940337-79069-1-git-send-email-keescook@chromium.org>

From: David Windsor <dave@nullcore.net>

The jfs symlink pathnames, stored in struct jfs_inode_info.i_inline and
therefore contained in the jfs_ip slab cache, need to be copied to/from
userspace.

cache object allocation:
    fs/jfs/super.c:
        jfs_alloc_inode(...):
            ...
            jfs_inode = kmem_cache_alloc(jfs_inode_cachep, GFP_NOFS);
            ...
            return &jfs_inode->vfs_inode;

    fs/jfs/jfs_incore.h:
        JFS_IP(struct inode *inode):
            return container_of(inode, struct jfs_inode_info, vfs_inode);

    fs/jfs/inode.c:
        jfs_iget(...):
            ...
            inode->i_link = JFS_IP(inode)->i_inline;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);
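
The allocation trace above depends on the generic inode being embedded in the filesystem-specific structure, so that container_of() can recover the wrapper and i_link can point at its inline buffer. A minimal userspace sketch of that pattern follows. All demo_* names and DEMO_I() are hypothetical, and the structures are heavily simplified compared with struct inode and struct jfs_inode_info.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical generic inode, reduced to the one field used here. */
struct demo_inode {
	const char *i_link;
};

/* Hypothetical filesystem wrapper that embeds the generic inode. */
struct demo_fs_inode_info {
	char i_inline[128];		/* inline symlink target storage */
	struct demo_inode vfs_inode;	/* what the generic layer sees */
};

/* Same shape as JFS_IP()/UFS_I(): generic inode -> enclosing wrapper. */
static struct demo_fs_inode_info *DEMO_I(struct demo_inode *inode)
{
	assert(inode != NULL);
	return demo_container_of(inode, struct demo_fs_inode_info, vfs_inode);
}

/* Point i_link at the inline buffer inside the same slab object. */
static void demo_set_inline_link(struct demo_inode *inode)
{
	inode->i_link = DEMO_I(inode)->i_inline;
}
```

Because i_link ends up pointing into the cache object itself, a later readlink copies straight out of slab memory, which is why the inline field has to be whitelisted.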

In support of usercopy hardening, this patch defines a region in the
jfs_ip slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: jfs-discussion@lists.sourceforge.net
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/jfs/super.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/jfs/super.c b/fs/jfs/super.c
index 2f14677169c3..e018412608d4 100644
--- a/fs/jfs/super.c
+++ b/fs/jfs/super.c
@@ -966,9 +966,11 @@ static int __init init_jfs_fs(void)
 	int rc;
 
 	jfs_inode_cachep =
-	    kmem_cache_create("jfs_ip", sizeof(struct jfs_inode_info), 0,
-			    SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
-			    init_once);
+	    kmem_cache_create_usercopy("jfs_ip", sizeof(struct jfs_inode_info),
+			0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
+			offsetof(struct jfs_inode_info, i_inline),
+			sizeof_field(struct jfs_inode_info, i_inline),
+			init_once);
 	if (jfs_inode_cachep == NULL)
 		return -ENOMEM;
 
-- 
2.7.4



* [PATCH v3 08/31] ext2: Define usercopy region in ext2_inode_cache slab cache
From: Kees Cook @ 2017-09-20 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Jan Kara, linux-ext4, linux-fsdevel,
	netdev, linux-mm, kernel-hardening
In-Reply-To: <1505940337-79069-1-git-send-email-keescook@chromium.org>

From: David Windsor <dave@nullcore.net>

The ext2 symlink pathnames, stored in struct ext2_inode_info.i_data and
therefore contained in the ext2_inode_cache slab cache, need to be copied
to/from userspace.

cache object allocation:
    fs/ext2/super.c:
        ext2_alloc_inode(...):
            struct ext2_inode_info *ei;
            ...
            ei = kmem_cache_alloc(ext2_inode_cachep, GFP_NOFS);
            ...
            return &ei->vfs_inode;

    fs/ext2/ext2.h:
        EXT2_I(struct inode *inode):
            return container_of(inode, struct ext2_inode_info, vfs_inode);

    fs/ext2/namei.c:
        ext2_symlink(...):
            ...
            inode->i_link = (char *)&EXT2_I(inode)->i_data;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined into vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
ext2_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Jan Kara <jack@suse.com>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Jan Kara <jack@suse.cz>
---
 fs/ext2/super.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 1458706bd2ec..789c29987b36 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -220,11 +220,13 @@ static void init_once(void *foo)
 
 static int __init init_inodecache(void)
 {
-	ext2_inode_cachep = kmem_cache_create("ext2_inode_cache",
-					     sizeof(struct ext2_inode_info),
-					     0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
-					     init_once);
+	ext2_inode_cachep = kmem_cache_create_usercopy("ext2_inode_cache",
+				sizeof(struct ext2_inode_info), 0,
+				(SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
+					SLAB_ACCOUNT),
+				offsetof(struct ext2_inode_info, i_data),
+				sizeof_field(struct ext2_inode_info, i_data),
+				init_once);
 	if (ext2_inode_cachep == NULL)
 		return -ENOMEM;
 	return 0;
-- 
2.7.4


* [PATCH v3 07/31] ext4: Define usercopy region in ext4_inode_cache slab cache
From: Kees Cook @ 2017-09-20 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Theodore Ts'o, Andreas Dilger,
	linux-ext4, linux-fsdevel, netdev, linux-mm, kernel-hardening
In-Reply-To: <1505940337-79069-1-git-send-email-keescook@chromium.org>

From: David Windsor <dave@nullcore.net>

The ext4 symlink pathnames, stored in struct ext4_inode_info.i_data
and therefore contained in the ext4_inode_cache slab cache, need
to be copied to/from userspace.

cache object allocation:
    fs/ext4/super.c:
        ext4_alloc_inode(...):
            struct ext4_inode_info *ei;
            ...
            ei = kmem_cache_alloc(ext4_inode_cachep, GFP_NOFS);
            ...
            return &ei->vfs_inode;

    include/trace/events/ext4.h:
            #define EXT4_I(inode) \
                (container_of(inode, struct ext4_inode_info, vfs_inode))

    fs/ext4/namei.c:
        ext4_symlink(...):
            ...
            inode->i_link = (char *)&EXT4_I(inode)->i_data;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined into vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
ext4_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/ext4/super.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index b104096fce9e..b5d393321b7b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1036,11 +1036,13 @@ static void init_once(void *foo)
 
 static int __init init_inodecache(void)
 {
-	ext4_inode_cachep = kmem_cache_create("ext4_inode_cache",
-					     sizeof(struct ext4_inode_info),
-					     0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
-					     init_once);
+	ext4_inode_cachep = kmem_cache_create_usercopy("ext4_inode_cache",
+				sizeof(struct ext4_inode_info), 0,
+				(SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
+					SLAB_ACCOUNT),
+				offsetof(struct ext4_inode_info, i_data),
+				sizeof_field(struct ext4_inode_info, i_data),
+				init_once);
 	if (ext4_inode_cachep == NULL)
 		return -ENOMEM;
 	return 0;
-- 
2.7.4


* [PATCH v3 06/31] vfs: Copy struct mount.mnt_id to userspace using put_user()
From: Kees Cook @ 2017-09-20 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Alexander Viro, linux-fsdevel, netdev,
	linux-mm, kernel-hardening
In-Reply-To: <1505940337-79069-1-git-send-email-keescook@chromium.org>

From: David Windsor <dave@nullcore.net>

The mnt_id field can be copied with put_user(), so there is no need to
use copy_to_user(). Either way the hardened usercopy check is bypassed,
since the size is a compile-time constant and is not open to runtime
manipulation.
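
The difference between the two helpers can be sketched in userspace C. This is an illustration only: demo_put_user() and demo_copy_to_user() are hypothetical stand-ins that write to ordinary memory, with none of the fault handling or address checks of the real put_user() and copy_to_user().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for copy_to_user(): the length is a runtime argument.
 * Returns the number of bytes NOT copied (0 on success), matching the
 * kernel convention.
 */
static unsigned long demo_copy_to_user(void *dst, const void *src, size_t n)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, n);
	return 0;
}

/*
 * Stand-in for put_user(): stores a single scalar. The size is fixed
 * by the type of *ptr at compile time. Evaluates to 0 on success.
 */
#define demo_put_user(x, ptr) (*(ptr) = (x), 0)

/* Report a mount id the way the patched code does, one typed store. */
static int demo_report_mnt_id(int mnt_id_in_kernel, int *user_mnt_id)
{
	if (demo_put_user(mnt_id_in_kernel, user_mnt_id))
		return -1;	/* the kernel code would return -EFAULT */
	return 0;
}
```

Both paths store the same bytes for an int, but the put_user() form takes no length argument at all, so no length can be miscomputed or influenced at runtime.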

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log]
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/fhandle.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/fhandle.c b/fs/fhandle.c
index 58a61f55e0d0..46e00ccca8f0 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -68,8 +68,7 @@ static long do_sys_name_to_handle(struct path *path,
 	} else
 		retval = 0;
 	/* copy the mount id */
-	if (copy_to_user(mnt_id, &real_mount(path->mnt)->mnt_id,
-			 sizeof(*mnt_id)) ||
+	if (put_user(real_mount(path->mnt)->mnt_id, mnt_id) ||
 	    copy_to_user(ufh, handle,
 			 sizeof(struct file_handle) + handle_bytes))
 		retval = -EFAULT;
-- 
2.7.4

* [PATCH v3 05/31] vfs: Define usercopy region in names_cache slab caches
From: Kees Cook @ 2017-09-20 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Alexander Viro, linux-fsdevel, netdev,
	linux-mm, kernel-hardening
In-Reply-To: <1505940337-79069-1-git-send-email-keescook@chromium.org>

From: David Windsor <dave@nullcore.net>

VFS pathnames are stored in the names_cache slab cache, either inline
or across an entire allocation entry (when approaching PATH_MAX). These
are copied to/from userspace, so they must be entirely whitelisted.

cache object allocation:
    include/linux/fs.h:
        #define __getname()    kmem_cache_alloc(names_cachep, GFP_KERNEL)

example usage trace:
    strncpy_from_user+0x4d/0x170
    getname_flags+0x6f/0x1f0
    user_path_at_empty+0x23/0x40
    do_mount+0x69/0xda0
    SyS_mount+0x83/0xd0

    fs/namei.c:
        getname_flags(...):
            ...
            result = __getname();
            ...
            kname = (char *)result->iname;
            result->name = kname;
            len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX);
            ...
            if (unlikely(len == EMBEDDED_NAME_MAX)) {
                const size_t size = offsetof(struct filename, iname[1]);
                kname = (char *)result;

                result = kzalloc(size, GFP_KERNEL);
                ...
                result->name = kname;
                len = strncpy_from_user(kname, filename, PATH_MAX);

In support of usercopy hardening, this patch defines the entire cache
object in the names_cache slab cache as whitelisted, since it may entirely
hold name strings to be copied to/from userspace.
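
Why the whole object has to be whitelisted can be seen from the two layouts a names_cache entry may take. The sketch below is a userspace illustration under stated assumptions: struct demo_filename is a simplified stand-in for struct filename, DEMO_PATH_MAX is assumed to be 4096, and demo_in_region() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PATH_MAX 4096	/* assumed value of PATH_MAX */

/* Simplified stand-in for struct filename: header plus inline name. */
struct demo_filename {
	const char *name;	/* points at iname or at a separate buffer */
	int refcnt;
	char iname[];		/* inline name, fills the rest of the object */
};

/* Room left for an inline name after the header. */
#define DEMO_EMBEDDED_NAME_MAX \
	(DEMO_PATH_MAX - offsetof(struct demo_filename, iname))

struct demo_region {
	size_t useroffset;
	size_t usersize;
};

/* Whole-object window, as in the patch: offset 0, size PATH_MAX. */
static const struct demo_region demo_names_region = { 0, DEMO_PATH_MAX };

/* True if [off, off + n) lies inside the region. */
static int demo_in_region(const struct demo_region *r, size_t off, size_t n)
{
	assert(r != NULL);
	if (off < r->useroffset)
		return 0;
	off -= r->useroffset;
	return off <= r->usersize && n <= r->usersize - off;
}
```

A short name is copied into iname, after the header. A name near PATH_MAX reuses the entire object as the string buffer, starting at offset 0. A window covering only iname would reject the second case.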

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, add usage trace]
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/dcache.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 5f5e7c1fcf4b..34ef9a9169be 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3642,8 +3642,8 @@ void __init vfs_caches_init_early(void)
 
 void __init vfs_caches_init(void)
 {
-	names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
+	names_cachep = kmem_cache_create_usercopy("names_cache", PATH_MAX, 0,
+			SLAB_HWCACHE_ALIGN|SLAB_PANIC, 0, PATH_MAX, NULL);
 
 	dcache_init();
 	inode_init();
-- 
2.7.4


* [PATCH v3 04/31] dcache: Define usercopy region in dentry_cache slab cache
From: Kees Cook @ 2017-09-20 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Alexander Viro, linux-fsdevel, netdev,
	linux-mm, kernel-hardening
In-Reply-To: <1505940337-79069-1-git-send-email-keescook@chromium.org>

From: David Windsor <dave@nullcore.net>

When a dentry name is short enough, it can be stored directly in the
dentry itself (instead of in a separate kmalloc allocation). These dentry
short names, stored in struct dentry.d_iname and therefore contained in
the dentry_cache slab cache, need to be copied to userspace.

cache object allocation:
    fs/dcache.c:
        __d_alloc(...):
            ...
            dentry = kmem_cache_alloc(dentry_cache, ...);
            ...
            dentry->d_name.name = dentry->d_iname;

example usage trace:
    filldir+0xb0/0x140
    dcache_readdir+0x82/0x170
    iterate_dir+0x142/0x1b0
    SyS_getdents+0xb5/0x160

    fs/readdir.c:
        (called via ctx.actor by dir_emit)
        filldir(..., const char *name, ...):
            ...
            copy_to_user(..., name, namlen);

    fs/libfs.c:
        dcache_readdir(...):
            ...
            next = next_positive(dentry, p, 1);
            ...
            dir_emit(..., next->d_name.name, ...);

In support of usercopy hardening, this patch defines a region in the
dentry_cache slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.
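
A struct-and-field macro such as KMEM_CACHE_USERCOPY() derives the region from the type itself, so offset and size cannot drift from the structure layout. The userspace sketch below shows the idea only. DEMO_CACHE_USERCOPY(), struct demo_cache and struct demo_dentry are hypothetical, and the descriptor holds just the fields needed for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache descriptor carrying the usercopy window. */
struct demo_cache {
	const char *name;
	size_t object_size;
	size_t useroffset;
	size_t usersize;
};

/* Build a descriptor from a struct tag and the one whitelisted field. */
#define DEMO_CACHE_USERCOPY(stype, field)				\
	((struct demo_cache){						\
		.name = #stype,						\
		.object_size = sizeof(struct stype),			\
		.useroffset = offsetof(struct stype, field),		\
		.usersize = sizeof(((struct stype *)0)->field),		\
	})

/* Simplified stand-in for struct dentry. */
struct demo_dentry {
	unsigned int d_flags;
	const char *d_name;
	char d_iname[32];	/* inline short name, copied to userspace */
	void *d_parent;
};

static struct demo_cache demo_dentry_cache_init(void)
{
	struct demo_cache c = DEMO_CACHE_USERCOPY(demo_dentry, d_iname);

	/* The window must sit inside the object. */
	assert(c.useroffset + c.usersize <= c.object_size);
	return c;
}
```

Only d_iname is exposed. d_flags, d_name and d_parent stay outside the window even though they share the same slab object.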

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust hunks for kmalloc-specific things moved later]
[kees: adjust commit log, provide usage trace]
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/dcache.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index f90141387f01..5f5e7c1fcf4b 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3603,8 +3603,9 @@ static void __init dcache_init(void)
 	 * but it is probably not worth it because of the cache nature
 	 * of the dcache.
 	 */
-	dentry_cache = KMEM_CACHE(dentry,
-		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT);
+	dentry_cache = KMEM_CACHE_USERCOPY(dentry,
+		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
+		d_iname);

 	/* Hash may have been set up in dcache_init_early */
 	if (!hashdist)
-- 
2.7.4

* [PATCH v3 03/31] usercopy: Mark kmalloc caches as usercopy caches
From: Kees Cook @ 2017-09-20 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, linux-mm, linux-xfs,
	linux-fsdevel, netdev, kernel-hardening
In-Reply-To: <1505940337-79069-1-git-send-email-keescook@chromium.org>

From: David Windsor <dave@nullcore.net>

Mark the kmalloc slab caches as entirely whitelisted. These caches
are frequently used to fulfill kernel allocations that contain data
to be copied to/from userspace. Internal-only uses are also common,
but are scattered in the kernel. For now, mark all the kmalloc caches
as whitelisted.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: merged in moved kmalloc hunks, adjust commit log]
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 mm/slab.c        |  3 ++-
 mm/slab.h        |  3 ++-
 mm/slab_common.c | 10 ++++++----
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index df268999cf02..9af16f675927 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1291,7 +1291,8 @@ void __init kmem_cache_init(void)
 	 */
 	kmalloc_caches[INDEX_NODE] = create_kmalloc_cache(
 				kmalloc_info[INDEX_NODE].name,
-				kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS);
+				kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS,
+				0, kmalloc_size(INDEX_NODE));
 	slab_state = PARTIAL_NODE;
 	setup_kmalloc_cache_index_table();
 
diff --git a/mm/slab.h b/mm/slab.h
index 044755ff9632..2e0fe357d777 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -97,7 +97,8 @@ struct kmem_cache *kmalloc_slab(size_t, gfp_t);
 extern int __kmem_cache_create(struct kmem_cache *, unsigned long flags);
 
 extern struct kmem_cache *create_kmalloc_cache(const char *name, size_t size,
-			unsigned long flags);
+			unsigned long flags, size_t useroffset,
+			size_t usersize);
 extern void create_boot_cache(struct kmem_cache *, const char *name,
 			size_t size, unsigned long flags, size_t useroffset,
 			size_t usersize);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 36408f5f2a34..d4e6442f9bbc 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -920,14 +920,15 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name, size_t siz
 }
 
 struct kmem_cache *__init create_kmalloc_cache(const char *name, size_t size,
-				unsigned long flags)
+				unsigned long flags, size_t useroffset,
+				size_t usersize)
 {
 	struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);
 
 	if (!s)
 		panic("Out of memory when creating slab %s\n", name);
 
-	create_boot_cache(s, name, size, flags, 0, size);
+	create_boot_cache(s, name, size, flags, useroffset, usersize);
 	list_add(&s->list, &slab_caches);
 	memcg_link_cache(s);
 	s->refcount = 1;
@@ -1081,7 +1082,8 @@ void __init setup_kmalloc_cache_index_table(void)
 static void __init new_kmalloc_cache(int idx, unsigned long flags)
 {
 	kmalloc_caches[idx] = create_kmalloc_cache(kmalloc_info[idx].name,
-					kmalloc_info[idx].size, flags);
+					kmalloc_info[idx].size, flags, 0,
+					kmalloc_info[idx].size);
 }
 
 /*
@@ -1122,7 +1124,7 @@ void __init create_kmalloc_caches(unsigned long flags)
 
 			BUG_ON(!n);
 			kmalloc_dma_caches[i] = create_kmalloc_cache(n,
-				size, SLAB_CACHE_DMA | flags);
+				size, SLAB_CACHE_DMA | flags, 0, 0);
 		}
 	}
 #endif
-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* [PATCH v3 02/31] usercopy: Enforce slab cache usercopy region boundaries
From: Kees Cook @ 2017-09-20 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Laura Abbott,
	Ingo Molnar, Mark Rutland, linux-mm, linux-xfs, linux-fsdevel,
	netdev, kernel-hardening
In-Reply-To: <1505940337-79069-1-git-send-email-keescook@chromium.org>

From: David Windsor <dave@nullcore.net>

This patch adds the enforcement component of usercopy cache whitelisting,
and is modified from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting
code in the last public patch of grsecurity/PaX based on my understanding
of the code. Changes or omissions from the original code are mine and
don't reflect the original grsecurity/PaX code.

The SLAB and SLUB allocators are modified to deny all copy operations
in which the kernel heap memory being copied falls outside of the cache's
defined usercopy region.
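
As a rough illustration of the check described above, here is a minimal
userspace sketch. It assumes a hypothetical struct region standing in for
the useroffset and usersize members of struct kmem_cache, and a
hypothetical helper copy_allowed(); neither name is part of the patch.
The offset is taken relative to the start of the object.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two fields added to struct kmem_cache. */
struct region {
	size_t useroffset;	/* start of the whitelisted span in the object */
	size_t usersize;	/* length of the whitelisted span */
};

/*
 * Return 1 if copying n bytes starting at offset stays entirely inside
 * the region, 0 otherwise. The comparisons are ordered so that no
 * unsigned subtraction is evaluated before its operands are known to
 * be in range.
 */
static int copy_allowed(const struct region *r, size_t offset, size_t n)
{
	/* Copy starts before the region. */
	if (offset < r->useroffset)
		return 0;
	/* Copy starts past the end of the region. */
	if (offset - r->useroffset > r->usersize)
		return 0;
	/* Copy runs past the end of the region. */
	if (n > r->usersize - (offset - r->useroffset))
		return 0;
	return 1;
}
```

With a region of useroffset 56 and usersize 32, a 32-byte copy at offset 56
passes, while a 33-byte copy at offset 56 or a 1-byte copy at offset 55 is
rejected. The last comparison is written as usersize - (offset - useroffset),
which is arithmetically the same bound as the expression used in the diff.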

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log and comments]
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 mm/slab.c     | 16 +++++++++++-----
 mm/slub.c     | 18 +++++++++++-------
 mm/usercopy.c | 12 ++++++++++++
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 87b6e5e0cdaf..df268999cf02 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4408,7 +4408,9 @@ module_init(slab_proc_init);
 
 #ifdef CONFIG_HARDENED_USERCOPY
 /*
- * Rejects objects that are incorrectly sized.
+ * Rejects incorrectly sized objects and objects that are to be copied
+ * to/from userspace but do not fall entirely within the containing slab
+ * cache's usercopy region.
  *
  * Returns NULL if check passes, otherwise const char * to name of cache
  * to indicate an error.
@@ -4428,11 +4430,15 @@ const char *__check_heap_object(const void *ptr, unsigned long n,
 	/* Find offset within object. */
 	offset = ptr - index_to_obj(cachep, page, objnr) - obj_offset(cachep);
 
-	/* Allow address range falling entirely within object size. */
-	if (offset <= cachep->object_size && n <= cachep->object_size - offset)
-		return NULL;
+	/* Make sure object falls entirely within cache's usercopy region. */
+	if (offset < cachep->useroffset)
+		return cachep->name;
+	if (offset - cachep->useroffset > cachep->usersize)
+		return cachep->name;
+	if (n > cachep->useroffset - offset + cachep->usersize)
+		return cachep->name;
 
-	return cachep->name;
+	return NULL;
 }
 #endif /* CONFIG_HARDENED_USERCOPY */
 
diff --git a/mm/slub.c b/mm/slub.c
index fae637726c44..bbf73024be3a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3833,7 +3833,9 @@ EXPORT_SYMBOL(__kmalloc_node);
 
 #ifdef CONFIG_HARDENED_USERCOPY
 /*
- * Rejects objects that are incorrectly sized.
+ * Rejects incorrectly sized objects and objects that are to be copied
+ * to/from userspace but do not fall entirely within the containing slab
+ * cache's usercopy region.
  *
  * Returns NULL if check passes, otherwise const char * to name of cache
  * to indicate an error.
@@ -3843,11 +3845,9 @@ const char *__check_heap_object(const void *ptr, unsigned long n,
 {
 	struct kmem_cache *s;
 	unsigned long offset;
-	size_t object_size;
 
 	/* Find object and usable object size. */
 	s = page->slab_cache;
-	object_size = slab_ksize(s);
 
 	/* Reject impossible pointers. */
 	if (ptr < page_address(page))
@@ -3863,11 +3863,15 @@ const char *__check_heap_object(const void *ptr, unsigned long n,
 		offset -= s->red_left_pad;
 	}
 
-	/* Allow address range falling entirely within object size. */
-	if (offset <= object_size && n <= object_size - offset)
-		return NULL;
+	/* Make sure object falls entirely within cache's usercopy region. */
+	if (offset < s->useroffset)
+		return s->name;
+	if (offset - s->useroffset > s->usersize)
+		return s->name;
+	if (n > s->useroffset - offset + s->usersize)
+		return s->name;
 
-	return s->name;
+	return NULL;
 }
 #endif /* CONFIG_HARDENED_USERCOPY */
 
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b24715d..cbffde670c49 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -58,6 +58,18 @@ static noinline int check_stack_object(const void *obj, unsigned long len)
 	return GOOD_STACK;
 }
 
+/*
+ * If this function is reached, then CONFIG_HARDENED_USERCOPY has found an
+ * unexpected state during a copy_from_user() or copy_to_user() call.
+ * There are several checks being performed on the buffer by the
+ * __check_object_size() function. Normal stack buffer usage should never
+ * trip the checks, and kernel text addressing will always trip the check.
+ * For cache objects, it is checking that only the whitelisted range of
+ * bytes for a given cache is being accessed (via the cache's usersize and
+ * useroffset fields). To adjust a cache whitelist, use the usercopy-aware
+ * kmem_cache_create_usercopy() function to create the cache (and
+ * carefully audit the whitelist range).
+ */
 static void report_usercopy(const void *ptr, unsigned long len,
 			    bool to_user, const char *type)
 {
-- 
2.7.4

^ permalink raw reply related

* [PATCH v3 01/31] usercopy: Prepare for usercopy whitelisting
From: Kees Cook @ 2017-09-20 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, linux-mm, linux-xfs,
	linux-fsdevel, netdev, kernel-hardening
In-Reply-To: <1505940337-79069-1-git-send-email-keescook@chromium.org>

From: David Windsor <dave@nullcore.net>

This patch prepares the slab allocator to handle caches having annotations
(useroffset and usersize) defining usercopy regions.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on
my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code.

Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs. To further
restrict what memory is available for copying, this creates a way to
whitelist specific areas of a given slab cache object for copying to/from
userspace, allowing much finer granularity of access control. Slab caches
that are never exposed to userspace can declare no whitelist for their
objects, thereby keeping them unavailable to userspace via dynamic copy
operations. (Note, an implicit form of whitelisting is the use of constant
sizes in usercopy operations and get_user()/put_user(); these bypass
hardened usercopy checks since these sizes cannot change at runtime.)

To support this whitelist annotation, usercopy region offset and size
members are added to struct kmem_cache. The slab allocator receives a
new function, kmem_cache_create_usercopy(), that creates a new cache
with a usercopy region defined, suitable for declaring spans of fields
within the objects that get copied to/from userspace.
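
To make the region idea concrete, the following userspace sketch derives a
useroffset/usersize pair from a single struct field, the same way the
KMEM_CACHE_USERCOPY() macro in the diff below does with offsetof() and
sizeof_field(). The struct demo_inode layout, struct usercopy_span, and
demo_inode_span() are hypothetical names invented for this example.

```c
#include <stddef.h>

/* Same definition as the one this patch adds to include/linux/stddef.h. */
#define sizeof_field(structure, field) sizeof((((structure *)0)->field))

/* Hypothetical object layout, for illustration only. */
struct demo_inode {
	unsigned long flags;	/* kernel-internal, never copied out */
	char link[60];		/* the only field meant to reach userspace */
	void *private;		/* kernel-internal, never copied out */
};

/* Hypothetical pair mirroring the new struct kmem_cache members. */
struct usercopy_span {
	size_t useroffset;
	size_t usersize;
};

/* Compute the span that covers exactly the link field. */
static struct usercopy_span demo_inode_span(void)
{
	struct usercopy_span s = {
		.useroffset = offsetof(struct demo_inode, link),
		.usersize = sizeof_field(struct demo_inode, link),
	};
	return s;
}
```

Only the 60 bytes of link fall inside the span; flags and private stay
outside it, so a dynamically sized copy touching them would be refused once
enforcement is in place.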

In this patch, the default kmem_cache_create() marks the entire allocation
as whitelisted, leaving it semantically unchanged. Once all fine-grained
whitelists have been added (in subsequent patches), this will be changed
to a usersize of 0, making caches created with kmem_cache_create() not
copyable to/from userspace.
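
The default and the fail-closed handling of bad values can be modelled in
userspace as follows. This is only a sketch of the argument checks that
appear in the mm/slab_common.c hunk of this patch; struct region_model,
sanitize_region() and default_region() are hypothetical names, and the
WARN_ON() reporting is left out.

```c
#include <stddef.h>

/* Hypothetical pair mirroring the new struct kmem_cache members. */
struct region_model {
	size_t useroffset;
	size_t usersize;
};

/*
 * Fail closed: if the requested region is inconsistent (an offset with
 * no size, or a span that does not fit in the object), fall back to an
 * empty region so nothing in the object is whitelisted.
 */
static struct region_model sanitize_region(size_t size, size_t useroffset,
					   size_t usersize)
{
	struct region_model r = { useroffset, usersize };

	if ((!usersize && useroffset) ||
	    size < usersize || size - usersize < useroffset)
		r.useroffset = r.usersize = 0;
	return r;
}

/* What plain kmem_cache_create() requests in this patch: the whole object. */
static struct region_model default_region(size_t size)
{
	return sanitize_region(size, 0, size);
}
```

For a 128-byte object, default_region(128) yields offset 0 and size 128,
while a request such as offset 64 with size 128 overruns the object and
collapses to an empty region.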

After the entire usercopy whitelist series is applied, less than 15%
of the slab cache memory remains exposed to potential usercopy bugs
after a fresh boot:

Total Slab Memory:           48074720
Usercopyable Memory:          6367532  13.2%
         task_struct                    0.2%         4480/1630720
         RAW                            0.3%            300/96000
         RAWv6                          2.1%           1408/64768
         ext4_inode_cache               3.0%       269760/8740224
         dentry                        11.1%       585984/5273856
         mm_struct                     29.1%         54912/188448
         kmalloc-8                    100.0%          24576/24576
         kmalloc-16                   100.0%          28672/28672
         kmalloc-32                   100.0%          81920/81920
         kmalloc-192                  100.0%          96768/96768
         kmalloc-128                  100.0%        143360/143360
         names_cache                  100.0%        163840/163840
         kmalloc-64                   100.0%        167936/167936
         kmalloc-256                  100.0%        339968/339968
         kmalloc-512                  100.0%        350720/350720
         kmalloc-96                   100.0%        455616/455616
         kmalloc-8192                 100.0%        655360/655360
         kmalloc-1024                 100.0%        812032/812032
         kmalloc-4096                 100.0%        819200/819200
         kmalloc-2048                 100.0%      1310720/1310720

After some kernel build workloads, the percentage (mainly driven by
dentry and inode caches expanding) drops under 10%:

Total Slab Memory:           95516184
Usercopyable Memory:          8497452   8.8%
         task_struct                    0.2%         4000/1456000
         RAW                            0.3%            300/96000
         RAWv6                          2.1%           1408/64768
         ext4_inode_cache               3.0%     1217280/39439872
         dentry                        11.1%     1623200/14608800
         mm_struct                     29.1%         73216/251264
         kmalloc-8                    100.0%          24576/24576
         kmalloc-16                   100.0%          28672/28672
         kmalloc-32                   100.0%          94208/94208
         kmalloc-192                  100.0%          96768/96768
         kmalloc-128                  100.0%        143360/143360
         names_cache                  100.0%        163840/163840
         kmalloc-64                   100.0%        245760/245760
         kmalloc-256                  100.0%        339968/339968
         kmalloc-512                  100.0%        350720/350720
         kmalloc-96                   100.0%        563520/563520
         kmalloc-8192                 100.0%        655360/655360
         kmalloc-1024                 100.0%        794624/794624
         kmalloc-4096                 100.0%        819200/819200
         kmalloc-2048                 100.0%      1257472/1257472

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, split out a few extra kmalloc hunks]
[kees: add field names to function declarations]
[kees: convert BUGs to WARNs and fail closed]
[kees: add attack surface reduction analysis to commit log]
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/slab.h     | 27 +++++++++++++++++++++------
 include/linux/slab_def.h |  3 +++
 include/linux/slub_def.h |  3 +++
 include/linux/stddef.h   |  2 ++
 mm/slab.c                |  2 +-
 mm/slab.h                |  5 ++++-
 mm/slab_common.c         | 46 ++++++++++++++++++++++++++++++++++++++--------
 mm/slub.c                | 11 +++++++++--
 8 files changed, 81 insertions(+), 18 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 41473df6dfb0..8b6cb384f8b6 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -126,9 +126,13 @@ struct mem_cgroup;
 void __init kmem_cache_init(void);
 bool slab_is_available(void);
 
-struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
-			unsigned long,
-			void (*)(void *));
+struct kmem_cache *kmem_cache_create(const char *name, size_t size,
+			size_t align, unsigned long flags,
+			void (*ctor)(void *));
+struct kmem_cache *kmem_cache_create_usercopy(const char *name,
+			size_t size, size_t align, unsigned long flags,
+			size_t useroffset, size_t usersize,
+			void (*ctor)(void *));
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
 
@@ -144,9 +148,20 @@ void memcg_destroy_kmem_caches(struct mem_cgroup *);
  * f.e. add ____cacheline_aligned_in_smp to the struct declaration
  * then the objects will be properly aligned in SMP configurations.
  */
-#define KMEM_CACHE(__struct, __flags) kmem_cache_create(#__struct,\
-		sizeof(struct __struct), __alignof__(struct __struct),\
-		(__flags), NULL)
+#define KMEM_CACHE(__struct, __flags)					\
+		kmem_cache_create(#__struct, sizeof(struct __struct),	\
+			__alignof__(struct __struct), (__flags), NULL)
+
+/*
+ * To whitelist a single field for copying to/from userspace, use this
+ * macro instead of KMEM_CACHE() above.
+ */
+#define KMEM_CACHE_USERCOPY(__struct, __flags, __field)			\
+		kmem_cache_create_usercopy(#__struct,			\
+			sizeof(struct __struct),			\
+			__alignof__(struct __struct), (__flags),	\
+			offsetof(struct __struct, __field),		\
+			sizeof_field(struct __struct, __field), NULL)
 
 /*
  * Common kmalloc functions provided by all allocators
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 4ad2c5a26399..03eef0df8648 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -84,6 +84,9 @@ struct kmem_cache {
 	unsigned int *random_seq;
 #endif
 
+	size_t useroffset;		/* Usercopy region offset */
+	size_t usersize;		/* Usercopy region size */
+
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 0783b622311e..62866a1a767c 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -134,6 +134,9 @@ struct kmem_cache {
 	struct kasan_cache kasan_info;
 #endif
 
+	size_t useroffset;		/* Usercopy region offset */
+	size_t usersize;		/* Usercopy region size */
+
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/include/linux/stddef.h b/include/linux/stddef.h
index 9c61c7cda936..f00355086fb2 100644
--- a/include/linux/stddef.h
+++ b/include/linux/stddef.h
@@ -18,6 +18,8 @@ enum {
 #define offsetof(TYPE, MEMBER)	((size_t)&((TYPE *)0)->MEMBER)
 #endif
 
+#define sizeof_field(structure, field) sizeof((((structure *)0)->field))
+
 /**
  * offsetofend(TYPE, MEMBER)
  *
diff --git a/mm/slab.c b/mm/slab.c
index 04dec48c3ed7..87b6e5e0cdaf 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1281,7 +1281,7 @@ void __init kmem_cache_init(void)
 	create_boot_cache(kmem_cache, "kmem_cache",
 		offsetof(struct kmem_cache, node) +
 				  nr_node_ids * sizeof(struct kmem_cache_node *),
-				  SLAB_HWCACHE_ALIGN);
+				  SLAB_HWCACHE_ALIGN, 0, 0);
 	list_add(&kmem_cache->list, &slab_caches);
 	slab_state = PARTIAL;
 
diff --git a/mm/slab.h b/mm/slab.h
index 073362816acc..044755ff9632 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -21,6 +21,8 @@ struct kmem_cache {
 	unsigned int size;	/* The aligned/padded/added on size  */
 	unsigned int align;	/* Alignment as calculated */
 	unsigned long flags;	/* Active flags on the slab */
+	size_t useroffset;	/* Usercopy region offset */
+	size_t usersize;	/* Usercopy region size */
 	const char *name;	/* Slab name for sysfs */
 	int refcount;		/* Use counter */
 	void (*ctor)(void *);	/* Called on object slot creation */
@@ -97,7 +99,8 @@ extern int __kmem_cache_create(struct kmem_cache *, unsigned long flags);
 extern struct kmem_cache *create_kmalloc_cache(const char *name, size_t size,
 			unsigned long flags);
 extern void create_boot_cache(struct kmem_cache *, const char *name,
-			size_t size, unsigned long flags);
+			size_t size, unsigned long flags, size_t useroffset,
+			size_t usersize);
 
 int slab_unmergeable(struct kmem_cache *s);
 struct kmem_cache *find_mergeable(size_t size, size_t align,
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 904a83be82de..36408f5f2a34 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -272,6 +272,9 @@ int slab_unmergeable(struct kmem_cache *s)
 	if (s->ctor)
 		return 1;
 
+	if (s->usersize)
+		return 1;
+
 	/*
 	 * We may have set a slab to be unmergeable during bootstrap.
 	 */
@@ -357,12 +360,16 @@ unsigned long calculate_alignment(unsigned long flags,
 
 static struct kmem_cache *create_cache(const char *name,
 		size_t object_size, size_t size, size_t align,
-		unsigned long flags, void (*ctor)(void *),
+		unsigned long flags, size_t useroffset,
+		size_t usersize, void (*ctor)(void *),
 		struct mem_cgroup *memcg, struct kmem_cache *root_cache)
 {
 	struct kmem_cache *s;
 	int err;
 
+	if (WARN_ON(useroffset + usersize > object_size))
+		useroffset = usersize = 0;
+
 	err = -ENOMEM;
 	s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
 	if (!s)
@@ -373,6 +380,8 @@ static struct kmem_cache *create_cache(const char *name,
 	s->size = size;
 	s->align = align;
 	s->ctor = ctor;
+	s->useroffset = useroffset;
+	s->usersize = usersize;
 
 	err = init_memcg_params(s, memcg, root_cache);
 	if (err)
@@ -397,11 +406,13 @@ static struct kmem_cache *create_cache(const char *name,
 }
 
 /*
- * kmem_cache_create - Create a cache.
+ * kmem_cache_create_usercopy - Create a cache.
  * @name: A string which is used in /proc/slabinfo to identify this cache.
  * @size: The size of objects to be created in this cache.
  * @align: The required alignment for the objects.
  * @flags: SLAB flags
+ * @useroffset: Usercopy region offset
+ * @usersize: Usercopy region size
  * @ctor: A constructor for the objects.
  *
  * Returns a ptr to the cache on success, NULL on failure.
@@ -421,8 +432,9 @@ static struct kmem_cache *create_cache(const char *name,
  * as davem.
  */
 struct kmem_cache *
-kmem_cache_create(const char *name, size_t size, size_t align,
-		  unsigned long flags, void (*ctor)(void *))
+kmem_cache_create_usercopy(const char *name, size_t size, size_t align,
+		  unsigned long flags, size_t useroffset, size_t usersize,
+		  void (*ctor)(void *))
 {
 	struct kmem_cache *s = NULL;
 	const char *cache_name;
@@ -453,7 +465,13 @@ kmem_cache_create(const char *name, size_t size, size_t align,
 	 */
 	flags &= CACHE_CREATE_MASK;
 
-	s = __kmem_cache_alias(name, size, align, flags, ctor);
+	/* Fail closed on bad usersize or useroffset values. */
+	if (WARN_ON(!usersize && useroffset) ||
+	    WARN_ON(size < usersize || size - usersize < useroffset))
+		usersize = useroffset = 0;
+
+	if (!usersize)
+		s = __kmem_cache_alias(name, size, align, flags, ctor);
 	if (s)
 		goto out_unlock;
 
@@ -465,7 +483,7 @@ kmem_cache_create(const char *name, size_t size, size_t align,
 
 	s = create_cache(cache_name, size, size,
 			 calculate_alignment(flags, align, size),
-			 flags, ctor, NULL, NULL);
+			 flags, useroffset, usersize, ctor, NULL, NULL);
 	if (IS_ERR(s)) {
 		err = PTR_ERR(s);
 		kfree_const(cache_name);
@@ -491,6 +509,15 @@ kmem_cache_create(const char *name, size_t size, size_t align,
 	}
 	return s;
 }
+EXPORT_SYMBOL(kmem_cache_create_usercopy);
+
+struct kmem_cache *
+kmem_cache_create(const char *name, size_t size, size_t align,
+		unsigned long flags, void (*ctor)(void *))
+{
+	return kmem_cache_create_usercopy(name, size, align, flags, 0, size,
+					  ctor);
+}
 EXPORT_SYMBOL(kmem_cache_create);
 
 static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
@@ -603,6 +630,7 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg,
 	s = create_cache(cache_name, root_cache->object_size,
 			 root_cache->size, root_cache->align,
 			 root_cache->flags & CACHE_CREATE_MASK,
+			 root_cache->useroffset, root_cache->usersize,
 			 root_cache->ctor, memcg, root_cache);
 	/*
 	 * If we could not create a memcg cache, do not complain, because
@@ -870,13 +898,15 @@ bool slab_is_available(void)
 #ifndef CONFIG_SLOB
 /* Create a cache during boot when no slab services are available yet */
 void __init create_boot_cache(struct kmem_cache *s, const char *name, size_t size,
-		unsigned long flags)
+		unsigned long flags, size_t useroffset, size_t usersize)
 {
 	int err;
 
 	s->name = name;
 	s->size = s->object_size = size;
 	s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
+	s->useroffset = useroffset;
+	s->usersize = usersize;
 
 	slab_init_memcg_params(s);
 
@@ -897,7 +927,7 @@ struct kmem_cache *__init create_kmalloc_cache(const char *name, size_t size,
 	if (!s)
 		panic("Out of memory when creating slab %s\n", name);
 
-	create_boot_cache(s, name, size, flags);
+	create_boot_cache(s, name, size, flags, 0, size);
 	list_add(&s->list, &slab_caches);
 	memcg_link_cache(s);
 	s->refcount = 1;
diff --git a/mm/slub.c b/mm/slub.c
index 163352c537ab..fae637726c44 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4201,7 +4201,7 @@ void __init kmem_cache_init(void)
 	kmem_cache = &boot_kmem_cache;
 
 	create_boot_cache(kmem_cache_node, "kmem_cache_node",
-		sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN);
+		sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN, 0, 0);
 
 	register_hotmemory_notifier(&slab_memory_callback_nb);
 
@@ -4211,7 +4211,7 @@ void __init kmem_cache_init(void)
 	create_boot_cache(kmem_cache, "kmem_cache",
 			offsetof(struct kmem_cache, node) +
 				nr_node_ids * sizeof(struct kmem_cache_node *),
-		       SLAB_HWCACHE_ALIGN);
+		       SLAB_HWCACHE_ALIGN, 0, 0);
 
 	kmem_cache = bootstrap(&boot_kmem_cache);
 
@@ -5081,6 +5081,12 @@ static ssize_t cache_dma_show(struct kmem_cache *s, char *buf)
 SLAB_ATTR_RO(cache_dma);
 #endif
 
+static ssize_t usersize_show(struct kmem_cache *s, char *buf)
+{
+	return sprintf(buf, "%zu\n", s->usersize);
+}
+SLAB_ATTR_RO(usersize);
+
 static ssize_t destroy_by_rcu_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%d\n", !!(s->flags & SLAB_TYPESAFE_BY_RCU));
@@ -5455,6 +5461,7 @@ static struct attribute *slab_attrs[] = {
 #ifdef CONFIG_FAILSLAB
 	&failslab_attr.attr,
 #endif
+	&usersize_attr.attr,
 
 	NULL
 };
-- 
2.7.4

^ permalink raw reply related

* [PATCH v3 00/31] Hardened usercopy whitelisting
From: Kees Cook @ 2017-09-20 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, linux-fsdevel, netdev, linux-mm, kernel-hardening,
	David Windsor

v3:
- added LKDTM update patch
- downgrade BUGs to WARNs and fail closed
- add Acks/Reviews from v2

v2:
- added tracing of allocation and usage
- refactored solutions for task_struct
- split up network patches for readability

I intend for this to land via my usercopy hardening tree, so Acks,
Reviewed-bys, and Tested-bys would be greatly appreciated. I have some
questions in a few patches (e.g. CIFS and thread_stack) that would be nice
to get answered for completeness. FWIW, this series has survived generally
for weeks in 0-day testing, and specifically over a couple days rebased
on v4.14-rc1, so I intend to put this in -next shortly unless there is
further feedback.

----

This series is modified from Brad Spengler/PaX Team's PAX_USERCOPY code
in the last public patch of grsecurity/PaX based on our understanding
of the code. Changes or omissions from the original code are ours and
don't reflect the original grsecurity/PaX code.

David Windsor did the bulk of the porting, refactoring, splitting,
testing, etc; I did some extra tweaks, hunk moving, traces, and extra
patches.

Description from patch 1:


Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs. To further
restrict what memory is available for copying, this creates a way to
whitelist specific areas of a given slab cache object for copying to/from
userspace, allowing much finer granularity of access control. Slab caches
that are never exposed to userspace can declare no whitelist for their
objects, thereby keeping them unavailable to userspace via dynamic copy
operations. (Note, an implicit form of whitelisting is the use of constant
sizes in usercopy operations and get_user()/put_user(); these bypass
hardened usercopy checks since these sizes cannot change at runtime.)

To support this whitelist annotation, usercopy region offset and size
members are added to struct kmem_cache. The slab allocator receives a
new function, kmem_cache_create_usercopy(), that creates a new cache
with a usercopy region defined, suitable for declaring spans of fields
within the objects that get copied to/from userspace.

In this patch, the default kmem_cache_create() marks the entire allocation
as whitelisted, leaving it semantically unchanged. Once all fine-grained
whitelists have been added (in subsequent patches), this will be changed
to a usersize of 0, making caches created with kmem_cache_create() not
copyable to/from userspace.

After the entire usercopy whitelist series is applied, less than 15%
of the slab cache memory remains exposed to potential usercopy bugs
after a fresh boot:

Total Slab Memory:           48074720
Usercopyable Memory:          6367532  13.2%
         task_struct                    0.2%         4480/1630720
         RAW                            0.3%            300/96000
         RAWv6                          2.1%           1408/64768
         ext4_inode_cache               3.0%       269760/8740224
         dentry                        11.1%       585984/5273856
         mm_struct                     29.1%         54912/188448
         kmalloc-8                    100.0%          24576/24576
         kmalloc-16                   100.0%          28672/28672
         kmalloc-32                   100.0%          81920/81920
         kmalloc-192                  100.0%          96768/96768
         kmalloc-128                  100.0%        143360/143360
         names_cache                  100.0%        163840/163840
         kmalloc-64                   100.0%        167936/167936
         kmalloc-256                  100.0%        339968/339968
         kmalloc-512                  100.0%        350720/350720
         kmalloc-96                   100.0%        455616/455616
         kmalloc-8192                 100.0%        655360/655360
         kmalloc-1024                 100.0%        812032/812032
         kmalloc-4096                 100.0%        819200/819200
         kmalloc-2048                 100.0%      1310720/1310720

After some kernel build workloads, the percentage (mainly driven by
dentry and inode caches expanding) drops under 10%:

Total Slab Memory:           95516184
Usercopyable Memory:          8497452   8.8%
         task_struct                    0.2%         4000/1456000
         RAW                            0.3%            300/96000
         RAWv6                          2.1%           1408/64768
         ext4_inode_cache               3.0%     1217280/39439872
         dentry                        11.1%     1623200/14608800
         mm_struct                     29.1%         73216/251264
         kmalloc-8                    100.0%          24576/24576
         kmalloc-16                   100.0%          28672/28672
         kmalloc-32                   100.0%          94208/94208
         kmalloc-192                  100.0%          96768/96768
         kmalloc-128                  100.0%        143360/143360
         names_cache                  100.0%        163840/163840
         kmalloc-64                   100.0%        245760/245760
         kmalloc-256                  100.0%        339968/339968
         kmalloc-512                  100.0%        350720/350720
         kmalloc-96                   100.0%        563520/563520
         kmalloc-8192                 100.0%        655360/655360
         kmalloc-1024                 100.0%        794624/794624
         kmalloc-4096                 100.0%        819200/819200
         kmalloc-2048                 100.0%      1257472/1257472
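
For anyone re-deriving the percentages in these tables: they appear to be
usercopy bytes over total bytes per cache, truncated (not rounded) to one
decimal place. A minimal sketch of that arithmetic, assuming truncation
(the helper name is hypothetical and not part of the series):

```c
#include <stdint.h>

/*
 * Share of a slab cache that is usercopy-whitelisted, expressed in tenths
 * of a percent. Integer division truncates, which matches the table
 * entries (e.g. 8497452/95516184 is shown as 8.8%, not 8.9%).
 * Hypothetical helper for illustration only.
 */
uint64_t usercopy_pct_tenths(uint64_t usercopy_bytes, uint64_t total_bytes)
{
	if (total_bytes == 0)
		return 0;
	return usercopy_bytes * 1000 / total_bytes;
}
```

With the numbers above, the dentry cache (1623200/14608800) gives 111
tenths, i.e. the 11.1% shown in the table.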

------
The patches are broken into several stages of changes:

Prepare and whitelist kmalloc:
    [PATCH 01/31] usercopy: Prepare for usercopy whitelisting
    [PATCH 02/31] usercopy: Enforce slab cache usercopy region boundaries
    [PATCH 03/31] usercopy: Mark kmalloc caches as usercopy caches

Update VFS layer for symlinks and other inline storage:
    [PATCH 04/31] dcache: Define usercopy region in dentry_cache slab
    [PATCH 05/31] vfs: Define usercopy region in names_cache slab caches
    [PATCH 06/31] vfs: Copy struct mount.mnt_id to userspace using
    [PATCH 07/31] ext4: Define usercopy region in ext4_inode_cache slab
    [PATCH 08/31] ext2: Define usercopy region in ext2_inode_cache slab
    [PATCH 09/31] jfs: Define usercopy region in jfs_ip slab cache
    [PATCH 10/31] befs: Define usercopy region in befs_inode_cache slab
    [PATCH 11/31] exofs: Define usercopy region in exofs_inode_cache slab
    [PATCH 12/31] orangefs: Define usercopy region in
    [PATCH 13/31] ufs: Define usercopy region in ufs_inode_cache slab
    [PATCH 14/31] vxfs: Define usercopy region in vxfs_inode slab cache
    [PATCH 15/31] xfs: Define usercopy region in xfs_inode slab cache
    [PATCH 16/31] cifs: Define usercopy region in cifs_request slab cache

Update scsi layer for inline storage:
    [PATCH 17/31] scsi: Define usercopy region in scsi_sense_cache slab

Whitelist a few network protocol-specific areas of memory:
    [PATCH 18/31] net: Define usercopy region in struct proto slab cache
    [PATCH 19/31] ip: Define usercopy region in IP proto slab cache
    [PATCH 20/31] caif: Define usercopy region in caif proto slab cache
    [PATCH 21/31] sctp: Define usercopy region in SCTP proto slab cache
    [PATCH 22/31] sctp: Copy struct sctp_sock.autoclose to userspace
    [PATCH 23/31] net: Restrict unwhitelisted proto caches to size 0

Whitelist areas of process memory:
    [PATCH 24/31] fork: Define usercopy region in mm_struct slab caches
    [PATCH 25/31] fork: Define usercopy region in thread_stack slab

Deal with per-architecture thread_struct whitelisting:
    [PATCH 26/31] fork: Provide usercopy whitelisting for task_struct
    [PATCH 27/31] x86: Implement thread_struct whitelist for hardened
    [PATCH 28/31] arm64: Implement thread_struct whitelist for hardened
    [PATCH 29/31] arm: Implement thread_struct whitelist for hardened

Make blacklisting the default:
    [PATCH 30/31] usercopy: Restrict non-usercopy caches to size 0

Update LKDTM:
    [PATCH 31/31] lkdtm: Update usercopy tests for whitelisting


Thanks!

-Kees (and David)


^ permalink raw reply

* Re: [PATCH net-next 08/14] gtp: Support encpasulating over IPv6
From: Tom Herbert @ 2017-09-20 20:40 UTC (permalink / raw)
  To: David Miller
  Cc: Tom Herbert, Linux Kernel Network Developers, Pablo Neira Ayuso,
	Harald Welte, Rohit Seth
In-Reply-To: <20170920.124511.922311380432026759.davem@davemloft.net>

On Wed, Sep 20, 2017 at 12:45 PM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <tom@herbertland.com>
> Date: Wed, 20 Sep 2017 11:03:52 -0700
>
>> On Mon, Sep 18, 2017 at 9:19 PM, David Miller <davem@davemloft.net> wrote:
>>> From: Tom Herbert <tom@quantonium.net>
>>> Date: Mon, 18 Sep 2017 17:38:58 -0700
>>>
>>>> Allow peers to be specified by IPv6 addresses.
>>>>
>>>> Signed-off-by: Tom Herbert <tom@quantonium.net>
>>>
>>> Hmmm, can you just check the socket family or something like that?
>>
>> I'm not sure what code you're referring to.
>
> There is a socket associated with the tunnel to do the encapsulation
> and it has an address family, right?

If fds are set from userspace for the sockets then we could derive
the address family from them. I'll change that. Although, looking at
it now, I am wondering why we're passing fds into GTP instead of just
having the kernel create the UDP port like is done for other encaps.

Tom

^ permalink raw reply

* [PATCH] [RESEND][for 4.14] net: qcom/emac: add software control for pause frame mode
From: Timur Tabi @ 2017-09-20 20:32 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: timur

The EMAC has the option of sending only a single pause frame when
flow control is enabled and the RX queue is full.  Although sending
only one pause frame has little value, this would allow admins to
enable automatic flow control without having to worry about the EMAC
flooding nearby switches with pause frames if the kernel hangs.

The option is enabled by using the single-pause-mode private flag.

Signed-off-by: Timur Tabi <timur@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/emac/emac-ethtool.c | 30 +++++++++++++++++++++++
 drivers/net/ethernet/qualcomm/emac/emac-mac.c     | 22 +++++++++++++++++
 drivers/net/ethernet/qualcomm/emac/emac.c         |  3 +++
 drivers/net/ethernet/qualcomm/emac/emac.h         |  3 +++
 4 files changed, 58 insertions(+)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c b/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c
index bbe24639aa5a..c8c6231b87f3 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c
@@ -88,6 +88,8 @@ static void emac_set_msglevel(struct net_device *netdev, u32 data)
 static int emac_get_sset_count(struct net_device *netdev, int sset)
 {
 	switch (sset) {
+	case ETH_SS_PRIV_FLAGS:
+		return 1;
 	case ETH_SS_STATS:
 		return EMAC_STATS_LEN;
 	default:
@@ -100,6 +102,10 @@ static void emac_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
 	unsigned int i;
 
 	switch (stringset) {
+	case ETH_SS_PRIV_FLAGS:
+		strcpy(data, "single-pause-mode");
+		break;
+
 	case ETH_SS_STATS:
 		for (i = 0; i < EMAC_STATS_LEN; i++) {
 			strlcpy(data, emac_ethtool_stat_strings[i],
@@ -230,6 +236,27 @@ static int emac_get_regs_len(struct net_device *netdev)
 	return EMAC_MAX_REG_SIZE * sizeof(u32);
 }
 
+#define EMAC_PRIV_ENABLE_SINGLE_PAUSE	BIT(0)
+
+static int emac_set_priv_flags(struct net_device *netdev, u32 flags)
+{
+	struct emac_adapter *adpt = netdev_priv(netdev);
+
+	adpt->single_pause_mode = !!(flags & EMAC_PRIV_ENABLE_SINGLE_PAUSE);
+
+	if (netif_running(netdev))
+		return emac_reinit_locked(adpt);
+
+	return 0;
+}
+
+static u32 emac_get_priv_flags(struct net_device *netdev)
+{
+	struct emac_adapter *adpt = netdev_priv(netdev);
+
+	return adpt->single_pause_mode ? EMAC_PRIV_ENABLE_SINGLE_PAUSE : 0;
+}
+
 static const struct ethtool_ops emac_ethtool_ops = {
 	.get_link_ksettings = phy_ethtool_get_link_ksettings,
 	.set_link_ksettings = phy_ethtool_set_link_ksettings,
@@ -253,6 +280,9 @@ static int emac_get_regs_len(struct net_device *netdev)
 
 	.get_regs_len    = emac_get_regs_len,
 	.get_regs        = emac_get_regs,
+
+	.set_priv_flags = emac_set_priv_flags,
+	.get_priv_flags = emac_get_priv_flags,
 };
 
 void emac_set_ethtool_ops(struct net_device *netdev)
diff --git a/drivers/net/ethernet/qualcomm/emac/emac-mac.c b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
index bcd4708b3745..0ea3ca09c689 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-mac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
@@ -551,6 +551,28 @@ static void emac_mac_start(struct emac_adapter *adpt)
 	mac &= ~(HUGEN | VLAN_STRIP | TPAUSE | SIMR | HUGE | MULTI_ALL |
 		 DEBUG_MODE | SINGLE_PAUSE_MODE);
 
+	/* Enable single-pause-frame mode if requested.
+	 *
+	 * If enabled, the EMAC will send a single pause frame when the RX
+	 * queue is full.  This normally leads to packet loss because
+	 * the pause frame disables the remote MAC only for 33ms (the quanta),
+	 * and then the remote MAC continues sending packets even though
+	 * the RX queue is still full.
+	 *
+	 * If disabled, the EMAC sends a pause frame every 31ms until the RX
+	 * queue is no longer full.  Normally, this is the preferred
+	 * method of operation.  However, when the system is hung (e.g.
+	 * cores are halted), the EMAC interrupt handler is never called
+	 * and so the RX queue fills up quickly and stays full.  The resulting
+	 * non-stop "flood" of pause frames sometimes has the effect of
+	 * disabling nearby switches.  In some cases, other nearby switches
+	 * are also affected, shutting down the entire network.
+	 *
+	 * The user can enable or disable single-pause-frame mode
+	 * via ethtool.
+	 */
+	mac |= adpt->single_pause_mode ? SINGLE_PAUSE_MODE : 0;
+
 	writel_relaxed(csr1, adpt->csr + EMAC_EMAC_WRAPPER_CSR1);
 
 	writel_relaxed(mac, adpt->base + EMAC_MAC_CTRL);
diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c
index 60850bfa3d32..759543512117 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac.c
@@ -443,6 +443,9 @@ static void emac_init_adapter(struct emac_adapter *adpt)
 
 	/* default to automatic flow control */
 	adpt->automatic = true;
+
+	/* Disable single-pause-frame mode by default */
+	adpt->single_pause_mode = false;
 }
 
 /* Get the clock */
diff --git a/drivers/net/ethernet/qualcomm/emac/emac.h b/drivers/net/ethernet/qualcomm/emac/emac.h
index 8ee4ec6aef2e..d7c9f44209d4 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac.h
+++ b/drivers/net/ethernet/qualcomm/emac/emac.h
@@ -363,6 +363,9 @@ struct emac_adapter {
 	bool				tx_flow_control;
 	bool				rx_flow_control;
 
+	/* True == use single-pause-frame mode. */
+	bool				single_pause_mode;
+
 	/* Ring parameter */
 	u8				tpd_burst;
 	u8				rfd_burst;
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
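
As background for the "33ms" figure in the comment above: IEEE 802.3 pause
time is expressed in quanta of 512 bit times. The sketch below assumes the
EMAC sends the maximum pause_time of 0xFFFF on a 1 Gb/s link; the patch
does not state this, but the result (about 33.5 ms) is consistent with the
comment. The helper name is hypothetical.

```c
#include <stdint.h>

/* IEEE 802.3 flow control: one pause quantum is 512 bit times. */
#define PAUSE_QUANTUM_BITS 512ULL

/*
 * How long a pause frame asks the link partner to stop transmitting,
 * in nanoseconds, for a given pause_time (in quanta) and link speed.
 * Hypothetical helper, not taken from the driver.
 */
uint64_t pause_time_ns(uint16_t quanta, uint64_t link_bps)
{
	if (link_bps == 0)
		return 0;
	return (uint64_t)quanta * PAUSE_QUANTUM_BITS * 1000000000ULL / link_bps;
}
```

Under those assumptions, pause_time_ns(0xFFFF, 1000000000) is 33553920 ns,
so a single pause frame holds off the remote MAC for roughly 33 ms.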

^ permalink raw reply related

* [PATCH iproute2 v2] ip: ipaddress: fix missing space after prefixlen
From: Julien Fortin @ 2017-09-20 20:26 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nikolay, dsa, sd, Julien Fortin

From: Julien Fortin <julien@cumulusnetworks.com>

Fixes: d0e720111aad2 ("ip: ipaddress.c: add support for json output")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
---
 ip/ipaddress.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 97971450..fb496bbb 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1604,7 +1604,7 @@ int print_addrinfo(const struct sockaddr_nl *who, struct nlmsghdr *n,
 					   format_host_rta(ifa->ifa_family,
 							   rta_tb[IFA_ADDRESS]));
 		}
-		print_int(PRINT_ANY, "prefixlen", "/%d", ifa->ifa_prefixlen);
+		print_int(PRINT_ANY, "prefixlen", "/%d ", ifa->ifa_prefixlen);
 	}
 
 	if (brief)
-- 
2.14.1
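
To illustrate what the one-character change does to the non-JSON output,
here is a small standalone sketch. It does not use iproute2's print_int();
the helper and the "scope global" text that follows the prefix length are
stand-ins chosen for illustration.

```c
#include <stdio.h>

/*
 * Build "inet ADDR/PLEN" followed by the next output field, using either
 * the old "/%d" format or the fixed "/%d " format from the diff.
 * Returns the string length, or -1 if the buffer is too small.
 * Hypothetical helper, not iproute2 code.
 */
int format_inet_line(char *buf, size_t size, const char *addr,
		     int prefixlen, int with_space)
{
	int n = snprintf(buf, size, "inet %s", addr);

	if (n < 0 || (size_t)n >= size)
		return -1;
	if (with_space)
		n += snprintf(buf + n, size - n, "/%d ", prefixlen);
	else
		n += snprintf(buf + n, size - n, "/%d", prefixlen);
	if ((size_t)n >= size)
		return -1;
	/* Stand-in for whatever field is printed after the prefix length. */
	n += snprintf(buf + n, size - n, "scope global");
	return (size_t)n < size ? n : -1;
}
```

With with_space set to 0 the prefix length runs into the next field
("/24scope"); with 1 the fields are separated as expected.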

^ permalink raw reply related

* Re: [RFC PATCH] can: m_can: Support higher speed CAN-FD bitrates
From: Franklin S Cooper Jr @ 2017-09-20 20:19 UTC (permalink / raw)
  To: Yang, Wenyou, Sekhar Nori, wg, mkl, mario.huettel, socketcan,
	quentin.schulz, edumazet, linux-can, netdev, linux-kernel
  Cc: Wenyou Yang, Dong Aisheng
In-Reply-To: <532e09ae-775d-e8aa-e468-78e148c650da@Microchip.com>

Hi Wenyou,

On 09/17/2017 10:47 PM, Yang, Wenyou wrote:
> 
> 
> On 2017/9/14 13:06, Sekhar Nori wrote:
>> On Thursday 14 September 2017 03:28 AM, Franklin S Cooper Jr wrote:
>>>
>>> On 08/18/2017 02:39 PM, Franklin S Cooper Jr wrote:
>>>> During testing, transmitting using CAN-FD at high bitrates (4 Mbps)
>>>> only resulted in errors. Scoping the signals, I noticed that only a
>>>> single bit was being transmitted, and with a bit more investigation
>>>> realized the actual MCAN IP would go back to initialization mode
>>>> automatically.
>>>>
>>>> It appears this issue is due to the MCAN needing to use the Transmitter
>>>> Delay Compensation Mode as defined in the MCAN User's Guide. When this
>>>> mode is used the User's Guide indicates that the Transmitter Delay
>>>> Compensation Offset register should be set. The document mentions
>>>> that this
>>>> register should be set to (1/dbitrate)/2*(Func Clk Freq).
>>>>
>>>> Additionally, CAN-CIA's "Bit Time Requirements for CAN FD" document
>>>> indicates that this TDC mode is only needed for data bit rates above
>>>> 2.5 Mbps. Therefore, only enable this mode and only set TDCO when the
>>>> data bit rate is above 2.5 Mbps.
>>>>
>>>> Signed-off-by: Franklin S Cooper Jr <fcooper@ti.com>
>>>> ---
>>>> I'm pretty surprised that this hasn't been implemented already since
>>>> the primary purpose of CAN-FD is to go beyond 1 Mbps and the MCAN IP
>>>> supports up to 10 Mbps.
>>>>
>>>> So it will be nice to get comments from users of this driver to
>>>> understand if they have been able to use CAN-FD beyond 2.5 Mbps
>>>> without this patch. If they haven't, what did they do to get around
>>>> it if they needed higher speeds?
>>>>
>>>> Meanwhile I plan on testing this using a more "realistic" CAN bus to
>>>> ensure everything still works at 5 Mbps, which is the max speed of my
>>>> CAN transceiver.
>>> ping. Does anyone have any thoughts on this?
>> I added Dong who authored the m_can driver and Wenyou who added the only
>> in-kernel user of the driver for any help.
> I tested it on SAMA5D2 Xplained board both with and without this patch, 
> both work with the 4M bps data bit rate.

Thank you for testing this out. It's interesting that you have been able
to use higher speeds without this patch. What is the CAN transceiver
being used on the SAMA5D2 Xplained board? I tried looking at the
schematic but it seems the CAN signals are used on an extension board
which I can't find the schematic for. Also, do you mind sharing your test
setup? Were you doing a short point-to-point test?

Thank You,
Franklin

> 
>>
>> Thanks,
>> Sekhar
>>
>>>>   drivers/net/can/m_can/m_can.c | 24 +++++++++++++++++++++++-
>>>>   1 file changed, 23 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/can/m_can/m_can.c
>>>> b/drivers/net/can/m_can/m_can.c
>>>> index f4947a7..720e073 100644
>>>> --- a/drivers/net/can/m_can/m_can.c
>>>> +++ b/drivers/net/can/m_can/m_can.c
>>>> @@ -126,6 +126,12 @@ enum m_can_mram_cfg {
>>>>   #define DBTP_DSJW_SHIFT        0
>>>>   #define DBTP_DSJW_MASK        (0xf << DBTP_DSJW_SHIFT)
>>>>   +/* Transmitter Delay Compensation Register (TDCR) */
>>>> +#define TDCR_TDCO_SHIFT        8
>>>> +#define TDCR_TDCO_MASK        (0x7F << TDCR_TDCO_SHIFT)
>>>> +#define TDCR_TDCF_SHIFT        0
>>>> +#define TDCR_TDCF_MASK        (0x7F << TDCR_TDCF_SHIFT)
>>>> +
>>>>   /* Test Register (TEST) */
>>>>   #define TEST_LBCK        BIT(4)
>>>>   @@ -977,6 +983,8 @@ static int m_can_set_bittiming(struct
>>>> net_device *dev)
>>>>       const struct can_bittiming *dbt = &priv->can.data_bittiming;
>>>>       u16 brp, sjw, tseg1, tseg2;
>>>>       u32 reg_btp;
>>>> +    u32 enable_tdc = 0;
>>>> +    u32 tdco;
>>>>         brp = bt->brp - 1;
>>>>       sjw = bt->sjw - 1;
>>>> @@ -991,9 +999,23 @@ static int m_can_set_bittiming(struct
>>>> net_device *dev)
>>>>           sjw = dbt->sjw - 1;
>>>>           tseg1 = dbt->prop_seg + dbt->phase_seg1 - 1;
>>>>           tseg2 = dbt->phase_seg2 - 1;
>>>> +
>>>> +        /* TDC is only needed for bitrates beyond 2.5 MBit/s
>>>> +         * Specified in the "Bit Time Requirements for CAN FD"
>>>> document
>>>> +         */
>>>> +        if (dbt->bitrate > 2500000) {
>>>> +            enable_tdc = DBTP_TDC;
>>>> +            /* Equation based on Bosch's M_CAN User Manual's
>>>> +             * Transmitter Delay Compensation Section
>>>> +             */
>>>> +            tdco = priv->can.clock.freq / (dbt->bitrate * 2);
>>>> +            m_can_write(priv, M_CAN_TDCR, tdco << TDCR_TDCO_SHIFT);
>>>> +        }
>>>> +
>>>>           reg_btp = (brp << DBTP_DBRP_SHIFT) | (sjw <<
>>>> DBTP_DSJW_SHIFT) |
>>>>               (tseg1 << DBTP_DTSEG1_SHIFT) |
>>>> -            (tseg2 << DBTP_DTSEG2_SHIFT);
>>>> +            (tseg2 << DBTP_DTSEG2_SHIFT) | enable_tdc;
>>>> +
>>>>           m_can_write(priv, M_CAN_DBTP, reg_btp);
>>>>       }
>>>>  
> 
> Regards,
> Wenyou Yang
> 
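
For reference, the TDCO formula quoted above, (1/dbitrate)/2*(Func Clk
Freq), reduces to clk / (2 * dbitrate). A standalone sketch of that
arithmetic follows. The 40 MHz clock in the example is an assumption, the
helper names are hypothetical, and the clamp to the 7-bit field width is
an addition that is not in the patch.

```c
#include <stdint.h>

#define TDC_MIN_DBITRATE	2500000u	/* threshold used by the patch */
#define TDCO_FIELD_MAX		0x7Fu		/* TDCO is a 7-bit field */

/* TDC is only enabled for data bit rates above 2.5 Mbit/s. */
int tdc_needed(uint32_t dbitrate)
{
	return dbitrate > TDC_MIN_DBITRATE;
}

/*
 * Transmitter delay compensation offset: half a data bit time expressed
 * in functional clock periods. The clamp is not part of the patch; it
 * only keeps the value inside the register field.
 */
uint32_t m_can_tdco(uint32_t clk_hz, uint32_t dbitrate)
{
	uint64_t tdco;

	if (dbitrate == 0)
		return 0;
	tdco = clk_hz / ((uint64_t)dbitrate * 2);
	return tdco > TDCO_FIELD_MAX ? TDCO_FIELD_MAX : (uint32_t)tdco;
}
```

With an assumed 40 MHz functional clock, 5 Mbit/s gives TDCO = 4 and
4 Mbit/s gives TDCO = 5.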

^ permalink raw reply

* Re: [iproute2 json PATCH] ip: ipaddress: fix missing space after prefixlen
From: Julien Fortin @ 2017-09-20 20:13 UTC (permalink / raw)
  To: netdev; +Cc: Roopa Prabhu, Nikolay Aleksandrov, David Ahern, sd, Julien Fortin
In-Reply-To: <20170920200421.94514-1-julien@cumulusnetworks.com>

Sorry Sabrina, looks like I forgot the Reported-by tag, I'll send a v2...

On Wed, Sep 20, 2017 at 1:04 PM, Julien Fortin
<julien@cumulusnetworks.com> wrote:
> From: Julien Fortin <julien@cumulusnetworks.com>
>
> Fixes: d0e720111aad2 ("ip: ipaddress.c: add support for json output")
>
> Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
> ---
>  ip/ipaddress.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> index 97971450..fb496bbb 100644
> --- a/ip/ipaddress.c
> +++ b/ip/ipaddress.c
> @@ -1604,7 +1604,7 @@ int print_addrinfo(const struct sockaddr_nl *who, struct nlmsghdr *n,
>                                            format_host_rta(ifa->ifa_family,
>                                                            rta_tb[IFA_ADDRESS]));
>                 }
> -               print_int(PRINT_ANY, "prefixlen", "/%d", ifa->ifa_prefixlen);
> +               print_int(PRINT_ANY, "prefixlen", "/%d ", ifa->ifa_prefixlen);
>         }
>
>         if (brief)
> --
> 2.14.1
>

^ permalink raw reply

* Re: cross namespace interface notification for tun devices
From: Jason A. Donenfeld @ 2017-09-20 20:13 UTC (permalink / raw)
  To: Cong Wang; +Cc: Netdev, Mathias
In-Reply-To: <CAM_iQpXhjuh5NJJiAwBJG4EpkWXWz74WS7RFjt5L1mDV9nKTVw@mail.gmail.com>

On Wed, Sep 20, 2017 at 8:29 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> Sounds like we should set NETIF_F_NETNS_LOCAL for tun
> device.

Absolutely do not do this under any circumstances. This would be a
regression and would break API compatibility. As I wrote in my first
email, it's already possible to sleep-loop for that information using
the tun device's fd; I'm just looking for a better event-based
approach.

> What is your legitimate use case of send/receive packet to/from
> a tun device in a different netns?

Because sometimes it's very nice to be able to move network interfaces
that use tun devices into different namespaces, for some cross-namespace
proxying.

What Dan described in the email he just sent is exactly this use case.

In WireGuard (a kernel thing), I have facilities for this --
https://www.wireguard.com/netns/ . Now I'm working on the userspace
version and would like to expose the same utility.

Anyway, the purpose of me sending this message to the list was not to
question the "legitimacy" of my application usage, but rather to
elicit feedback on two specific things:

1. to determine if there's already a mechanism in place for this that
I've overlooked; and
2. to determine particularities of me implementing a mechanism, if
it's not already there.

I'm slightly more convinced that there isn't currently a mechanism for
this. It seems like the easiest way, therefore, would be some kind of
control message that could be poll'd for, using the existing
per-process fd. That way there wouldn't be any violations of the
current namespace situation, yet processes could still get event
notifications as needed.

^ permalink raw reply

* Re: [PATCH RFC V1 net-next 0/6] Time based packet transmission
From: Richard Cochran @ 2017-09-20 20:11 UTC (permalink / raw)
  To: levipearson
  Cc: rcochran, netdev, linux-kernel, intel-wired-lan, vinicius.gomes,
	andre.guedes, john.stultz, jesus.sanchez-palencia, henrik, tglx,
	anna-maria, davem
In-Reply-To: <20170920173533.32537-1-levipearson@gmail.com>

On Wed, Sep 20, 2017 at 11:35:33AM -0600, levipearson@gmail.com wrote:
> Anyway, I am wholly in favor of this proposal--in fact, it is very similar to
> a patch set I shared with Eric Mann and others at Intel in early Dec 2016 with
> the intention to get some early feedback before submitting here. I never heard
> back and got busy with other things. I only mention this since you said
> elsewhere that you got this idea from Eric Mann yourself, and I am curious
> whether Eric and I came up with it independently (which I would not be
> surprised at).

Well, I actually thought of placing the Tx time in a CMSG all by
myself, but later I found Eric's talk from 2012,

  https://linuxplumbers.ubicast.tv/videos/linux-network-enabling-requirements-for-audiovideo-bridging-avb/

and so I wanted to give him credit.

Thanks,
Richard

^ permalink raw reply

* [iproute2 json PATCH] ip: ipaddress: fix missing space after prefixlen
From: Julien Fortin @ 2017-09-20 20:04 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nikolay, dsa, sd, Julien Fortin

From: Julien Fortin <julien@cumulusnetworks.com>

Fixes: d0e720111aad2 ("ip: ipaddress.c: add support for json output")

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
---
 ip/ipaddress.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 97971450..fb496bbb 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1604,7 +1604,7 @@ int print_addrinfo(const struct sockaddr_nl *who, struct nlmsghdr *n,
 					   format_host_rta(ifa->ifa_family,
 							   rta_tb[IFA_ADDRESS]));
 		}
-		print_int(PRINT_ANY, "prefixlen", "/%d", ifa->ifa_prefixlen);
+		print_int(PRINT_ANY, "prefixlen", "/%d ", ifa->ifa_prefixlen);
 	}
 
 	if (brief)
-- 
2.14.1

^ permalink raw reply related

* Re: cross namespace interface notification for tun devices
From: Dan Williams @ 2017-09-20 19:57 UTC (permalink / raw)
  To: Cong Wang, Jason A. Donenfeld; +Cc: Netdev, Mathias
In-Reply-To: <CAM_iQpXhjuh5NJJiAwBJG4EpkWXWz74WS7RFjt5L1mDV9nKTVw@mail.gmail.com>

On Wed, 2017-09-20 at 11:29 -0700, Cong Wang wrote:
> On Tue, Sep 19, 2017 at 2:02 PM, Jason A. Donenfeld <Jason@zx2c4.com>
> wrote:
> > On Tue, Sep 19, 2017 at 10:40 PM, Cong Wang <xiyou.wangcong@gmail.c
> > om> wrote:
> > > By "notification" I assume you mean netlink notification.
> > 
> > Yes, netlink notification.
> > 
> > > The question is why does the process in A still care about
> > > the device sitting in B?
> > > 
> > > Also, the process should be able to receive a last notification
> > > on IFF_UP|IFF_RUNNING before device is finally moved to B.
> > > After this point, it should not have any relation to netns A
> > > any more, like the device were completely gone.
> > 
> > That's very clearly not the case with a tun device. Tun devices
> > work
> > by letting a userspace process control the inputs (ndo_start_xmit)
> > and
> > outputs (netif_rx) of the actual network device. This controlling
> > userspace process needs to know when its own interface that it
> > controls goes up and down. In the kernel, we can do this by just
> > checking dev->flags&IFF_UP, and receive notifications on ndo_open
> > and
> > ndo_stop. In userspace, the controlling process loses the ability
> > to
> > receive notifications like ndo_open/ndo_stop when the interface is
> > moved to a new namespace. After the interface is moved to a
> > namespace,
> > the process will still control inputs and outputs (ndo_start_xmit
> > and
> > netif_rx), but it will no longer receive netlink notifications for
> > the
> > equivalent of ndo_open and ndo_stop. This is problematic.
> 
> Sounds like we should set NETIF_F_NETNS_LOCAL for tun
> device.
> 
> What is your legitimate use case of send/receive packet to/from
> a tun device in a different netns?

One thought: run openvpn in the master netns, but put its tun0
interface into an application's netns.  Per-application VPN,
essentially?  Or maybe that's not how people do this kind of thing, but
it's a thought.

Dan

^ permalink raw reply

* Re: usb/net/p54: trying to register non-static key in p54_unregister_leds
From: Johannes Berg @ 2017-09-20 19:55 UTC (permalink / raw)
  To: Christian Lamparter, Andrey Konovalov
  Cc: Kalle Valo, linux-wireless, netdev, LKML, Dmitry Vyukov,
	Kostya Serebryany, syzkaller, Stephen Boyd, Tejun Heo, Yong Zhang
In-Reply-To: <2277141.bYDD1vAb9W@debian64>

On Wed, 2017-09-20 at 21:27 +0200, Christian Lamparter wrote:

> It seems this is caused as a result of:
>     -> lock_map_acquire(&work->lockdep_map);
> 	    lock_map_release(&work->lockdep_map);
> 
>     in flush_work() [0]

Agree.

> This was added by:
> 
> 	commit 0976dfc1d0cd80a4e9dfaf87bd8744612bde475a
> 	Author: Stephen Boyd <sboyd@codeaurora.org>
> 	Date:   Fri Apr 20 17:28:50 2012 -0700
> 
> 	workqueue: Catch more locking problems with flush_work()

Yes, but that doesn't matter.
    
> Looking at Stephen's patch, it's clear that it was made
> with "static DECLARE_WORK(work, my_work)" in mind. However
> p54's led_work is "per-device", hence it is stored in the
> device's context p54_common, which is dynamically allocated.
> So, maybe revert Stephen's patch?

I disagree - as the lockdep warning says:

> > INFO: trying to register non-static key.
> > the code is fine but needs lockdep annotation.
> > turning off the locking correctness validator.

What it needs is to actually correctly go through initializing the work
at least once.

Without more information, I can't really say what's going on, but I
assume that something is failing and p54_unregister_leds() is getting
invoked without p54_init_leds() having been invoked, so essentially
it's trying to flush a work that was never initialized?

INIT_DELAYED_WORK() does, after all, initialize the lockdep map
properly via __INIT_WORK().

johannes
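
If the guess above is right (unregister running without init ever having
run), one common way to avoid flushing a never-initialized work item is to
track whether init completed. The following is a plain userspace model of
that idea only; the structs and function names are hypothetical, and this
is not the p54 code nor a proposed patch.

```c
#include <stdbool.h>

/* Userspace stand-in for a work item; "flushes" counts flush calls. */
struct model_work {
	bool initialized;
	int flushes;
};

/* Userspace stand-in for the per-device LED state. */
struct model_leds {
	struct model_work led_work;
	bool registered;
};

/* Models the init path: the work item is set up exactly once here. */
void model_init_leds(struct model_leds *leds)
{
	leds->led_work.initialized = true;
	leds->led_work.flushes = 0;
	leds->registered = true;
}

/*
 * Models the teardown path. Returns 0 if the work was flushed, or -1 if
 * init never ran and the flush was skipped.
 */
int model_unregister_leds(struct model_leds *leds)
{
	if (!leds->registered)
		return -1;
	leds->led_work.flushes++;
	leds->registered = false;
	return 0;
}
```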

^ permalink raw reply

* Re: [PATCH net-next 08/14] gtp: Support encpasulating over IPv6
From: David Miller @ 2017-09-20 19:45 UTC (permalink / raw)
  To: tom; +Cc: tom, netdev, pablo, laforge, rohit
In-Reply-To: <CALx6S36cJEU2LkbNf-d8kUHCPj78bt5Y2Bd_rdvjxJ9epOEqAQ@mail.gmail.com>

From: Tom Herbert <tom@herbertland.com>
Date: Wed, 20 Sep 2017 11:03:52 -0700

> On Mon, Sep 18, 2017 at 9:19 PM, David Miller <davem@davemloft.net> wrote:
>> From: Tom Herbert <tom@quantonium.net>
>> Date: Mon, 18 Sep 2017 17:38:58 -0700
>>
>>> Allow peers to be specified by IPv6 addresses.
>>>
>>> Signed-off-by: Tom Herbert <tom@quantonium.net>
>>
>> Hmmm, can you just check the socket family or something like that?
> 
> I'm not sure what code you're referring to.

There is a socket associated with the tunnel to do the encapsulation
and it has an address family, right?

^ permalink raw reply
