Netdev List
 help / color / mirror / Atom feed
* [PATCH bpf-next 2/5] samples/bpf: Dynamically allocate structure 'syms'
From: Leo Yan @ 2018-04-19  1:34 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, netdev, linux-kernel; +Cc: Leo Yan
In-Reply-To: <1524101646-6544-1-git-send-email-leo.yan@linaro.org>

Structure 'syms' is used to store kernel symbol info by reading proc fs
node '/proc/kallsyms', this structure is declared with 300000 entries
and static linked into bss section.  For most case the kernel symbols
has less than 300000 entries, so it's safe to define so large array, but
the side effect is bss section is big introduced by this structure and
it isn't flexible.

To fix this, this patch dynamically allocates memory for structure
'syms' based on parsing '/proc/kallsyms' line number at the runtime,
which can save elf file required memory significantly.

Before:
   text    data     bss     dec     hex filename
  18841    1172 5199776 5219789  4fa5cd samples/bpf/sampleip

After:
   text    data     bss     dec     hex filename
  19101    1188  399792  420081   668f1 samples/bpf/sampleip

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 samples/bpf/bpf_load.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 28e4678..c2bf7ca 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -651,8 +651,7 @@ void read_trace_pipe(void)
 	}
 }
 
-#define MAX_SYMS 300000
-static struct ksym syms[MAX_SYMS];
+static struct ksym *syms;
 static int sym_cnt;
 
 static int ksym_cmp(const void *p1, const void *p2)
@@ -678,12 +677,30 @@ int load_kallsyms(void)
 			break;
 		if (!addr)
 			continue;
+		sym_cnt++;
+	}
+
+	syms = calloc(sym_cnt, sizeof(*syms));
+	if (!syms) {
+		fclose(f);
+		return -ENOMEM;
+	}
+
+	rewind(f);
+	while (!feof(f)) {
+		if (!fgets(buf, sizeof(buf), f))
+			break;
+		if (sscanf(buf, "%p %c %s", &addr, &symbol, func) != 3)
+			break;
+		if (!addr)
+			continue;
 		syms[i].addr = (long) addr;
 		syms[i].name = strdup(func);
 		i++;
 	}
-	sym_cnt = i;
 	qsort(syms, sym_cnt, sizeof(struct ksym), ksym_cmp);
+
+	fclose(f);
 	return 0;
 }
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH bpf-next 1/5] samples/bpf: Fix typo in comment
From: Leo Yan @ 2018-04-19  1:34 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, netdev, linux-kernel; +Cc: Leo Yan
In-Reply-To: <1524101646-6544-1-git-send-email-leo.yan@linaro.org>

Fix typo by replacing 'iif' with 'if'.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 samples/bpf/bpf_load.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index bebe418..28e4678 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -393,7 +393,7 @@ static int load_elf_maps_section(struct bpf_map_data *maps, int maps_shndx,
 			continue;
 		if (sym[nr_maps].st_shndx != maps_shndx)
 			continue;
-		/* Only increment iif maps section */
+		/* Only increment if maps section */
 		nr_maps++;
 	}
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH bpf-next 0/5] samples/bpf: Minor fixes and cleanup
From: Leo Yan @ 2018-04-19  1:34 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, netdev, linux-kernel; +Cc: Leo Yan

This patch series is minor fixes and cleanup for bpf load and samples
code.  The first one patch is typo fixing; patch 0002 is refactor for
dynamically allocate memory for kernel symbol structures; the last
three patches are mainly related with refactor with function
ksym_search(), the main benefit of this refactor is program sampleip
can be used without architecture dependency.

The patch series has been tested on ARM64 Hikey960 boards.

Leo Yan (5):
  samples/bpf: Fix typo in comment
  samples/bpf: Dynamically allocate structure 'syms'
  samples/bpf: Use NULL for failed to find symbol
  samples/bpf: Refine printing symbol for sampleip
  samples/bpf: Handle NULL pointer returned by ksym_search()

 samples/bpf/bpf_load.c         | 29 +++++++++++++++++++++++------
 samples/bpf/offwaketime_user.c |  5 +++++
 samples/bpf/sampleip_user.c    |  8 +++-----
 samples/bpf/spintest_user.c    |  5 ++++-
 samples/bpf/trace_event_user.c |  5 +++++
 5 files changed, 40 insertions(+), 12 deletions(-)

-- 
1.9.1

^ permalink raw reply

* Re: [RFC PATCH ghak32 V2 10/13] audit: add containerid support for seccomp and anom_abend records
From: Paul Moore @ 2018-04-19  1:31 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: cgroups, containers, linux-api, Linux-Audit Mailing List,
	linux-fsdevel, LKML, netdev, ebiederm, luto, jlayton, carlos,
	dhowells, viro, simo, Eric Paris, serge
In-Reply-To: <11174597083f89352f1d6491ec94e27f882625d9.1521179281.git.rgb@redhat.com>

On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> Add container ID auxiliary records to secure computing and abnormal end
> standalone records.
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  kernel/auditsc.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 7103d23..2f02ed9 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -2571,6 +2571,7 @@ static void audit_log_task(struct audit_buffer *ab)
>  void audit_core_dumps(long signr)
>  {
>         struct audit_buffer *ab;
> +       struct audit_context *context = audit_alloc_local();

Looking quickly at do_coredump() I *believe* we can use current here.

>         if (!audit_enabled)
>                 return;
> @@ -2578,19 +2579,22 @@ void audit_core_dumps(long signr)
>         if (signr == SIGQUIT)   /* don't care for those */
>                 return;
>
> -       ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_ANOM_ABEND);
> +       ab = audit_log_start(context, GFP_KERNEL, AUDIT_ANOM_ABEND);
>         if (unlikely(!ab))
>                 return;
>         audit_log_task(ab);
>         audit_log_format(ab, " sig=%ld res=1", signr);
>         audit_log_end(ab);
> +       audit_log_container_info(context, "abend", audit_get_containerid(current));
> +       audit_free_context(context);
>  }
>
>  void __audit_seccomp(unsigned long syscall, long signr, int code)
>  {
>         struct audit_buffer *ab;
> +       struct audit_context *context = audit_alloc_local();

We can definitely use current here.

> -       ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_SECCOMP);
> +       ab = audit_log_start(context, GFP_KERNEL, AUDIT_SECCOMP);
>         if (unlikely(!ab))
>                 return;
>         audit_log_task(ab);
> @@ -2598,6 +2602,8 @@ void __audit_seccomp(unsigned long syscall, long signr, int code)
>                          signr, syscall_get_arch(), syscall,
>                          in_compat_syscall(), KSTK_EIP(current), code);
>         audit_log_end(ab);
> +       audit_log_container_info(context, "seccomp", audit_get_containerid(current));
> +       audit_free_context(context);
>  }
>
>  struct list_head *audit_killed_trees(void)

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [RFC PATCH ghak32 V2 09/13] audit: add containerid support for config/feature/user records
From: Paul Moore @ 2018-04-19  1:27 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: cgroups, containers, linux-api, Linux-Audit Mailing List,
	linux-fsdevel, LKML, netdev, ebiederm, luto, jlayton, carlos,
	dhowells, viro, simo, Eric Paris, serge
In-Reply-To: <c34a7a95eb045a62e2443457979db9d7afbd9aee.1521179281.git.rgb@redhat.com>

On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> Add container ID auxiliary records to configuration change, feature set change
> and user generated standalone records.
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  kernel/audit.c       | 50 ++++++++++++++++++++++++++++++++++++++++----------
>  kernel/auditfilter.c |  5 ++++-
>  2 files changed, 44 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index b238be5..08662b4 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -400,8 +400,9 @@ static int audit_log_config_change(char *function_name, u32 new, u32 old,
>  {
>         struct audit_buffer *ab;
>         int rc = 0;
> +       struct audit_context *context = audit_alloc_local();

We should be able to use current->audit_context here right?  If we
can't for every caller, perhaps we pass an audit_context as an
argument and only allocate a local context when the passed
audit_context is NULL.

Also, if you're not comfortable always using current, just pass the
audit_context as you do with audit_log_common_recv_msg().

> -       ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
> +       ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
>         if (unlikely(!ab))
>                 return rc;
>         audit_log_format(ab, "%s=%u old=%u", function_name, new, old);
> @@ -411,6 +412,8 @@ static int audit_log_config_change(char *function_name, u32 new, u32 old,
>                 allow_changes = 0; /* Something weird, deny request */
>         audit_log_format(ab, " res=%d", allow_changes);
>         audit_log_end(ab);
> +       audit_log_container_info(context, "config", audit_get_containerid(current));
> +       audit_free_context(context);
>         return rc;
>  }
>
> @@ -1058,7 +1061,8 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
>         return err;
>  }
>
> -static void audit_log_common_recv_msg(struct audit_buffer **ab, u16 msg_type)
> +static void audit_log_common_recv_msg(struct audit_context *context,
> +                                     struct audit_buffer **ab, u16 msg_type)
>  {
>         uid_t uid = from_kuid(&init_user_ns, current_uid());
>         pid_t pid = task_tgid_nr(current);
> @@ -1068,7 +1072,7 @@ static void audit_log_common_recv_msg(struct audit_buffer **ab, u16 msg_type)
>                 return;
>         }
>
> -       *ab = audit_log_start(NULL, GFP_KERNEL, msg_type);
> +       *ab = audit_log_start(context, GFP_KERNEL, msg_type);
>         if (unlikely(!*ab))
>                 return;
>         audit_log_format(*ab, "pid=%d uid=%u", pid, uid);
> @@ -1097,11 +1101,12 @@ static void audit_log_feature_change(int which, u32 old_feature, u32 new_feature
>                                      u32 old_lock, u32 new_lock, int res)
>  {
>         struct audit_buffer *ab;
> +       struct audit_context *context = audit_alloc_local();

So I know based on the other patch we are currently discussing that we
can use current here ...

>         if (audit_enabled == AUDIT_OFF)
>                 return;
>
> -       ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_FEATURE_CHANGE);
> +       ab = audit_log_start(context, GFP_KERNEL, AUDIT_FEATURE_CHANGE);
>         if (!ab)
>                 return;
>         audit_log_task_info(ab, current);
> @@ -1109,6 +1114,8 @@ static void audit_log_feature_change(int which, u32 old_feature, u32 new_feature
>                          audit_feature_names[which], !!old_feature, !!new_feature,
>                          !!old_lock, !!new_lock, res);
>         audit_log_end(ab);
> +       audit_log_container_info(context, "feature", audit_get_containerid(current));
> +       audit_free_context(context);
>  }
>
>  static int audit_set_feature(struct sk_buff *skb)
> @@ -1337,13 +1344,15 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>
>                 err = audit_filter(msg_type, AUDIT_FILTER_USER);
>                 if (err == 1) { /* match or error */
> +                       struct audit_context *context = audit_alloc_local();

I'm pretty sure we can use current here.

>                         err = 0;
>                         if (msg_type == AUDIT_USER_TTY) {
>                                 err = tty_audit_push();
>                                 if (err)
>                                         break;
>                         }
> -                       audit_log_common_recv_msg(&ab, msg_type);
> +                       audit_log_common_recv_msg(context, &ab, msg_type);
>                         if (msg_type != AUDIT_USER_TTY)
>                                 audit_log_format(ab, " msg='%.*s'",
>                                                  AUDIT_MESSAGE_TEXT_MAX,
> @@ -1359,6 +1368,9 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>                                 audit_log_n_untrustedstring(ab, data, size);
>                         }
>                         audit_log_end(ab);
> +                       audit_log_container_info(context, "user",
> +                                                audit_get_containerid(current));
> +                       audit_free_context(context);
>                 }
>                 break;
>         case AUDIT_ADD_RULE:
> @@ -1366,9 +1378,14 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>                 if (nlmsg_len(nlh) < sizeof(struct audit_rule_data))
>                         return -EINVAL;
>                 if (audit_enabled == AUDIT_LOCKED) {
> -                       audit_log_common_recv_msg(&ab, AUDIT_CONFIG_CHANGE);
> +                       struct audit_context *context = audit_alloc_local();

Pretty sure current can be used here too.  In fact I think everywhere
where we are processing commands from netlink we can use current as I
believe the entire netlink stack is processed in the context of the
caller.

> +                       audit_log_common_recv_msg(context, &ab, AUDIT_CONFIG_CHANGE);
>                         audit_log_format(ab, " audit_enabled=%d res=0", audit_enabled);
>                         audit_log_end(ab);
> +                       audit_log_container_info(context, "config",
> +                                                audit_get_containerid(current));
> +                       audit_free_context(context);
>                         return -EPERM;
>                 }
>                 err = audit_rule_change(msg_type, seq, data, nlmsg_len(nlh));
> @@ -1376,17 +1393,23 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>         case AUDIT_LIST_RULES:
>                 err = audit_list_rules_send(skb, seq);
>                 break;
> -       case AUDIT_TRIM:
> +       case AUDIT_TRIM: {
> +               struct audit_context *context = audit_alloc_local();

Same.

>                 audit_trim_trees();
> -               audit_log_common_recv_msg(&ab, AUDIT_CONFIG_CHANGE);
> +               audit_log_common_recv_msg(context, &ab, AUDIT_CONFIG_CHANGE);
>                 audit_log_format(ab, " op=trim res=1");
>                 audit_log_end(ab);
> +               audit_log_container_info(context, "config",
> +                                        audit_get_containerid(current));
> +               audit_free_context(context);
>                 break;
> +       }
>         case AUDIT_MAKE_EQUIV: {
>                 void *bufp = data;
>                 u32 sizes[2];
>                 size_t msglen = nlmsg_len(nlh);
>                 char *old, *new;
> +               struct audit_context *context = audit_alloc_local();

Same.

>                 err = -EINVAL;
>                 if (msglen < 2 * sizeof(u32))
> @@ -1408,7 +1431,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>                 /* OK, here comes... */
>                 err = audit_tag_tree(old, new);
>
> -               audit_log_common_recv_msg(&ab, AUDIT_CONFIG_CHANGE);
> +               audit_log_common_recv_msg(context, &ab, AUDIT_CONFIG_CHANGE);
>
>                 audit_log_format(ab, " op=make_equiv old=");
>                 audit_log_untrustedstring(ab, old);
> @@ -1418,6 +1441,9 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>                 audit_log_end(ab);
>                 kfree(old);
>                 kfree(new);
> +               audit_log_container_info(context, "config",
> +                                        audit_get_containerid(current));
> +               audit_free_context(context);
>                 break;
>         }
>         case AUDIT_SIGNAL_INFO:
> @@ -1459,6 +1485,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>                 struct audit_tty_status s, old;
>                 struct audit_buffer     *ab;
>                 unsigned int t;
> +               struct audit_context *context = audit_alloc_local();

Same.

>                 memset(&s, 0, sizeof(s));
>                 /* guard against past and future API changes */
> @@ -1477,12 +1504,15 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>                 old.enabled = t & AUDIT_TTY_ENABLE;
>                 old.log_passwd = !!(t & AUDIT_TTY_LOG_PASSWD);
>
> -               audit_log_common_recv_msg(&ab, AUDIT_CONFIG_CHANGE);
> +               audit_log_common_recv_msg(context, &ab, AUDIT_CONFIG_CHANGE);
>                 audit_log_format(ab, " op=tty_set old-enabled=%d new-enabled=%d"
>                                  " old-log_passwd=%d new-log_passwd=%d res=%d",
>                                  old.enabled, s.enabled, old.log_passwd,
>                                  s.log_passwd, !err);
>                 audit_log_end(ab);
> +               audit_log_container_info(context, "config",
> +                                        audit_get_containerid(current));
> +               audit_free_context(context);
>                 break;
>         }
>         default:
> diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> index c4c8746..5f7f4d6 100644
> --- a/kernel/auditfilter.c
> +++ b/kernel/auditfilter.c
> @@ -1109,11 +1109,12 @@ static void audit_log_rule_change(char *action, struct audit_krule *rule, int re
>         struct audit_buffer *ab;
>         uid_t loginuid = from_kuid(&init_user_ns, audit_get_loginuid(current));
>         unsigned int sessionid = audit_get_sessionid(current);
> +       struct audit_context *context = audit_alloc_local();
>
>         if (!audit_enabled)
>                 return;

Well, first I think we should be able to get rid of the local context,
but if for some reason we can't use current->audit_context then do the
allocation after the audit_enabled check.

> -       ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
> +       ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
>         if (!ab)
>                 return;
>         audit_log_format(ab, "auid=%u ses=%u" ,loginuid, sessionid);
> @@ -1122,6 +1123,8 @@ static void audit_log_rule_change(char *action, struct audit_krule *rule, int re
>         audit_log_key(ab, rule->filterkey);
>         audit_log_format(ab, " list=%d res=%d", rule->listnr, res);
>         audit_log_end(ab);
> +       audit_log_container_info(context, "config", audit_get_containerid(current));
> +       audit_free_context(context);
>  }

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH net] net: qualcomm: rmnet: Fix warning seen with fill_info
From: David Miller @ 2018-04-19  1:27 UTC (permalink / raw)
  To: subashab; +Cc: netdev
In-Reply-To: <1524008400-27713-1-git-send-email-subashab@codeaurora.org>

From: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Date: Tue, 17 Apr 2018 17:40:00 -0600

> When the last rmnet device attached to a real device is removed, the
> real device is unregistered from rmnet. As a result, the real device
> lookup fails resulting in a warning when the fill_info handler is
> called as part of the rmnet device unregistration.
> 
> Fix this by returning the rmnet flags as 0 when no real device is
> present.
 ...
> Fixes: be81a85f5f87 ("net: qualcomm: rmnet: Implement fill_info")
> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>

Applied, thank you.

^ permalink raw reply

* Re: [Regression] net/phy/micrel.c v4.9.94
From: Chris Ruehl @ 2018-04-19  1:21 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: f.fainelli, netdev
In-Reply-To: <20180418130203.GG31643@lunn.ch>

On Wednesday, April 18, 2018 09:02 PM, Andrew Lunn wrote:
> On Wed, Apr 18, 2018 at 02:56:01PM +0200, Andrew Lunn wrote:
>> On Wed, Apr 18, 2018 at 09:34:16AM +0800, Chris Ruehl wrote:
>>> Hello,
>>>
>>> I like to get your heads up at a regression introduced in 4.9.94
>>> commitment lead to a kernel ops and make the network unusable on my MX6DL
>>> customized board.
>>>
>>> Race condition resume is called on startup and the phy not yet initialized.
>>
>> Hi Chris
>>
>> Please could you try
>>
>> bfe72442578b ("net: phy: micrel: fix crash when statistic requested for KSZ9031 phy")
> 
> I don't think it is a complete fix. I suspect "Micrel KSZ8795",
> "Micrel KSZ886X Switch", "Micrel KSZ8061", and "Micrel KS8737" will
> still have problems.
> 
> Those four probably need a:
> 
>          .probe          = kszphy_probe,
> 
> 	Andrew
> 

Indeed I have the
[    7.385851] Micrel KSZ9031 Gigabit PHY 2188000.ethernet-1:05: attached PHY 
driver [Micrel KSZ9031 Gigabit PHY] (mii_bus:phy_addr=2188000.ethernet-1:05, irq=-1)

first I rollback to a non crashing stable kernel.

As "bfe72442578b" gonna fix it I check with the next update and verify its works 
for me.

Thanks
Chris

^ permalink raw reply

* Re: [PATCH net-next] hv_netvsc: Add NetVSP v6 and v6.1 into version negotiation
From: David Miller @ 2018-04-19  1:21 UTC (permalink / raw)
  To: haiyangz, haiyangz
  Cc: netdev, kys, sthemmin, olaf, vkuznets, devel, linux-kernel
In-Reply-To: <20180417223147.23218-1-haiyangz@linuxonhyperv.com>

From: Haiyang Zhang <haiyangz@linuxonhyperv.com>
Date: Tue, 17 Apr 2018 15:31:47 -0700

> From: Haiyang Zhang <haiyangz@microsoft.com>
> 
> This patch adds the NetVSP v6 and 6.1 message structures, and includes
> these versions into NetVSC/NetVSP version negotiation process.
> 
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>

Applied to net-next, thank you.

^ permalink raw reply

* Re: [PATCH net-next] hv_netvsc: propogate Hyper-V friendly name into interface alias
From: David Miller @ 2018-04-19  1:20 UTC (permalink / raw)
  To: stephen; +Cc: haiyangz, netdev, sthemmin
In-Reply-To: <20180417212530.6997-1-sthemmin@microsoft.com>

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Tue, 17 Apr 2018 14:25:30 -0700

> This patch implement the 'Device Naming' feature of the Hyper-V
> network device API. In Hyper-V on the host through the GUI or PowerShell
> it is possible to enable the device naming feature which causes
> the host to make available to the guest the name of the device.
> This shows up in the RNDIS protocol as the friendly name.
> 
> The name has no particular meaning and is limited to 256 characters.
> The value can only be set via PowerShell on the host, but could
> be scripted for mass deployments. The default value is the
> string 'Network Adapter' and since that is the same for all devices
> and useless, the driver ignores it.
> 
> In Windows, the value goes into a registry key for use in SNMP
> ifAlias. For Linux, this patch puts the value in the network
> device alias property; where it is visible in ip tools and SNMP.
> 
> The host provided ifAlias is just a suggestion, and can be
> overridden by later ip commands.
> 
> Also requires exporting dev_set_alias in netdev core.
> 
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>

This looks fine, applied, thanks.

^ permalink raw reply

* Re: [RFC PATCH ghak32 V2 01/13] audit: add container id
From: Casey Schaufler @ 2018-04-19  1:15 UTC (permalink / raw)
  To: Paul Moore
  Cc: Richard Guy Briggs, cgroups, containers, linux-api,
	Linux-Audit Mailing List, linux-fsdevel, LKML, netdev, ebiederm,
	luto, jlayton, carlos, dhowells, viro, simo, Eric Paris, serge
In-Reply-To: <CAHC9VhTz-pr-iUVv-+R3ShwEKSHDsweDGuN7255HV7Cu3ZYPEw@mail.gmail.com>

On 4/18/2018 5:46 PM, Paul Moore wrote:
> On Wed, Apr 18, 2018 at 8:41 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 4/18/2018 4:47 PM, Paul Moore wrote:
>>> On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
>>>> Implement the proc fs write to set the audit container ID of a process,
>>>> emitting an AUDIT_CONTAINER record to document the event.
>>>> ...
>>>>
>>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>>> index d258826..1b82191 100644
>>>> --- a/include/linux/sched.h
>>>> +++ b/include/linux/sched.h
>>>> @@ -796,6 +796,7 @@ struct task_struct {
>>>>  #ifdef CONFIG_AUDITSYSCALL
>>>>         kuid_t                          loginuid;
>>>>         unsigned int                    sessionid;
>>>> +       u64                             containerid;
>>> This one line addition to the task_struct scares me the most of
>>> anything in this patchset.  Why?  It's a field named "containerid" in
>>> a perhaps one of the most widely used core kernel structures; the
>>> possibilities for abuse are endless, and it's foolish to think we
>>> would ever be able to adequately police this.
>> If we can get the LSM infrastructure managed task blobs from
>> module stacking in ahead of this we could create a trivial security
>> module to manage this. It's not as if there aren't all sorts of
>> interactions between security modules and the audit system already.
> While yes, there are plenty of interactions between the two, it is
> possible to use audit without the LSMs and I would like to preserve
> that.  

Fair enough.

> Further, I don't want to entangle two very complicated code
> changes or make the audit container ID effort dependent on LSM
> stacking.

Also fair, although the use case for container audit IDs is
already pulling in audit, namespaces (yeah, I know it's not
necessary for a container to use namespaces) security modules
(stacked and/or namespaced), cgroups and who knows what else.

> You're a good salesman Casey, but you're not that good ;)

I have to keep the skills sharpened somehow!

OK, I'll grant that this isn't a great fit.

^ permalink raw reply

* Re: [PATCH net-next 00/19] r8169: series with further smaller improvements
From: David Miller @ 2018-04-19  1:13 UTC (permalink / raw)
  To: hkallweit1; +Cc: nic_swsd, netdev
In-Reply-To: <4049e598-1b6c-bc3e-a905-178b76d7b161@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Tue, 17 Apr 2018 23:16:07 +0200

> This series includes further smaller improvements.
> 
> Then I think the basic cleanup has been done and next step would be
> preparing the switch to phylib.

Series applied, thank you.

^ permalink raw reply

* [PATCH] net: add mutex_unlock on xfrm4_protocol_register
From: sunlianwen @ 2018-04-19  1:08 UTC (permalink / raw)
  To: netdev

The function of xfrm4_protocol_register() don't release
the mutx lock, which potential cause deadlock.

Signed-off-by: Lianwen Sun <sunlw.fnst@cn.fujitsu.com>
---
 net/ipv4/xfrm4_protocol.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/ipv4/xfrm4_protocol.c b/net/ipv4/xfrm4_protocol.c
index 8dd0e6ab8606..1ee34edef9d2 100644
--- a/net/ipv4/xfrm4_protocol.c
+++ b/net/ipv4/xfrm4_protocol.c
@@ -240,6 +240,10 @@ int xfrm4_protocol_register(struct xfrm4_protocol *handler,
 
 	ret = 0;
 
+	mutex_unlock(&xfrm4_protocol_mutex);
+
+	return ret;
+
 err:
 	mutex_unlock(&xfrm4_protocol_mutex);
 
-- 
2.17.0
.

^ permalink raw reply related

* Re: [net-next PATCH v4 00/13] Add support for netcp driver on K2G SoC
From: David Miller @ 2018-04-19  1:07 UTC (permalink / raw)
  To: m-karicheri2
  Cc: robh+dt, mark.rutland, ssantosh, malat, w-kwok2, devicetree,
	linux-kernel, linux-arm-kernel, netdev, grygorii.strashko,
	nsekhar
In-Reply-To: <1524000642-23944-1-git-send-email-m-karicheri2@ti.com>

From: Murali Karicheri <m-karicheri2@ti.com>
Date: Tue, 17 Apr 2018 17:30:29 -0400

> K2G SoC is another variant of Keystone family of SoCs. This patch
> series add support for NetCP driver on this SoC. The QMSS found on
> K2G SoC is a cut down version of the QMSS found on other keystone
> devices with less number of queues, internal link ram etc. The patch
> series has 2 patch sets that goes into the drivers/soc and the
> rest has to be applied to net sub system. Please review and merge
> if this looks good.
> 
> K2G TRM is located at http://www.ti.com/lit/ug/spruhy8g/spruhy8g.pdf
> Thanks
> 
> The boot logs on K2G ICE board (tftp boot over Ethernet and from mmc)
> https://pastebin.ubuntu.com/p/yvZ6drFhkW/
> 
> 
> The boot logs on K2G GP board (tftp boot over Ethernet and from mmc)
> https://pastebin.ubuntu.com/p/QTr6K7s4Zp/
> 
> Also regressed boot on K2HK and K2L EVMs as we have modified GBE
> version detection logic (K2E uses same version of NetCP as in K2L.
> So regression on one of them is needed).
> 
> Boot log on K2L and K2HK EVMs are at
> https://pastebin.ubuntu.com/p/N9DBdPjbvR/ 
> 
> This series applies to net-next master branch.

Series applied, thank you.

^ permalink raw reply

* Re: [RFC PATCH ghak32 V2 01/13] audit: add container id
From: Paul Moore @ 2018-04-19  0:46 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: simo-H+wXaHxf7aLQT0dZR+AlfA, jlayton-H+wXaHxf7aLQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, LKML,
	Eric Paris, dhowells-H+wXaHxf7aLQT0dZR+AlfA,
	carlos-H+wXaHxf7aLQT0dZR+AlfA, Linux-Audit Mailing List,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, luto-DgEjT+Ai2ygdnm+yROfE0A,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <32d3e7a6-36f0-571a-bb91-67f746c7eafa-iSGtlc1asvQWG2LlvL+J4A@public.gmane.org>

On Wed, Apr 18, 2018 at 8:41 PM, Casey Schaufler <casey-iSGtlc1asvQWG2LlvL+J4A@public.gmane.org> wrote:
> On 4/18/2018 4:47 PM, Paul Moore wrote:
>> On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>>> Implement the proc fs write to set the audit container ID of a process,
>>> emitting an AUDIT_CONTAINER record to document the event.
>>> ...
>>>
>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>> index d258826..1b82191 100644
>>> --- a/include/linux/sched.h
>>> +++ b/include/linux/sched.h
>>> @@ -796,6 +796,7 @@ struct task_struct {
>>>  #ifdef CONFIG_AUDITSYSCALL
>>>         kuid_t                          loginuid;
>>>         unsigned int                    sessionid;
>>> +       u64                             containerid;
>> This one line addition to the task_struct scares me the most of
>> anything in this patchset.  Why?  It's a field named "containerid" in
>> a perhaps one of the most widely used core kernel structures; the
>> possibilities for abuse are endless, and it's foolish to think we
>> would ever be able to adequately police this.
>
> If we can get the LSM infrastructure managed task blobs from
> module stacking in ahead of this we could create a trivial security
> module to manage this. It's not as if there aren't all sorts of
> interactions between security modules and the audit system already.

While yes, there are plenty of interactions between the two, it is
possible to use audit without the LSMs and I would like to preserve
that.  Further, I don't want to entangle two very complicated code
changes or make the audit container ID effort dependent on LSM
stacking.

You're a good salesman Casey, but you're not that good ;)

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [RFC PATCH ghak32 V2 07/13] audit: add container aux record to watch/tree/mark
From: Paul Moore @ 2018-04-19  0:42 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: cgroups, containers, linux-api, Linux-Audit Mailing List,
	linux-fsdevel, LKML, netdev, ebiederm, luto, jlayton, carlos,
	dhowells, viro, simo, Eric Paris, serge
In-Reply-To: <737f914a88d048b9985984c0ce1f946c30ca374c.1521179281.git.rgb@redhat.com>

On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> Add container ID auxiliary record to mark, watch and tree rule
> configuration standalone records.
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  kernel/audit_fsnotify.c |  5 ++++-
>  kernel/audit_tree.c     |  5 ++++-
>  kernel/audit_watch.c    | 33 +++++++++++++++++++--------------
>  3 files changed, 27 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
> index 52f368b..18c110d 100644
> --- a/kernel/audit_fsnotify.c
> +++ b/kernel/audit_fsnotify.c
> @@ -124,10 +124,11 @@ static void audit_mark_log_rule_change(struct audit_fsnotify_mark *audit_mark, c
>  {
>         struct audit_buffer *ab;
>         struct audit_krule *rule = audit_mark->rule;
> +       struct audit_context *context = audit_alloc_local();
>
>         if (!audit_enabled)
>                 return;

Move the audit_alloc_local() after the audit_enabled check.

> -       ab = audit_log_start(NULL, GFP_NOFS, AUDIT_CONFIG_CHANGE);
> +       ab = audit_log_start(context, GFP_NOFS, AUDIT_CONFIG_CHANGE);
>         if (unlikely(!ab))
>                 return;
>         audit_log_format(ab, "auid=%u ses=%u op=%s",
> @@ -138,6 +139,8 @@ static void audit_mark_log_rule_change(struct audit_fsnotify_mark *audit_mark, c
>         audit_log_key(ab, rule->filterkey);
>         audit_log_format(ab, " list=%d res=1", rule->listnr);
>         audit_log_end(ab);
> +       audit_log_container_info(context, "config", audit_get_containerid(current));
> +       audit_free_context(context);
>  }
>
>  void audit_remove_mark(struct audit_fsnotify_mark *audit_mark)
> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index 67e6956..7c085be 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -496,8 +496,9 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>  static void audit_tree_log_remove_rule(struct audit_krule *rule)
>  {
>         struct audit_buffer *ab;
> +       struct audit_context *context = audit_alloc_local();

Sort of independent of the audit container ID work, but shouldn't we
have an audit_enabled check here?

> -       ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
> +       ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
>         if (unlikely(!ab))
>                 return;
>         audit_log_format(ab, "op=remove_rule");
> @@ -506,6 +507,8 @@ static void audit_tree_log_remove_rule(struct audit_krule *rule)
>         audit_log_key(ab, rule->filterkey);
>         audit_log_format(ab, " list=%d res=1", rule->listnr);
>         audit_log_end(ab);
> +       audit_log_container_info(context, "config", audit_get_containerid(current));
> +       audit_free_context(context);
>  }
>
>  static void kill_rules(struct audit_tree *tree)
> diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
> index 9eb8b35..60d75a2 100644
> --- a/kernel/audit_watch.c
> +++ b/kernel/audit_watch.c
> @@ -238,20 +238,25 @@ static struct audit_watch *audit_dupe_watch(struct audit_watch *old)
>
>  static void audit_watch_log_rule_change(struct audit_krule *r, struct audit_watch *w, char *op)
>  {
> -       if (audit_enabled) {
> -               struct audit_buffer *ab;
> -               ab = audit_log_start(NULL, GFP_NOFS, AUDIT_CONFIG_CHANGE);
> -               if (unlikely(!ab))
> -                       return;
> -               audit_log_format(ab, "auid=%u ses=%u op=%s",
> -                                from_kuid(&init_user_ns, audit_get_loginuid(current)),
> -                                audit_get_sessionid(current), op);
> -               audit_log_format(ab, " path=");
> -               audit_log_untrustedstring(ab, w->path);
> -               audit_log_key(ab, r->filterkey);
> -               audit_log_format(ab, " list=%d res=1", r->listnr);
> -               audit_log_end(ab);
> -       }
> +       struct audit_buffer *ab;
> +       struct audit_context *context = audit_alloc_local();
> +
> +       if (!audit_enabled)
> +               return;

Same as above, do the allocation after the audit_enabled check.

> +       ab = audit_log_start(context, GFP_NOFS, AUDIT_CONFIG_CHANGE);
> +       if (unlikely(!ab))
> +               return;
> +       audit_log_format(ab, "auid=%u ses=%u op=%s",
> +                        from_kuid(&init_user_ns, audit_get_loginuid(current)),
> +                        audit_get_sessionid(current), op);
> +       audit_log_format(ab, " path=");
> +       audit_log_untrustedstring(ab, w->path);
> +       audit_log_key(ab, r->filterkey);
> +       audit_log_format(ab, " list=%d res=1", r->listnr);
> +       audit_log_end(ab);
> +       audit_log_container_info(context, "config", audit_get_containerid(current));
> +       audit_free_context(context);
>  }

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [RFC PATCH ghak32 V2 01/13] audit: add container id
From: Casey Schaufler @ 2018-04-19  0:41 UTC (permalink / raw)
  To: Paul Moore, Richard Guy Briggs
  Cc: cgroups, containers, linux-api, Linux-Audit Mailing List,
	linux-fsdevel, LKML, netdev, ebiederm, luto, jlayton, carlos,
	dhowells, viro, simo, Eric Paris, serge
In-Reply-To: <CAHC9VhTyvxxj2e2Gn+iyW6iLLeYB7hp8a+JvfeMmJ2nUPqtEaw@mail.gmail.com>

On 4/18/2018 4:47 PM, Paul Moore wrote:
> On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
>> Implement the proc fs write to set the audit container ID of a process,
>> emitting an AUDIT_CONTAINER record to document the event.
>> ...
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index d258826..1b82191 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -796,6 +796,7 @@ struct task_struct {
>>  #ifdef CONFIG_AUDITSYSCALL
>>         kuid_t                          loginuid;
>>         unsigned int                    sessionid;
>> +       u64                             containerid;
> This one line addition to the task_struct scares me the most of
> anything in this patchset.  Why?  It's a field named "containerid" in
> a perhaps one of the most widely used core kernel structures; the
> possibilities for abuse are endless, and it's foolish to think we
> would ever be able to adequately police this.

If we can get the LSM infrastructure managed task blobs from 
module stacking in ahead of this we could create a trivial security
module to manage this. It's not as if there aren't all sorts of
interactions between security modules and the audit system already.

^ permalink raw reply

* Re: [RFC PATCH ghak32 V2 06/13] audit: add support for non-syscall auxiliary records
From: Paul Moore @ 2018-04-19  0:39 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: cgroups, containers, linux-api, Linux-Audit Mailing List,
	linux-fsdevel, LKML, netdev, ebiederm, luto, jlayton, carlos,
	dhowells, viro, simo, Eric Paris, serge
In-Reply-To: <ee2a945fb09a939b3c214f45e49dab6a770d83e6.1521179281.git.rgb@redhat.com>

On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> Standalone audit records have the timestamp and serial number generated
> on the fly and as such are unique, making them standalone.  This new
> function audit_alloc_local() generates a local audit context that will
> be used only for a standalone record and its auxiliary record(s).  The
> context is discarded immediately after the local associated records are
> produced.
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  include/linux/audit.h |  8 ++++++++
>  kernel/auditsc.c      | 20 +++++++++++++++++++-
>  2 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index ed16bb6..c0b83cb 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -227,7 +227,9 @@ static inline int audit_log_container_info(struct audit_context *context,
>  /* These are defined in auditsc.c */
>                                 /* Public API */
>  extern int  audit_alloc(struct task_struct *task);
> +extern struct audit_context *audit_alloc_local(void);
>  extern void __audit_free(struct task_struct *task);
> +extern void audit_free_context(struct audit_context *context);
>  extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1,
>                                   unsigned long a2, unsigned long a3);
>  extern void __audit_syscall_exit(int ret_success, long ret_value);
> @@ -472,6 +474,12 @@ static inline int audit_alloc(struct task_struct *task)
>  {
>         return 0;
>  }
> +static inline struct audit_context *audit_alloc_local(void)
> +{
> +       return NULL;
> +}
> +static inline void audit_free_context(struct audit_context *context)
> +{ }
>  static inline void audit_free(struct task_struct *task)
>  { }
>  static inline void audit_syscall_entry(int major, unsigned long a0,
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 2932ef1..7103d23 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -959,8 +959,26 @@ int audit_alloc(struct task_struct *tsk)
>         return 0;
>  }
>
> -static inline void audit_free_context(struct audit_context *context)
> +struct audit_context *audit_alloc_local(void)
>  {
> +       struct audit_context *context;
> +
> +       if (!audit_ever_enabled)
> +               return NULL; /* Return if not auditing. */
> +
> +       context = audit_alloc_context(AUDIT_RECORD_CONTEXT);
> +       if (!context)
> +               return NULL;
> +       context->serial = audit_serial();
> +       context->ctime = current_kernel_time64();
> +       context->in_syscall = 1;
> +       return context;
> +}
> +
> +inline void audit_free_context(struct audit_context *context)
> +{
> +       if (!context)
> +               return;
>         audit_free_names(context);
>         unroll_tree_refs(context, NULL, 0);
>         free_tree_refs(context);

I'm reserving the option to comment on this idea further as I make my
way through the patchset, but audit_free_context() definitely
shouldn't be declared as an inline function.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH net-next] net: phy: mdio-boardinfo: Allow recursive mdiobus_register()
From: Andrew Lunn @ 2018-04-19  0:33 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: David Miller, netdev, Vivien Didelot
In-Reply-To: <5966673b-f091-a180-71b1-96d35f845e17@gmail.com>

On Wed, Apr 18, 2018 at 05:14:36PM -0700, Florian Fainelli wrote:
> On 04/18/2018 05:00 PM, Andrew Lunn wrote:
> > mdiobus_register will search for any mdiobus board info registered for
> > the bus being registered. If found, it will probe devices on the bus.
> > That device, if for example it is an ethernet switch, may then try to
> > register an mdio bus. Thus we need to allow recursive calls to
> > mdiobus_register.
> > 
> > Holding the mdio_board_lock will cause a deadlock during this
> > recursion. Release the lock and use list_for_each_entry_safe.
> > 
> > Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> > ---
> >  drivers/net/phy/mdio-boardinfo.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/phy/mdio-boardinfo.c b/drivers/net/phy/mdio-boardinfo.c
> > index 1861f387820d..863496fa5d13 100644
> > --- a/drivers/net/phy/mdio-boardinfo.c
> > +++ b/drivers/net/phy/mdio-boardinfo.c
> > @@ -30,17 +30,20 @@ void mdiobus_setup_mdiodev_from_board_info(struct mii_bus *bus,
> >  					    struct mdio_board_info *bi))
> >  {
> >  	struct mdio_board_entry *be;
> > +	struct mdio_board_entry *tmp;
> >  	struct mdio_board_info *bi;
> >  	int ret;
> >  
> >  	mutex_lock(&mdio_board_lock);
> 
> Don't you need to drop that lock here?
> 
> > -	list_for_each_entry(be, &mdio_board_list, list) {
> > +	list_for_each_entry_safe(be, tmp, &mdio_board_list, list) {
> >  		bi = &be->board_info;
> >  
> >  		if (strcmp(bus->id, bi->bus_id))
> >  			continue;
> >  
> > +		mutex_unlock(&mdio_board_lock);
> >  		ret = cb(bus, bi);
> > +		mutex_lock(&mdio_board_lock);
> >  		if (ret)
> >  			continue;
> >  
> 
> And conversely drop the unlock from the end of this function?

No. The recursion happens inside the ret = cb(bus, bi). We need the
lock to be available during that. The lock itself is protecting the
list, so we need to hold the lock while using the list.

> Also, would you rather target "net" for this change since this appears
> to be a bug fix?

As far as i know, there is no in tree driver which can trigger
this. It requires the use of a switch instantiated using mdio_board
info, and the switch then needs to add another mdio bus in its probe
function. The only in tree code doing anything like this is dsa_loop,
and it does not register an mdio bus.

However, later this cycle i plan to add support for Zodiac's SCU3, and
it does trigger this deadlock. So net-next is sufficient for my.

   Andrew

^ permalink raw reply

* Re: [RFC PATCH ghak32 V2 05/13] audit: add containerid support for ptrace and signals
From: Paul Moore @ 2018-04-19  0:32 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: cgroups, containers, linux-api, Linux-Audit Mailing List,
	linux-fsdevel, LKML, netdev, ebiederm, luto, jlayton, carlos,
	dhowells, viro, simo, Eric Paris, serge
In-Reply-To: <8c7ff567377f4a83edac48e962c1b5b824b523c8.1521179281.git.rgb@redhat.com>

On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> Add container ID support to ptrace and signals.  In particular, the "op"
> field provides a way to label the auxiliary record to which it is
> associated.
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  include/linux/audit.h | 16 +++++++++++-----
>  kernel/audit.c        | 12 ++++++++----
>  kernel/audit.h        |  2 ++
>  kernel/auditsc.c      | 19 +++++++++++++++----
>  4 files changed, 36 insertions(+), 13 deletions(-)

...

> diff --git a/kernel/audit.c b/kernel/audit.c
> index a12f21f..b238be5 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -142,6 +142,7 @@ struct audit_net {
>  kuid_t         audit_sig_uid = INVALID_UID;
>  pid_t          audit_sig_pid = -1;
>  u32            audit_sig_sid = 0;
> +u64            audit_sig_cid = INVALID_CID;
>
>  /* Records can be lost in several ways:
>     0) [suppressed in audit_alloc]
> @@ -1438,6 +1439,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>                         memcpy(sig_data->ctx, ctx, len);
>                         security_release_secctx(ctx, len);
>                 }
> +               sig_data->cid = audit_sig_cid;
>                 audit_send_reply(skb, seq, AUDIT_SIGNAL_INFO, 0, 0,
>                                  sig_data, sizeof(*sig_data) + len);
>                 kfree(sig_data);
> @@ -2051,20 +2053,22 @@ void audit_log_session_info(struct audit_buffer *ab)
>
>  /*
>   * audit_log_container_info - report container info
> - * @tsk: task to be recorded
>   * @context: task or local context for record
> + * @op: containerid string description
> + * @containerid: container ID to report
>   */
> -int audit_log_container_info(struct task_struct *tsk, struct audit_context *context)
> +int audit_log_container_info(struct audit_context *context,
> +                             char *op, u64 containerid)
>  {
>         struct audit_buffer *ab;
>
> -       if (!audit_containerid_set(tsk))
> +       if (!cid_valid(containerid))
>                 return 0;
>         /* Generate AUDIT_CONTAINER_INFO with container ID */
>         ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER_INFO);
>         if (!ab)
>                 return -ENOMEM;
> -       audit_log_format(ab, "contid=%llu", audit_get_containerid(tsk));
> +       audit_log_format(ab, "op=%s contid=%llu", op, containerid);
>         audit_log_end(ab);
>         return 0;
>  }

Let's get these changes into the first patch where
audit_log_container_info() is defined.  Why?  This inserts a new field
into the record which is a no-no.  Yes, it is one single patchset, but
they are still separate patches and who knows which patches a given
distribution and/or tree may decide to backport.

> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 2bba324..2932ef1 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -113,6 +113,7 @@ struct audit_aux_data_pids {
>         kuid_t                  target_uid[AUDIT_AUX_PIDS];
>         unsigned int            target_sessionid[AUDIT_AUX_PIDS];
>         u32                     target_sid[AUDIT_AUX_PIDS];
> +       u64                     target_cid[AUDIT_AUX_PIDS];
>         char                    target_comm[AUDIT_AUX_PIDS][TASK_COMM_LEN];
>         int                     pid_count;
>  };
> @@ -1422,21 +1423,27 @@ static void audit_log_exit(struct audit_context *context, struct task_struct *ts
>         for (aux = context->aux_pids; aux; aux = aux->next) {
>                 struct audit_aux_data_pids *axs = (void *)aux;
>
> -               for (i = 0; i < axs->pid_count; i++)
> +               for (i = 0; i < axs->pid_count; i++) {
> +                       char axsn[sizeof("aux0xN ")];
> +
> +                       sprintf(axsn, "aux0x%x", i);
>                         if (audit_log_pid_context(context, axs->target_pid[i],
>                                                   axs->target_auid[i],
>                                                   axs->target_uid[i],
>                                                   axs->target_sessionid[i],
>                                                   axs->target_sid[i],
> -                                                 axs->target_comm[i]))
> +                                                 axs->target_comm[i])
> +                           && audit_log_container_info(context, axsn, axs->target_cid[i]))

Shouldn't this be an OR instead of an AND?

>                                 call_panic = 1;
> +               }
>         }
>
>         if (context->target_pid &&
>             audit_log_pid_context(context, context->target_pid,
>                                   context->target_auid, context->target_uid,
>                                   context->target_sessionid,
> -                                 context->target_sid, context->target_comm))
> +                                 context->target_sid, context->target_comm)
> +           && audit_log_container_info(context, "target", context->target_cid))

Same question.

>                         call_panic = 1;
>
>         if (context->pwd.dentry && context->pwd.mnt) {

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [RFC PATCH ghak32 V2 04/13] audit: add containerid filtering
From: Paul Moore @ 2018-04-19  0:24 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: cgroups, containers, linux-api, Linux-Audit Mailing List,
	linux-fsdevel, LKML, netdev, ebiederm, luto, jlayton, carlos,
	dhowells, viro, simo, Eric Paris, serge
In-Reply-To: <b933f93762435990e9b1e6d5aebf15f186ac8951.1521179281.git.rgb@redhat.com>

On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> Implement container ID filtering using the AUDIT_CONTAINERID field name
> to send an 8-character string representing a u64 since the value field
> is only u32.
>
> Sending it as two u32 was considered, but gathering and comparing two
> fields was more complex.

My only worry here is that you aren't really sending a string in the
ASCII sense, you are sending an 8 byte buffer (that better be NUL
terminated) that happens to be an unsigned 64-bit integer.  To be
clear, I'm okay with that (it's protected by AUDIT_CONTAINERID), and
the code is okay with that, I just want us to pause for a minute and
make sure that is an okay thing to do long term.

> The feature indicator is AUDIT_FEATURE_BITMAP_CONTAINERID_FILTER.
>
> This requires support from userspace to be useful.
> See: https://github.com/linux-audit/audit-userspace/issues/40
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  include/linux/audit.h      |  1 +
>  include/uapi/linux/audit.h |  5 ++++-
>  kernel/audit.h             |  1 +
>  kernel/auditfilter.c       | 47 ++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/auditsc.c           |  3 +++
>  5 files changed, 56 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 3acbe9d..f10ca1b 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -76,6 +76,7 @@ struct audit_field {
>         u32                             type;
>         union {
>                 u32                     val;
> +               u64                     val64;
>                 kuid_t                  uid;
>                 kgid_t                  gid;
>                 struct {
> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index e83ccbd..8443a8f 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -262,6 +262,7 @@
>  #define AUDIT_LOGINUID_SET     24
>  #define AUDIT_SESSIONID        25      /* Session ID */
>  #define AUDIT_FSTYPE   26      /* FileSystem Type */
> +#define AUDIT_CONTAINERID      27      /* Container ID */
>
>                                 /* These are ONLY useful when checking
>                                  * at syscall exit time (AUDIT_AT_EXIT). */
> @@ -342,6 +343,7 @@ enum {
>  #define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER  0x00000010
>  #define AUDIT_FEATURE_BITMAP_LOST_RESET                0x00000020
>  #define AUDIT_FEATURE_BITMAP_FILTER_FS         0x00000040
> +#define AUDIT_FEATURE_BITMAP_CONTAINERID_FILTER        0x00000080
>
>  #define AUDIT_FEATURE_BITMAP_ALL (AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT | \
>                                   AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME | \
> @@ -349,7 +351,8 @@ enum {
>                                   AUDIT_FEATURE_BITMAP_EXCLUDE_EXTEND | \
>                                   AUDIT_FEATURE_BITMAP_SESSIONID_FILTER | \
>                                   AUDIT_FEATURE_BITMAP_LOST_RESET | \
> -                                 AUDIT_FEATURE_BITMAP_FILTER_FS)
> +                                 AUDIT_FEATURE_BITMAP_FILTER_FS | \
> +                                 AUDIT_FEATURE_BITMAP_CONTAINERID_FILTER)
>
>  /* deprecated: AUDIT_VERSION_* */
>  #define AUDIT_VERSION_LATEST           AUDIT_FEATURE_BITMAP_ALL
> diff --git a/kernel/audit.h b/kernel/audit.h
> index 214e149..aaa651a 100644
> --- a/kernel/audit.h
> +++ b/kernel/audit.h
> @@ -234,6 +234,7 @@ static inline int audit_hash_ino(u32 ino)
>
>  extern int audit_match_class(int class, unsigned syscall);
>  extern int audit_comparator(const u32 left, const u32 op, const u32 right);
> +extern int audit_comparator64(const u64 left, const u32 op, const u64 right);
>  extern int audit_uid_comparator(kuid_t left, u32 op, kuid_t right);
>  extern int audit_gid_comparator(kgid_t left, u32 op, kgid_t right);
>  extern int parent_len(const char *path);
> diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> index d7a807e..c4c8746 100644
> --- a/kernel/auditfilter.c
> +++ b/kernel/auditfilter.c
> @@ -410,6 +410,7 @@ static int audit_field_valid(struct audit_entry *entry, struct audit_field *f)
>         /* FALL THROUGH */
>         case AUDIT_ARCH:
>         case AUDIT_FSTYPE:
> +       case AUDIT_CONTAINERID:
>                 if (f->op != Audit_not_equal && f->op != Audit_equal)
>                         return -EINVAL;
>                 break;
> @@ -584,6 +585,14 @@ static struct audit_entry *audit_data_to_entry(struct audit_rule_data *data,
>                         }
>                         entry->rule.exe = audit_mark;
>                         break;
> +               case AUDIT_CONTAINERID:
> +                       if (f->val != sizeof(u64))
> +                               goto exit_free;
> +                       str = audit_unpack_string(&bufp, &remain, f->val);
> +                       if (IS_ERR(str))
> +                               goto exit_free;
> +                       f->val64 = ((u64 *)str)[0];
> +                       break;
>                 }
>         }
>
> @@ -666,6 +675,11 @@ static struct audit_rule_data *audit_krule_to_data(struct audit_krule *krule)
>                         data->buflen += data->values[i] =
>                                 audit_pack_string(&bufp, audit_mark_path(krule->exe));
>                         break;
> +               case AUDIT_CONTAINERID:
> +                       data->buflen += data->values[i] = sizeof(u64);
> +                       for (i = 0; i < sizeof(u64); i++)
> +                               ((char *)bufp)[i] = ((char *)&f->val64)[i];
> +                       break;
>                 case AUDIT_LOGINUID_SET:
>                         if (krule->pflags & AUDIT_LOGINUID_LEGACY && !f->val) {
>                                 data->fields[i] = AUDIT_LOGINUID;
> @@ -752,6 +766,10 @@ static int audit_compare_rule(struct audit_krule *a, struct audit_krule *b)
>                         if (!gid_eq(a->fields[i].gid, b->fields[i].gid))
>                                 return 1;
>                         break;
> +               case AUDIT_CONTAINERID:
> +                       if (a->fields[i].val64 != b->fields[i].val64)
> +                               return 1;
> +                       break;
>                 default:
>                         if (a->fields[i].val != b->fields[i].val)
>                                 return 1;
> @@ -1210,6 +1228,31 @@ int audit_comparator(u32 left, u32 op, u32 right)
>         }
>  }
>
> +int audit_comparator64(u64 left, u32 op, u64 right)
> +{
> +       switch (op) {
> +       case Audit_equal:
> +               return (left == right);
> +       case Audit_not_equal:
> +               return (left != right);
> +       case Audit_lt:
> +               return (left < right);
> +       case Audit_le:
> +               return (left <= right);
> +       case Audit_gt:
> +               return (left > right);
> +       case Audit_ge:
> +               return (left >= right);
> +       case Audit_bitmask:
> +               return (left & right);
> +       case Audit_bittest:
> +               return ((left & right) == right);
> +       default:
> +               BUG();
> +               return 0;
> +       }
> +}
> +
>  int audit_uid_comparator(kuid_t left, u32 op, kuid_t right)
>  {
>         switch (op) {
> @@ -1348,6 +1391,10 @@ int audit_filter(int msgtype, unsigned int listtype)
>                                 result = audit_comparator(audit_loginuid_set(current),
>                                                           f->op, f->val);
>                                 break;
> +                       case AUDIT_CONTAINERID:
> +                               result = audit_comparator64(audit_get_containerid(current),
> +                                                             f->op, f->val64);
> +                               break;
>                         case AUDIT_MSGTYPE:
>                                 result = audit_comparator(msgtype, f->op, f->val);
>                                 break;
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 65be110..2bba324 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -614,6 +614,9 @@ static int audit_filter_rules(struct task_struct *tsk,
>                 case AUDIT_LOGINUID_SET:
>                         result = audit_comparator(audit_loginuid_set(tsk), f->op, f->val);
>                         break;
> +               case AUDIT_CONTAINERID:
> +                       result = audit_comparator64(audit_get_containerid(tsk), f->op, f->val64);
> +                       break;
>                 case AUDIT_SUBJ_USER:
>                 case AUDIT_SUBJ_ROLE:
>                 case AUDIT_SUBJ_TYPE:
> --
> 1.8.3.1
>
> --
> Linux-audit mailing list
> Linux-audit@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-audit



-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH net-next] net: phy: mdio-boardinfo: Allow recursive mdiobus_register()
From: Florian Fainelli @ 2018-04-19  0:14 UTC (permalink / raw)
  To: Andrew Lunn, David Miller; +Cc: netdev, Vivien Didelot
In-Reply-To: <1524096047-16823-1-git-send-email-andrew@lunn.ch>

On 04/18/2018 05:00 PM, Andrew Lunn wrote:
> mdiobus_register will search for any mdiobus board info registered for
> the bus being registered. If found, it will probe devices on the bus.
> That device, if for example it is an ethernet switch, may then try to
> register an mdio bus. Thus we need to allow recursive calls to
> mdiobus_register.
> 
> Holding the mdio_board_lock will cause a deadlock during this
> recursion. Release the lock and use list_for_each_entry_safe.
> 
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> ---
>  drivers/net/phy/mdio-boardinfo.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/mdio-boardinfo.c b/drivers/net/phy/mdio-boardinfo.c
> index 1861f387820d..863496fa5d13 100644
> --- a/drivers/net/phy/mdio-boardinfo.c
> +++ b/drivers/net/phy/mdio-boardinfo.c
> @@ -30,17 +30,20 @@ void mdiobus_setup_mdiodev_from_board_info(struct mii_bus *bus,
>  					    struct mdio_board_info *bi))
>  {
>  	struct mdio_board_entry *be;
> +	struct mdio_board_entry *tmp;
>  	struct mdio_board_info *bi;
>  	int ret;
>  
>  	mutex_lock(&mdio_board_lock);

Don't you need to drop that lock here?

> -	list_for_each_entry(be, &mdio_board_list, list) {
> +	list_for_each_entry_safe(be, tmp, &mdio_board_list, list) {
>  		bi = &be->board_info;
>  
>  		if (strcmp(bus->id, bi->bus_id))
>  			continue;
>  
> +		mutex_unlock(&mdio_board_lock);
>  		ret = cb(bus, bi);
> +		mutex_lock(&mdio_board_lock);
>  		if (ret)
>  			continue;
>  

And conversely drop the unlock from the end of this function?

Also, would you rather target "net" for this change since this appears
to be a bug fix?
-- 
Florian

^ permalink raw reply

* Re: [RFC PATCH ghak32 V2 02/13] audit: check children and threading before allowing containerid
From: Paul Moore @ 2018-04-19  0:11 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: simo-H+wXaHxf7aLQT0dZR+AlfA, jlayton-H+wXaHxf7aLQT0dZR+AlfA,
	carlos-H+wXaHxf7aLQT0dZR+AlfA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, LKML,
	Eric Paris, dhowells-H+wXaHxf7aLQT0dZR+AlfA,
	Linux-Audit Mailing List, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	luto-DgEjT+Ai2ygdnm+yROfE0A, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <995b77557010b2f9aed0e10435f7b8536df7a5db.1521179281.git.rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> Check if a task has existing children or co-threads and refuse to set
> the container ID if either are present.  Failure to check this could
> permit games where a child scratches its parent's back to work around
> inheritance and double-setting policy.
>
> Signed-off-by: Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  kernel/auditsc.c | 4 ++++
>  1 file changed, 4 insertions(+)

I would just include this in patch 1/2 as I can't think of world where
we wouldn't this check.

> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 29c8482..a6b0a52 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -2087,6 +2087,10 @@ static int audit_set_containerid_perm(struct task_struct *task, u64 containerid)
>         /* if we don't have caps, reject */
>         if (!capable(CAP_AUDIT_CONTROL))
>                 return -EPERM;
> +       /* if task has children or is not single-threaded, deny */
> +       if (!list_empty(&task->children) ||
> +           !(thread_group_leader(task) && thread_group_empty(task)))
> +               return -EPERM;
>         /* if containerid is unset, allow */
>         if (!audit_containerid_set(task))
>                 return 0;
> --
> 1.8.3.1

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH iproute2 net-next] vxlan: fix ttl inherit behavior
From: Hangbin Liu @ 2018-04-19  0:09 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: network dev
In-Reply-To: <20180418085016.7209369d@xeon-e3>

On Wed, Apr 18, 2018 at 08:50:16AM -0700, Stephen Hemminger wrote:
> When davem  merges the feature into net-next, dsa will merge this into iproute2-next.
> We hold off merging into iproute2 because often the kernel review feedback causes
> API changes.

Got it, Thanks.

Regards
Hangbin

^ permalink raw reply

* [PATCH net-next] net: phy: mdio-boardinfo: Allow recursive mdiobus_register()
From: Andrew Lunn @ 2018-04-19  0:00 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Florian Fainelli, Vivien Didelot, Andrew Lunn

mdiobus_register will search for any mdiobus board info registered for
the bus being registered. If found, it will probe devices on the bus.
That device, if for example it is an ethernet switch, may then try to
register an mdio bus. Thus we need to allow recursive calls to
mdiobus_register.

Holding the mdio_board_lock will cause a deadlock during this
recursion. Release the lock and use list_for_each_entry_safe.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/phy/mdio-boardinfo.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/mdio-boardinfo.c b/drivers/net/phy/mdio-boardinfo.c
index 1861f387820d..863496fa5d13 100644
--- a/drivers/net/phy/mdio-boardinfo.c
+++ b/drivers/net/phy/mdio-boardinfo.c
@@ -30,17 +30,20 @@ void mdiobus_setup_mdiodev_from_board_info(struct mii_bus *bus,
 					    struct mdio_board_info *bi))
 {
 	struct mdio_board_entry *be;
+	struct mdio_board_entry *tmp;
 	struct mdio_board_info *bi;
 	int ret;
 
 	mutex_lock(&mdio_board_lock);
-	list_for_each_entry(be, &mdio_board_list, list) {
+	list_for_each_entry_safe(be, tmp, &mdio_board_list, list) {
 		bi = &be->board_info;
 
 		if (strcmp(bus->id, bi->bus_id))
 			continue;
 
+		mutex_unlock(&mdio_board_lock);
 		ret = cb(bus, bi);
+		mutex_lock(&mdio_board_lock);
 		if (ret)
 			continue;
 
-- 
2.17.0

^ permalink raw reply related

* Re: [RFC PATCH ghak32 V2 01/13] audit: add container id
From: Paul Moore @ 2018-04-18 23:47 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: cgroups, containers, linux-api, Linux-Audit Mailing List,
	linux-fsdevel, LKML, netdev, ebiederm, luto, jlayton, carlos,
	dhowells, viro, simo, Eric Paris, serge
In-Reply-To: <e284617ad667ad8f17958dd8babb87fe1b4d7205.1521179281.git.rgb@redhat.com>

On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> Implement the proc fs write to set the audit container ID of a process,
> emitting an AUDIT_CONTAINER record to document the event.
>
> This is a write from the container orchestrator task to a proc entry of
> the form /proc/PID/containerid where PID is the process ID of the newly
> created task that is to become the first task in a container, or an
> additional task added to a container.
>
> The write expects up to a u64 value (unset: 18446744073709551615).
>
> This will produce a record such as this:
> type=CONTAINER msg=audit(1519903238.968:261): op=set pid=596 uid=0 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 auid=0 tty=pts0 ses=1 opid=596 old-contid=18446744073709551615 contid=123455 res=0
>
> The "op" field indicates an initial set.  The "pid" to "ses" fields are
> the orchestrator while the "opid" field is the object's PID, the process
> being "contained".  Old and new container ID values are given in the
> "contid" fields, while res indicates its success.
>
> It is not permitted to self-set, unset or re-set the container ID.  A
> child inherits its parent's container ID, but then can be set only once
> after.
>
> See: https://github.com/linux-audit/audit-kernel/issues/32
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  fs/proc/base.c             | 37 ++++++++++++++++++++
>  include/linux/audit.h      | 16 +++++++++
>  include/linux/init_task.h  |  4 ++-
>  include/linux/sched.h      |  1 +
>  include/uapi/linux/audit.h |  2 ++
>  kernel/auditsc.c           | 84 ++++++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 143 insertions(+), 1 deletion(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 60316b5..6ce4fbe 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -1299,6 +1299,41 @@ static ssize_t proc_sessionid_read(struct file * file, char __user * buf,
>         .read           = proc_sessionid_read,
>         .llseek         = generic_file_llseek,
>  };
> +
> +static ssize_t proc_containerid_write(struct file *file, const char __user *buf,
> +                                  size_t count, loff_t *ppos)
> +{
> +       struct inode *inode = file_inode(file);
> +       u64 containerid;
> +       int rv;
> +       struct task_struct *task = get_proc_task(inode);
> +
> +       if (!task)
> +               return -ESRCH;
> +       if (*ppos != 0) {
> +               /* No partial writes. */
> +               put_task_struct(task);
> +               return -EINVAL;
> +       }
> +
> +       rv = kstrtou64_from_user(buf, count, 10, &containerid);
> +       if (rv < 0) {
> +               put_task_struct(task);
> +               return rv;
> +       }
> +
> +       rv = audit_set_containerid(task, containerid);
> +       put_task_struct(task);
> +       if (rv < 0)
> +               return rv;
> +       return count;
> +}
> +
> +static const struct file_operations proc_containerid_operations = {
> +       .write          = proc_containerid_write,
> +       .llseek         = generic_file_llseek,
> +};
> +
>  #endif
>
>  #ifdef CONFIG_FAULT_INJECTION
> @@ -2961,6 +2996,7 @@ static int proc_pid_patch_state(struct seq_file *m, struct pid_namespace *ns,
>  #ifdef CONFIG_AUDITSYSCALL
>         REG("loginuid",   S_IWUSR|S_IRUGO, proc_loginuid_operations),
>         REG("sessionid",  S_IRUGO, proc_sessionid_operations),
> +       REG("containerid", S_IWUSR, proc_containerid_operations),
>  #endif
>  #ifdef CONFIG_FAULT_INJECTION
>         REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
> @@ -3355,6 +3391,7 @@ static int proc_tid_comm_permission(struct inode *inode, int mask)
>  #ifdef CONFIG_AUDITSYSCALL
>         REG("loginuid",  S_IWUSR|S_IRUGO, proc_loginuid_operations),
>         REG("sessionid",  S_IRUGO, proc_sessionid_operations),
> +       REG("containerid", S_IWUSR, proc_containerid_operations),
>  #endif
>  #ifdef CONFIG_FAULT_INJECTION
>         REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index af410d9..fe4ba3f 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -29,6 +29,7 @@
>
>  #define AUDIT_INO_UNSET ((unsigned long)-1)
>  #define AUDIT_DEV_UNSET ((dev_t)-1)
> +#define INVALID_CID AUDIT_CID_UNSET

Why can't we just use AUDIT_CID_UNSET?  Is there an important
distinction?  If so, they shouldn't they have different values?

If we do need to keep INVALID_CID, let's rename it to
AUDIT_CID_INVALID so we have some consistency to the naming patterns
and we stress that it is an *audit* container ID.

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d258826..1b82191 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -796,6 +796,7 @@ struct task_struct {
>  #ifdef CONFIG_AUDITSYSCALL
>         kuid_t                          loginuid;
>         unsigned int                    sessionid;
> +       u64                             containerid;

This one line addition to the task_struct scares me the most of
anything in this patchset.  Why?  It's a field named "containerid" in
a perhaps one of the most widely used core kernel structures; the
possibilities for abuse are endless, and it's foolish to think we
would ever be able to adequately police this.

Unfortunately, we can't add the field to audit_context as things
currently stand because we don't always allocate an audit_context,
it's dependent on the system's configuration, and we need to track the
audit container ID for a given process, regardless of the audit
configuration.  Pretty much the same reason why loginuid and sessionid
are located directly in task_struct now.  As I stressed during the
design phase, I really want to keep this as an *audit* container ID
and not a general purpose kernel wide container ID.  If the kernel
ever grows a general purpose container ID token, I'll be the first in
line to convert the audit code, but I don't want audit to be that
general purpose mechanism ... audit is hated enough as-is ;)

I think the right solution to this is to create another new struct,
audit_task_info (or similar, the name really isn't that important),
which would be stored as a pointer in task_struct and would replace
the audit_context pointer, loginuid, sessionid, and the newly proposed
containerid.  The new audit_task_info would always be allocated in the
audit_alloc() function (please use kmem_cache), and the audit_context
pointer included inside would continue to be allocated based on the
existing conditions.  By keeping audit_task_info as a pointer inside
task_struct we could hide the structure definition inside
kernel/audit*.c and make it much more difficult for other subsystems
to abuse it.[1]

  struct audit_task_info {
    kuid_t loginuid;
    unsigned int sessionid;
    u64 containerid;
    struct audit_context *ctx;
  }

Actually, we might even want to consider storing audit_context in
audit_task_info (no pointer), or making it a zero length array
(ctx[0]) and going with a variable sized allocation of audit_task_info
... but all that could be done as a follow up optimization once we get
the basic idea sorted.

[1] If for some reason allocating audit_task_info becomes too much
overhead to bear (somewhat doubtful since we would only do it at task
creation), we could do some ugly tricks to directly include an
audit_task_struct chunk in task_struct but I'd like to avoid that if
possible (and I think we can).

>  #endif
>         struct seccomp                  seccomp;

...

> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index 4e61a9e..921a71f 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -71,6 +71,7 @@
>  #define AUDIT_TTY_SET          1017    /* Set TTY auditing status */
>  #define AUDIT_SET_FEATURE      1018    /* Turn an audit feature on or off */
>  #define AUDIT_GET_FEATURE      1019    /* Get which features are enabled */
> +#define AUDIT_CONTAINER                1020    /* Define the container id and information */
>
>  #define AUDIT_FIRST_USER_MSG   1100    /* Userspace messages mostly uninteresting to kernel */
>  #define AUDIT_USER_AVC         1107    /* We filter this differently */
> @@ -465,6 +466,7 @@ struct audit_tty_status {
>  };
>
>  #define AUDIT_UID_UNSET (unsigned int)-1
> +#define AUDIT_CID_UNSET ((u64)-1)

I think we need to decide if we want to distinguish between the "host"
(e.g. init ns) and "unset".  Looking at this patch (I've only quickly
skimmed the others so far) it would appear that you don't think we
need to worry about this distinction; that's fine, but let's make it
explicit with a comment in the code that AUDIT_CID_UNSET means "unset"
as well as "host".

If we do need to make a distinction, let's add a constant/macro for "host".

>  /* audit_rule_data supports filter rules with both integer and string
>   * fields.  It corresponds with AUDIT_ADD_RULE, AUDIT_DEL_RULE and
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 4e0a4ac..29c8482 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -2073,6 +2073,90 @@ int audit_set_loginuid(kuid_t loginuid)
>         return rc;
>  }
>
> +static int audit_set_containerid_perm(struct task_struct *task, u64 containerid)
> +{
> +       struct task_struct *parent;
> +       u64 pcontainerid, ccontainerid;
> +
> +       /* Don't allow to set our own containerid */
> +       if (current == task)
> +               return -EPERM;

Why not?  Is there some obvious security concern that I missing?

I ask because I suppose it might be possible for some container
runtime to do a fork, setup some of the environment and them exec the
container (before you answer the obvious "namespaces!" please remember
we're not trying to define containers).

> +       /* Don't allow the containerid to be unset */
> +       if (!cid_valid(containerid))
> +               return -EINVAL;
> +       /* if we don't have caps, reject */
> +       if (!capable(CAP_AUDIT_CONTROL))
> +               return -EPERM;
> +       /* if containerid is unset, allow */
> +       if (!audit_containerid_set(task))
> +               return 0;
> +       /* it is already set, and not inherited from the parent, reject */
> +       ccontainerid = audit_get_containerid(task);
> +       rcu_read_lock();
> +       parent = rcu_dereference(task->real_parent);
> +       rcu_read_unlock();
> +       task_lock(parent);
> +       pcontainerid = audit_get_containerid(parent);
> +       task_unlock(parent);
> +       if (ccontainerid != pcontainerid)
> +               return -EPERM;
> +       return 0;
> +}
> +
> +static void audit_log_set_containerid(struct task_struct *task, u64 oldcontainerid,
> +                                     u64 containerid, int rc)
> +{
> +       struct audit_buffer *ab;
> +       uid_t uid;
> +       struct tty_struct *tty;
> +
> +       if (!audit_enabled)
> +               return;
> +
> +       ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_CONTAINER);
> +       if (!ab)
> +               return;
> +
> +       uid = from_kuid(&init_user_ns, task_uid(current));
> +       tty = audit_get_tty(current);
> +
> +       audit_log_format(ab, "op=set pid=%d uid=%u", task_tgid_nr(current), uid);
> +       audit_log_task_context(ab);
> +       audit_log_format(ab, " auid=%u tty=%s ses=%u opid=%d old-contid=%llu contid=%llu res=%d",
> +                        from_kuid(&init_user_ns, audit_get_loginuid(current)),
> +                        tty ? tty_name(tty) : "(none)", audit_get_sessionid(current),
> +                        task_tgid_nr(task), oldcontainerid, containerid, !rc);
> +
> +       audit_put_tty(tty);
> +       audit_log_end(ab);
> +}
> +
> +/**
> + * audit_set_containerid - set current task's audit_context containerid
> + * @containerid: containerid value
> + *
> + * Returns 0 on success, -EPERM on permission failure.
> + *
> + * Called (set) from fs/proc/base.c::proc_containerid_write().
> + */
> +int audit_set_containerid(struct task_struct *task, u64 containerid)
> +{
> +       u64 oldcontainerid;
> +       int rc;
> +
> +       oldcontainerid = audit_get_containerid(task);
> +
> +       rc = audit_set_containerid_perm(task, containerid);
> +       if (!rc) {
> +               task_lock(task);
> +               task->containerid = containerid;
> +               task_unlock(task);
> +       }
> +
> +       audit_log_set_containerid(task, oldcontainerid, containerid, rc);
> +       return rc;

Why are audit_set_containerid_perm() and audit_log_containerid()
separate functions?

-- 
paul moore
www.paul-moore.com

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox