From: ebiederm@xmission.com (Eric W. Biederman)
To: zzoru <zzoru007@gmail.com>
Cc: davem@davemloft.net, ktkhai@virtuozzo.com, avagin@virtuozzo.com,
dsahern@gmail.com, nicolas.dichtel@6wind.com,
tyhicks@canonical.com, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, syzkaller@googlegroups.com
Subject: Re: net/core: BUG in copy_net_ns()
Date: Fri, 11 Jan 2019 14:33:47 -0600
Message-ID: <87fttzaq8k.fsf@xmission.com>
In-Reply-To: <bc63c776-99f9-ca6e-0e81-f91b1448932b@gmail.com> (zzoru's message of "Sat, 12 Jan 2019 03:07:33 +0900")
zzoru <zzoru007@gmail.com> writes:
> net/core: BUG in copy_net_ns() (net_namespace.c)
I don't understand this failure report at all.
I don't see the connection to copy_net_ns(), and I don't see how the
suggested patch, short of covering up a memory stomp, could possibly make
a difference.
What am I missing?
> Hello,
>
> I got the following error report while fuzzing the kernel with syzkaller.
>
> On commit 1bdbe227492075d058e37cb3d400e6468d0095b5
>
> Syzkaller hit 'WARNING in __alloc_pages_slowpath' bug.
>
> syz-executor561 (17453) used greatest stack depth: 25056 bytes left
> WARNING: CPU: 0 PID: 692 at mm/page_alloc.c:4415
> __alloc_pages_slowpath+0x1cb1/0x2220 mm/page_alloc.c:4386
> Kernel panic - not syncing: panic_on_warn set ...
> CPU: 0 PID: 692 Comm: kswapd0 Not tainted 5.0.0-rc1+ #4
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0xca/0x13e lib/dump_stack.c:113
> panic+0x278/0x5bf kernel/panic.c:214
> __warn.cold.10+0x20/0x45 kernel/panic.c:571
> report_bug+0x246/0x2d0 lib/bug.c:186
> fixup_bug arch/x86/kernel/traps.c:178 [inline]
> do_error_trap+0x123/0x1e0 arch/x86/kernel/traps.c:271
> do_invalid_op+0x31/0x40 arch/x86/kernel/traps.c:290
> invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
> RIP: 0010:__alloc_pages_slowpath+0x1cb1/0x2220 mm/page_alloc.c:4415
> Code: 8b 84 24 a8 00 00 00 e9 ea f1 ff ff 85 d2 0f 85 0b 01 00 00 48 c7
> c7 c0 5e 55 84 e8 79 f8 23 02 e9 86 f9 ff ff 44 8b 74 24 0c <0f> 0b 48
> b8 00 00 00 00 00 fc ff df 48 8b 54 24 18 48 c1 ea 03 80
> RSP: 0018:ffff8880683fedb8 EFLAGS: 00010046
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 1ffff1100d07fda4
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88807ffdd528
> RBP: dffffc0000000000 R08: 0000000000000000 R09: 000000000000067a
> R10: 0000000000000000 R11: ffff88807ffdc487 R12: 0000000000000000
> R13: ffff8880683ff010 R14: 0000000000415a00 R15: ffff8880683ff010
> __alloc_pages_nodemask+0x521/0x5f0 mm/page_alloc.c:4555
> __alloc_pages include/linux/gfp.h:473 [inline]
> __alloc_pages_node include/linux/gfp.h:486 [inline]
> kmem_getpages mm/slab.c:1398 [inline]
> cache_grow_begin+0x95/0x300 mm/slab.c:2666
> fallback_alloc+0x1ce/0x270 mm/slab.c:3208
> __do_cache_alloc mm/slab.c:3345 [inline]
> slab_alloc mm/slab.c:3373 [inline]
> kmem_cache_alloc+0x286/0x2f0 mm/slab.c:3541
> create_object+0x83/0x880 mm/kmemleak.c:578
> kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
> slab_post_alloc_hook mm/slab.h:442 [inline]
> slab_alloc mm/slab.c:3381 [inline]
> kmem_cache_alloc+0x18f/0x2f0 mm/slab.c:3541
> mempool_alloc+0x13e/0x340 mm/mempool.c:385
> bio_alloc_bioset+0x36f/0x5d0 block/bio.c:489
> bio_alloc include/linux/bio.h:393 [inline]
> submit_bh_wbc.isra.57+0x128/0x680 fs/buffer.c:3061
> __block_write_full_page+0x6e8/0xcd0 fs/buffer.c:1765
> block_write_full_page+0x202/0x250 fs/buffer.c:2955
> pageout mm/vmscan.c:865 [inline]
> shrink_page_list+0x220f/0x3800 mm/vmscan.c:1383
> shrink_inactive_list+0x3c2/0xaa0 mm/vmscan.c:1961
> shrink_list mm/vmscan.c:2273 [inline]
> shrink_node_memcg.constprop.83+0x4bf/0x10e0 mm/vmscan.c:2538
> shrink_node+0x162/0xd10 mm/vmscan.c:2753
> kswapd_shrink_node mm/vmscan.c:3516 [inline]
> balance_pgdat+0x47f/0xc00 mm/vmscan.c:3674
> kswapd+0x57c/0xde0 mm/vmscan.c:3929
> kthread+0x347/0x410 kernel/kthread.c:246
> ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
> Dumping ftrace buffer:
> (ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
>
>
> Syzkaller reproducer:
> # {Threaded:false Collide:false Repeat:true RepeatTimes:0 Procs:8
> Sandbox:none Fault:false FaultCall:-1 FaultNth:0 EnableTun:false
> UseTmpDir:true EnableCgroups:false EnableNetdev:true ResetNet:false
> HandleSegv:false Repro:false Trace:false}
> unshare(0x40000000)
>
>
> C reproducer:
> // autogenerated by syzkaller (https://github.com/google/syzkaller)
>
> #define _GNU_SOURCE
>
> #include <arpa/inet.h>
> #include <dirent.h>
> #include <endian.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <net/if.h>
> #include <net/if_arp.h>
> #include <netinet/in.h>
> #include <sched.h>
> #include <signal.h>
> #include <stdarg.h>
> #include <stdbool.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/ioctl.h>
> #include <sys/mount.h>
> #include <sys/prctl.h>
> #include <sys/resource.h>
> #include <sys/socket.h>
> #include <sys/stat.h>
> #include <sys/syscall.h>
> #include <sys/time.h>
> #include <sys/types.h>
> #include <sys/uio.h>
> #include <sys/wait.h>
> #include <time.h>
> #include <unistd.h>
>
> #include <linux/if_addr.h>
> #include <linux/if_ether.h>
> #include <linux/if_link.h>
> #include <linux/if_tun.h>
> #include <linux/in6.h>
> #include <linux/ip.h>
> #include <linux/neighbour.h>
> #include <linux/net.h>
> #include <linux/netlink.h>
> #include <linux/rtnetlink.h>
> #include <linux/tcp.h>
> #include <linux/veth.h>
>
> unsigned long long procid;
>
> static void sleep_ms(uint64_t ms)
> {
> usleep(ms * 1000);
> }
>
> static uint64_t current_time_ms(void)
> {
> struct timespec ts;
> if (clock_gettime(CLOCK_MONOTONIC, &ts))
> exit(1);
> return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
> }
>
> static void use_temporary_dir(void)
> {
> char tmpdir_template[] = "./syzkaller.XXXXXX";
> char* tmpdir = mkdtemp(tmpdir_template);
> if (!tmpdir)
> exit(1);
> if (chmod(tmpdir, 0777))
> exit(1);
> if (chdir(tmpdir))
> exit(1);
> }
>
> static bool write_file(const char* file, const char* what, ...)
> {
> char buf[1024];
> va_list args;
> va_start(args, what);
> vsnprintf(buf, sizeof(buf), what, args);
> va_end(args);
> buf[sizeof(buf) - 1] = 0;
> int len = strlen(buf);
> int fd = open(file, O_WRONLY | O_CLOEXEC);
> if (fd == -1)
> return false;
> if (write(fd, buf, len) != len) {
> int err = errno;
> close(fd);
> errno = err;
> return false;
> }
> close(fd);
> return true;
> }
>
> static struct {
> char* pos;
> int nesting;
> struct nlattr* nested[8];
> char buf[1024];
> } nlmsg;
>
> static void netlink_init(int typ, int flags, const void* data, int size)
> {
> memset(&nlmsg, 0, sizeof(nlmsg));
> struct nlmsghdr* hdr = (struct nlmsghdr*)nlmsg.buf;
> hdr->nlmsg_type = typ;
> hdr->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | flags;
> memcpy(hdr + 1, data, size);
> nlmsg.pos = (char*)(hdr + 1) + NLMSG_ALIGN(size);
> }
>
> static void netlink_attr(int typ, const void* data, int size)
> {
> struct nlattr* attr = (struct nlattr*)nlmsg.pos;
> attr->nla_len = sizeof(*attr) + size;
> attr->nla_type = typ;
> memcpy(attr + 1, data, size);
> nlmsg.pos += NLMSG_ALIGN(attr->nla_len);
> }
>
> static void netlink_nest(int typ)
> {
> struct nlattr* attr = (struct nlattr*)nlmsg.pos;
> attr->nla_type = typ;
> nlmsg.pos += sizeof(*attr);
> nlmsg.nested[nlmsg.nesting++] = attr;
> }
>
> static void netlink_done(void)
> {
> struct nlattr* attr = nlmsg.nested[--nlmsg.nesting];
> attr->nla_len = nlmsg.pos - (char*)attr;
> }
>
> static int netlink_send(int sock)
> {
> if (nlmsg.pos > nlmsg.buf + sizeof(nlmsg.buf) || nlmsg.nesting)
> exit(1);
> struct nlmsghdr* hdr = (struct nlmsghdr*)nlmsg.buf;
> hdr->nlmsg_len = nlmsg.pos - nlmsg.buf;
> struct sockaddr_nl addr;
> memset(&addr, 0, sizeof(addr));
> addr.nl_family = AF_NETLINK;
> unsigned n = sendto(sock, nlmsg.buf, hdr->nlmsg_len, 0,
> (struct sockaddr*)&addr, sizeof(addr));
> if (n != hdr->nlmsg_len)
> exit(1);
> n = recv(sock, nlmsg.buf, sizeof(nlmsg.buf), 0);
> if (n < sizeof(struct nlmsghdr) + sizeof(struct nlmsgerr))
> exit(1);
> if (hdr->nlmsg_type != NLMSG_ERROR)
> exit(1);
> return -((struct nlmsgerr*)(hdr + 1))->error;
> }
>
> static void netlink_add_device_impl(const char* type, const char* name)
> {
> struct ifinfomsg hdr;
> memset(&hdr, 0, sizeof(hdr));
> netlink_init(RTM_NEWLINK, NLM_F_EXCL | NLM_F_CREATE, &hdr, sizeof(hdr));
> if (name)
> netlink_attr(IFLA_IFNAME, name, strlen(name));
> netlink_nest(IFLA_LINKINFO);
> netlink_attr(IFLA_INFO_KIND, type, strlen(type));
> }
>
> static void netlink_add_device(int sock, const char* type, const char* name)
> {
> netlink_add_device_impl(type, name);
> netlink_done();
> int err = netlink_send(sock);
> (void)err;
> }
>
> static void netlink_add_veth(int sock, const char* name, const char* peer)
> {
> netlink_add_device_impl("veth", name);
> netlink_nest(IFLA_INFO_DATA);
> netlink_nest(VETH_INFO_PEER);
> nlmsg.pos += sizeof(struct ifinfomsg);
> netlink_attr(IFLA_IFNAME, peer, strlen(peer));
> netlink_done();
> netlink_done();
> netlink_done();
> int err = netlink_send(sock);
> (void)err;
> }
>
> static void netlink_add_hsr(int sock, const char* name, const char* slave1,
> const char* slave2)
> {
> netlink_add_device_impl("hsr", name);
> netlink_nest(IFLA_INFO_DATA);
> int ifindex1 = if_nametoindex(slave1);
> netlink_attr(IFLA_HSR_SLAVE1, &ifindex1, sizeof(ifindex1));
> int ifindex2 = if_nametoindex(slave2);
> netlink_attr(IFLA_HSR_SLAVE2, &ifindex2, sizeof(ifindex2));
> netlink_done();
> netlink_done();
> int err = netlink_send(sock);
> (void)err;
> }
>
> static void netlink_device_change(int sock, const char* name, bool up,
> const char* master, const void* mac,
> int macsize)
> {
> struct ifinfomsg hdr;
> memset(&hdr, 0, sizeof(hdr));
> if (up)
> hdr.ifi_flags = hdr.ifi_change = IFF_UP;
> netlink_init(RTM_NEWLINK, 0, &hdr, sizeof(hdr));
> netlink_attr(IFLA_IFNAME, name, strlen(name));
> if (master) {
> int ifindex = if_nametoindex(master);
> netlink_attr(IFLA_MASTER, &ifindex, sizeof(ifindex));
> }
> if (macsize)
> netlink_attr(IFLA_ADDRESS, mac, macsize);
> int err = netlink_send(sock);
> (void)err;
> }
>
> static int netlink_add_addr(int sock, const char* dev, const void* addr,
> int addrsize)
> {
> struct ifaddrmsg hdr;
> memset(&hdr, 0, sizeof(hdr));
> hdr.ifa_family = addrsize == 4 ? AF_INET : AF_INET6;
> hdr.ifa_prefixlen = addrsize == 4 ? 24 : 120;
> hdr.ifa_scope = RT_SCOPE_UNIVERSE;
> hdr.ifa_index = if_nametoindex(dev);
> netlink_init(RTM_NEWADDR, NLM_F_CREATE | NLM_F_REPLACE, &hdr,
> sizeof(hdr));
> netlink_attr(IFA_LOCAL, addr, addrsize);
> netlink_attr(IFA_ADDRESS, addr, addrsize);
> return netlink_send(sock);
> }
>
> static void netlink_add_addr4(int sock, const char* dev, const char* addr)
> {
> struct in_addr in_addr;
> inet_pton(AF_INET, addr, &in_addr);
> int err = netlink_add_addr(sock, dev, &in_addr, sizeof(in_addr));
> (void)err;
> }
>
> static void netlink_add_addr6(int sock, const char* dev, const char* addr)
> {
> struct in6_addr in6_addr;
> inet_pton(AF_INET6, addr, &in6_addr);
> int err = netlink_add_addr(sock, dev, &in6_addr, sizeof(in6_addr));
> (void)err;
> }
>
> #define DEV_IPV4 "172.20.20.%d"
> #define DEV_IPV6 "fe80::%02hx"
> #define DEV_MAC 0x00aaaaaaaaaa
> static void initialize_netdevices(void)
> {
> char netdevsim[16];
> sprintf(netdevsim, "netdevsim%d", (int)procid);
> struct {
> const char* type;
> const char* dev;
> } devtypes[] = {
> {"ip6gretap", "ip6gretap0"}, {"bridge", "bridge0"},
> {"vcan", "vcan0"}, {"bond", "bond0"},
> {"team", "team0"}, {"dummy", "dummy0"},
> {"nlmon", "nlmon0"}, {"caif", "caif0"},
> {"batadv", "batadv0"}, {"vxcan", "vxcan1"},
> {"netdevsim", netdevsim}, {"veth", 0},
> };
> const char* devmasters[] = {"bridge", "bond", "team"};
> struct {
> const char* name;
> int macsize;
> bool noipv6;
> } devices[] = {
> {"lo", ETH_ALEN},
> {"sit0", 0},
> {"bridge0", ETH_ALEN},
> {"vcan0", 0, true},
> {"tunl0", 0},
> {"gre0", 0},
> {"gretap0", ETH_ALEN},
> {"ip_vti0", 0},
> {"ip6_vti0", 0},
> {"ip6tnl0", 0},
> {"ip6gre0", 0},
> {"ip6gretap0", ETH_ALEN},
> {"erspan0", ETH_ALEN},
> {"bond0", ETH_ALEN},
> {"veth0", ETH_ALEN},
> {"veth1", ETH_ALEN},
> {"team0", ETH_ALEN},
> {"veth0_to_bridge", ETH_ALEN},
> {"veth1_to_bridge", ETH_ALEN},
> {"veth0_to_bond", ETH_ALEN},
> {"veth1_to_bond", ETH_ALEN},
> {"veth0_to_team", ETH_ALEN},
> {"veth1_to_team", ETH_ALEN},
> {"veth0_to_hsr", ETH_ALEN},
> {"veth1_to_hsr", ETH_ALEN},
> {"hsr0", 0},
> {"dummy0", ETH_ALEN},
> {"nlmon0", 0},
> {"vxcan1", 0, true},
> {"caif0", ETH_ALEN},
> {"batadv0", ETH_ALEN},
> {netdevsim, ETH_ALEN},
> };
> int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
> if (sock == -1)
> exit(1);
> unsigned i;
> for (i = 0; i < sizeof(devtypes) / sizeof(devtypes[0]); i++)
> netlink_add_device(sock, devtypes[i].type, devtypes[i].dev);
> for (i = 0; i < sizeof(devmasters) / (sizeof(devmasters[0])); i++) {
> char master[32], slave0[32], veth0[32], slave1[32], veth1[32];
> sprintf(slave0, "%s_slave_0", devmasters[i]);
> sprintf(veth0, "veth0_to_%s", devmasters[i]);
> netlink_add_veth(sock, slave0, veth0);
> sprintf(slave1, "%s_slave_1", devmasters[i]);
> sprintf(veth1, "veth1_to_%s", devmasters[i]);
> netlink_add_veth(sock, slave1, veth1);
> sprintf(master, "%s0", devmasters[i]);
> netlink_device_change(sock, slave0, false, master, 0, 0);
> netlink_device_change(sock, slave1, false, master, 0, 0);
> }
> netlink_device_change(sock, "bridge_slave_0", true, 0, 0, 0);
> netlink_device_change(sock, "bridge_slave_1", true, 0, 0, 0);
> netlink_add_veth(sock, "hsr_slave_0", "veth0_to_hsr");
> netlink_add_veth(sock, "hsr_slave_1", "veth1_to_hsr");
> netlink_add_hsr(sock, "hsr0", "hsr_slave_0", "hsr_slave_1");
> netlink_device_change(sock, "hsr_slave_0", true, 0, 0, 0);
> netlink_device_change(sock, "hsr_slave_1", true, 0, 0, 0);
> for (i = 0; i < sizeof(devices) / (sizeof(devices[0])); i++) {
> char addr[32];
> sprintf(addr, DEV_IPV4, i + 10);
> netlink_add_addr4(sock, devices[i].name, addr);
> if (!devices[i].noipv6) {
> sprintf(addr, DEV_IPV6, i + 10);
> netlink_add_addr6(sock, devices[i].name, addr);
> }
> uint64_t macaddr = DEV_MAC + ((i + 10ull) << 40);
> netlink_device_change(sock, devices[i].name, true, 0, &macaddr,
> devices[i].macsize);
> }
> close(sock);
> }
> static void initialize_netdevices_init(void)
> {
> int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
> if (sock == -1)
> exit(1);
> struct {
> const char* type;
> int macsize;
> bool noipv6;
> bool noup;
> } devtypes[] = {
> {"nr", 7, true}, {"rose", 5, true, true},
> };
> unsigned i;
> for (i = 0; i < sizeof(devtypes) / sizeof(devtypes[0]); i++) {
> char dev[32], addr[32];
> sprintf(dev, "%s%d", devtypes[i].type, (int)procid);
> sprintf(addr, "172.30.%d.%d", i, (int)procid + 1);
> netlink_add_addr4(sock, dev, addr);
> if (!devtypes[i].noipv6) {
> sprintf(addr, "fe88::%02hx:%02hx", i, (int)procid + 1);
> netlink_add_addr6(sock, dev, addr);
> }
> int macsize = devtypes[i].macsize;
> uint64_t macaddr = 0xbbbbbb +
> ((unsigned long long)i << (8 * (macsize - 2))) +
> (procid << (8 * (macsize - 1)));
> netlink_device_change(sock, dev, !devtypes[i].noup, 0, &macaddr,
> macsize);
> }
> close(sock);
> }
>
> static void setup_common()
> {
> if (mount(0, "/sys/fs/fuse/connections", "fusectl", 0, 0)) {
> }
> }
>
> static void loop();
>
> static void sandbox_common()
> {
> prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
> setpgrp();
> setsid();
> struct rlimit rlim;
> rlim.rlim_cur = rlim.rlim_max = 200 << 20;
> setrlimit(RLIMIT_AS, &rlim);
> rlim.rlim_cur = rlim.rlim_max = 32 << 20;
> setrlimit(RLIMIT_MEMLOCK, &rlim);
> rlim.rlim_cur = rlim.rlim_max = 136 << 20;
> setrlimit(RLIMIT_FSIZE, &rlim);
> rlim.rlim_cur = rlim.rlim_max = 1 << 20;
> setrlimit(RLIMIT_STACK, &rlim);
> rlim.rlim_cur = rlim.rlim_max = 0;
> setrlimit(RLIMIT_CORE, &rlim);
> rlim.rlim_cur = rlim.rlim_max = 256;
> setrlimit(RLIMIT_NOFILE, &rlim);
> if (unshare(CLONE_NEWNS)) {
> }
> if (unshare(CLONE_NEWIPC)) {
> }
> if (unshare(0x02000000)) {
> }
> if (unshare(CLONE_NEWUTS)) {
> }
> if (unshare(CLONE_SYSVSEM)) {
> }
> typedef struct {
> const char* name;
> const char* value;
> } sysctl_t;
> static const sysctl_t sysctls[] = {
> {"/proc/sys/kernel/shmmax", "16777216"},
> {"/proc/sys/kernel/shmall", "536870912"},
> {"/proc/sys/kernel/shmmni", "1024"},
> {"/proc/sys/kernel/msgmax", "8192"},
> {"/proc/sys/kernel/msgmni", "1024"},
> {"/proc/sys/kernel/msgmnb", "1024"},
> {"/proc/sys/kernel/sem", "1024 1048576 500 1024"},
> };
> unsigned i;
> for (i = 0; i < sizeof(sysctls) / sizeof(sysctls[0]); i++)
> write_file(sysctls[i].name, sysctls[i].value);
> }
>
> int wait_for_loop(int pid)
> {
> if (pid < 0)
> exit(1);
> int status = 0;
> while (waitpid(-1, &status, __WALL) != pid) {
> }
> return WEXITSTATUS(status);
> }
>
> static int do_sandbox_none(void)
> {
> if (unshare(CLONE_NEWPID)) {
> }
> int pid = fork();
> if (pid != 0)
> return wait_for_loop(pid);
> setup_common();
> sandbox_common();
> initialize_netdevices_init();
> if (unshare(CLONE_NEWNET)) {
> }
> initialize_netdevices();
> loop();
> exit(1);
> }
>
> #define FS_IOC_SETFLAGS _IOW('f', 2, long)
> static void remove_dir(const char* dir)
> {
> DIR* dp;
> struct dirent* ep;
> int iter = 0;
> retry:
> while (umount2(dir, MNT_DETACH) == 0) {
> }
> dp = opendir(dir);
> if (dp == NULL) {
> if (errno == EMFILE) {
> exit(1);
> }
> exit(1);
> }
> while ((ep = readdir(dp))) {
> if (strcmp(ep->d_name, ".") == 0 || strcmp(ep->d_name, "..") == 0)
> continue;
> char filename[FILENAME_MAX];
> snprintf(filename, sizeof(filename), "%s/%s", dir, ep->d_name);
> while (umount2(filename, MNT_DETACH) == 0) {
> }
> struct stat st;
> if (lstat(filename, &st))
> exit(1);
> if (S_ISDIR(st.st_mode)) {
> remove_dir(filename);
> continue;
> }
> int i;
> for (i = 0;; i++) {
> if (unlink(filename) == 0)
> break;
> if (errno == EPERM) {
> int fd = open(filename, O_RDONLY);
> if (fd != -1) {
> long flags = 0;
> if (ioctl(fd, FS_IOC_SETFLAGS, &flags) == 0)
> close(fd);
> continue;
> }
> }
> if (errno == EROFS) {
> break;
> }
> if (errno != EBUSY || i > 100)
> exit(1);
> if (umount2(filename, MNT_DETACH))
> exit(1);
> }
> }
> closedir(dp);
> int i;
> for (i = 0;; i++) {
> if (rmdir(dir) == 0)
> break;
> if (i < 100) {
> if (errno == EPERM) {
> int fd = open(dir, O_RDONLY);
> if (fd != -1) {
> long flags = 0;
> if (ioctl(fd, FS_IOC_SETFLAGS, &flags) == 0)
> close(fd);
> continue;
> }
> }
> if (errno == EROFS) {
> break;
> }
> if (errno == EBUSY) {
> if (umount2(dir, MNT_DETACH))
> exit(1);
> continue;
> }
> if (errno == ENOTEMPTY) {
> if (iter < 100) {
> iter++;
> goto retry;
> }
> }
> }
> exit(1);
> }
> }
>
> static void kill_and_wait(int pid, int* status)
> {
> kill(-pid, SIGKILL);
> kill(pid, SIGKILL);
> int i;
> for (i = 0; i < 100; i++) {
> if (waitpid(-1, status, WNOHANG | __WALL) == pid)
> return;
> usleep(1000);
> }
> DIR* dir = opendir("/sys/fs/fuse/connections");
> if (dir) {
> for (;;) {
> struct dirent* ent = readdir(dir);
> if (!ent)
> break;
> if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
> continue;
> char abort[300];
> snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
> ent->d_name);
> int fd = open(abort, O_WRONLY);
> if (fd == -1) {
> continue;
> }
> if (write(fd, abort, 1) < 0) {
> }
> close(fd);
> }
> closedir(dir);
> } else {
> }
> while (waitpid(-1, status, __WALL) != pid) {
> }
> }
>
> #define SYZ_HAVE_SETUP_TEST 1
> static void setup_test()
> {
> prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
> setpgrp();
> }
>
> #define SYZ_HAVE_RESET_TEST 1
> static void reset_test()
> {
> int fd;
> for (fd = 3; fd < 30; fd++)
> close(fd);
> }
>
> static void execute_one(void);
>
> #define WAIT_FLAGS __WALL
>
> static void loop(void)
> {
> int iter;
> for (iter = 0;; iter++) {
> char cwdbuf[32];
> sprintf(cwdbuf, "./%d", iter);
> if (mkdir(cwdbuf, 0777))
> exit(1);
> int pid = fork();
> if (pid < 0)
> exit(1);
> if (pid == 0) {
> if (chdir(cwdbuf))
> exit(1);
> setup_test();
> execute_one();
> reset_test();
> exit(0);
> }
> int status = 0;
> uint64_t start = current_time_ms();
> for (;;) {
> if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
> break;
> sleep_ms(1);
> if (current_time_ms() - start < 5 * 1000)
> continue;
> kill_and_wait(pid, &status);
> break;
> }
> remove_dir(cwdbuf);
> }
> }
>
> void execute_one(void)
> {
> syscall(__NR_unshare, 0x40000000);
> }
> int main(void)
> {
> syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
> for (procid = 0; procid < 8; procid++) {
> if (fork() == 0) {
> use_temporary_dir();
> do_sandbox_none();
> }
> }
> sleep(1000000);
> return 0;
> }
>
>
> I reviewed the kernel code and found a bug: net_drop_ns() doesn't call
> net_free() when refcount_dec_and_test() returns false.
Yes. We don't call net_free() when the reference count does not decrement
to zero. The reference count is initialized to 1 a few lines above the
section of code in your patch, so the net_drop_ns() call on this error
path should take it to zero and free the net. That should not be a problem.
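
The point above can be sketched in plain userspace C. This is only a
model, not kernel code: struct model_net, model_net_alloc(),
model_net_drop_ns() and the other model_* names are hypothetical
stand-ins for struct net, the allocation in copy_net_ns(), and
net_drop_ns(). It assumes nothing else holds a passive reference at the
point where the error path runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-in for struct net; only the passive count is modelled. */
struct model_net {
	int passive;	/* stands in for refcount_t net->passive */
};

static int model_frees;	/* counts calls to model_net_free() */

static void model_net_free(struct model_net *ns)
{
	model_frees++;
	free(ns);
}

/* Mimics refcount_dec_and_test(): true only when the count reaches zero. */
static bool model_dec_and_test(int *count)
{
	assert(*count > 0);	/* an underflow would be a bug in its own right */
	return --(*count) == 0;
}

/* Same shape as net_drop_ns(): free only on the final drop. */
static void model_net_drop_ns(struct model_net *ns)
{
	if (ns && model_dec_and_test(&ns->passive))
		model_net_free(ns);
}

/* Allocate with passive == 1, as is done before the lock is taken. */
static struct model_net *model_net_alloc(void)
{
	struct model_net *ns = calloc(1, sizeof(*ns));

	if (ns)
		ns->passive = 1;
	return ns;
}
```

With passive starting at 1, a single model_net_drop_ns() call frees the
object, which is why the existing error path should not leak.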
> or
> when rv = down_read_killable(&pernet_ops_rwsem) < 0, it doesn't need to
> call refcount_dec_and_test.
It doesn't need to, but it should be harmless.
> https://github.com/torvalds/linux/commit/5ba049a5cc8e24a1643df75bbf65b4efa070fa74#diff-9312644e2968a45510bacdd2b2872ad2
> (I can't reproduce this bug on v4.15, or on
> 1bdbe227492075d058e37cb3d400e6468d0095b5 with my patch, because the
> previous version of the kernel doesn't have this bug.)
> This bug can lead to a memory leak or DoS.
>
> I made a patch for this bug (it just reverts to the code before that commit).
What am I missing?
The only thing I can see your patch doing is covering up a memory stomp
that has the effect of changing the value of net->passive. I am not
really keen on hiding bugs of that kind.
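
As a rough illustration of that concern, here is a userspace sketch
comparing the two error paths. It is a model under stated assumptions,
not kernel code: struct model_net and both *_error_path_frees() helpers
are hypothetical names, and "stomped" simply means that something
unrelated overwrote the count with a value other than 1.

```c
#include <stdbool.h>

/* Userspace stand-in for struct net; only the passive count is modelled. */
struct model_net {
	int passive;
};

/* Mimics refcount_dec_and_test(): true only when the count reaches zero. */
static bool model_dec_and_test(int *count)
{
	return --(*count) == 0;
}

/* Existing error path: free only if dropping passive reaches zero. */
static bool unpatched_error_path_frees(struct model_net *ns)
{
	return model_dec_and_test(&ns->passive);
}

/* Proposed patch: frees unconditionally, never consulting passive. */
static bool patched_error_path_frees(struct model_net *ns)
{
	(void)ns;
	return true;
}
```

The two paths agree whenever passive is still 1. They differ only when
the count has been corrupted, in which case the patched path frees anyway
and the corruption goes unnoticed.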
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index b02fb19df2cc..9de0ade14956 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -431,15 +431,18 @@ struct net *copy_net_ns(unsigned long flags,
> get_user_ns(user_ns);
>
> rv = down_read_killable(&pernet_ops_rwsem);
> - if (rv < 0)
> - goto put_userns;
> + if (rv < 0){
> + net_free(net);
> + dec_net_namespaces(ucounts);
> + put_user_ns(user_ns);
> + return ERR_PTR(rv);
> + }
>
> rv = setup_net(net, user_ns);
>
> up_read(&pernet_ops_rwsem);
>
> if (rv < 0) {
> -put_userns:
> put_user_ns(user_ns);
> net_drop_ns(net);
> dec_ucounts:
>
> and, sorry for my encrypted mails.
Eric
Thread overview: 9+ messages
2019-01-11 18:07 net/core: BUG in copy_net_ns() zzoru
2019-01-11 20:33 ` Eric W. Biederman [this message]
2019-01-11 20:41 ` Kirill Tkhai
2019-01-11 23:31 ` zzoru
2019-01-11 23:50 ` Eric W. Biederman
[not found] ` <CALRZ7Utk6NCGRN6mZQnF1v1a=cTWt1-JzRjLdiD14FTQC=fysg@mail.gmail.com>
2019-01-14 11:58 ` Dmitry Vyukov
2019-01-14 18:12 ` Eric W. Biederman
[not found] ` <CALRZ7UvMbrwLb7UVgcVa9+z5yqVfJ6taj2tzpsFhWU1Cdw2J1A@mail.gmail.com>
2019-01-14 18:29 ` Eric W. Biederman
2019-01-15 10:36 ` Dmitry Vyukov