From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Serge Hallyn <serge.hallyn@canonical.com>,
	Serge Hallyn <serge.hallyn@ubuntu.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>
Subject: [ 096/117] pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
Date: Tue, 24 Sep 2013 17:19:22 -0700
Message-ID: <20130925001751.404690496@linuxfoundation.org>
In-Reply-To: <20130925001740.833541979@linuxfoundation.org>

3.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Eric W. Biederman" <ebiederm@xmission.com>

commit a606488513543312805fab2b93070cefe6a3016c upstream.

Serge Hallyn <serge.hallyn@ubuntu.com> writes:

> Since commit af4b8a83add95ef40716401395b44a1b579965f4 it's been
> possible to get into a situation where a pidns reaper is
> <defunct>, reparented to host pid 1, but never reaped.  How to
> reproduce this is documented at
>
> https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526
> (and see
> https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526/comments/13)
> In short, run repeated starts of a container whose init is
>
> Process.exit(0);
>
> sysrq-t when such a task is playing zombie shows:
>
> [  131.132978] init            x ffff88011fc14580     0  2084   2039 0x00000000
> [  131.132978]  ffff880116e89ea8 0000000000000002 ffff880116e89fd8 0000000000014580
> [  131.132978]  ffff880116e89fd8 0000000000014580 ffff8801172a0000 ffff8801172a0000
> [  131.132978]  ffff8801172a0630 ffff88011729fff0 ffff880116e14650 ffff88011729fff0
> [  131.132978] Call Trace:
> [  131.132978]  [<ffffffff816f6159>] schedule+0x29/0x70
> [  131.132978]  [<ffffffff81064591>] do_exit+0x6e1/0xa40
> [  131.132978]  [<ffffffff81071eae>] ? signal_wake_up_state+0x1e/0x30
> [  131.132978]  [<ffffffff8106496f>] do_group_exit+0x3f/0xa0
> [  131.132978]  [<ffffffff810649e4>] SyS_exit_group+0x14/0x20
> [  131.132978]  [<ffffffff8170102f>] tracesys+0xe1/0xe6
>
> Further debugging showed that every time this happened, zap_pid_ns_processes()
> started with nr_hashed being 3, while we were expecting it to drop to 2.
> Any time it didn't happen, nr_hashed was 1 or 2.  So the reaper was
> waiting for nr_hashed to become 2, but free_pid() only wakes the reaper
> if nr_hashed hits 1.

The issue is that when the task group leader of an init process exits
before the other tasks of the init process, the init process, when it
finally exits, will be a secondary task sleeping in zap_pid_ns_processes
and waiting to wake up when the number of hashed pids drops to 2.  This
case waits forever as free_pid only sends a wake up when the number of
hashed pids drops to 1.

To correct this, the simple strategy of sending a possibly unnecessary
wake up when the number of hashed pids drops to 2 is adopted.

Sending one extraneous wake up is relatively harmless; at worst we
waste a little cpu time in the rare case when a pid namespace
approaches exiting.
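
The wake-up condition described above can be modelled in user space.
The following is only a sketch under stated assumptions, not kernel
code: model_free_pid() and reaper_finishes() are hypothetical names
invented for illustration, and the model assumes the reaper sleeps
until the hashed-pid count equals its target (2 when the exiting init
task is a secondary thread, 1 when it is the thread group leader), as
the description above implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the switch in free_pid(): drop one hashed pid
 * and report whether a wake up would be sent to the reaper.
 */
static bool model_free_pid(int *nr_hashed, bool patched)
{
	switch (--*nr_hashed) {
	case 2:
		return patched;	/* only the patched kernel wakes at 2 */
	case 1:
		return true;	/* both kernels wake at 1 */
	default:
		return false;
	}
}

/*
 * Hypothetical model of the reaper: it sleeps until it is woken while
 * the count equals target.  Returns true if the reaper gets to finish,
 * false if it would sleep forever.  If spurious is not NULL, it receives
 * the number of wake ups that arrived before the target was reached.
 */
static bool reaper_finishes(int start, int target, bool patched,
			    int *spurious)
{
	int nr_hashed = start;
	int extra = 0;
	bool done = (nr_hashed == target);	/* no need to sleep at all */

	assert(target == 1 || target == 2);
	assert(start >= target);

	while (!done && nr_hashed > target) {
		if (model_free_pid(&nr_hashed, patched)) {
			if (nr_hashed == target)
				done = true;
			else
				extra++;	/* woken early, sleeps again */
		}
	}
	if (spurious != NULL)
		*spurious = extra;
	return done;
}
```

In this model reaper_finishes(3, 2, false, NULL) returns false, which
corresponds to the reported hang, while the patched variant returns
true.  With a target of 1 the patched variant still finishes and
reports a single early wake up, which is the extraneous wake up
discussed above.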

We can detect, race free in free_pid, the case when the pid namespace
drops to just two hashed pids.

Dereferencing pid_ns->child_reaper with the pidmap_lock held is safe
without the tasklist_lock because it is guaranteed that detach_pid
will be called on the child_reaper before it is freed, and detach_pid
calls __change_pid, which calls free_pid, which takes the pidmap_lock.
__change_pid only calls free_pid if this is the last use of the pid.
For a thread that is not the thread group leader, the thread's pid will
only ever have one user because a thread's pid is not allowed to be the
pid of a process, of a process group or a session.  For a thread that
is a thread group leader, all of the other threads of that process will
be reaped before the thread group leader is allowed to be reaped,
ensuring there will only be one user of the thread's pid as a process
pid.  Furthermore, because the thread is the init process of a pid
namespace, all of the other processes in the pid namespace will have
already been freed, so the pid will not be used as a session pid or a
process group pid for any other running process.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Tested-by: Serge Hallyn <serge.hallyn@canonical.com>
Reported-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/pid.c |    1 +
 1 file changed, 1 insertion(+)

--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -265,6 +265,7 @@ void free_pid(struct pid *pid)
 		struct pid_namespace *ns = upid->ns;
 		hlist_del_rcu(&upid->pid_chain);
 		switch(--ns->nr_hashed) {
+		case 2:
 		case 1:
 			/* When all that is left in the pid namespace
 			 * is the reaper wake up the reaper.  The reaper


