* Re: Resume problem, reboot instead of resume after suspend on Zepto Nox A14
From: Damien Thébault @ 2011-11-02 20:47 UTC (permalink / raw)
To: MyungJoo Ham; +Cc: linux-pm
In-Reply-To: <CAJ0PZbT6_64Sg6NX+-Kshdn0T-1zynyUVkZ-hRCYtz6Fv_v4CA@mail.gmail.com>
On Wed, Nov 2, 2011 at 10:39, MyungJoo Ham <myungjoo.ham@gmail.com> wrote:
> [...]
> When core is used for pm_test, the platform_ops's enter() is not
> called. Thus, you are not actually testing the real suspend-to-RAM,
> you are testing everything suspend-related materials except for the
> real suspend-resume mechanism for the CPU.
>
> I'm not familiar with x86's ACPI; however, this looks like a
> resume-related problem.
> Probably, at the BIOS (or any other pre-BIOS stage?), the CPU cannot
> read the "the system has been suspended-to-RAM" message?
> or the resume routine of kernel is failing and the system resets or
> jumps to BIOS?
> , which are some issues we've experienced with ARM devices and its bootloaders.
>
> You'll probably need to examine the ACPI's suspend-to-RAM related bits
> or data structure at the platform_ops's enter() callback. And see if
> anything is corrupted or mis-configured.
I made some progress: with "nolapic" I'm able to get back to my
desktop, however if I make any fs access (like running an executable),
the system just locks. At least it doesn't reboot so it's an
improvement. I'll try so investigate further.
--
Damien Thebault
^ permalink raw reply
* Re: [PATCH 4/6 v2] PM: Limit race conditions between runtime PM and system sleep (v2)
From: Greg KH @ 2011-11-02 20:11 UTC (permalink / raw)
To: Linus Walleij
Cc: Kevin Hilman, linux-scsi, LKML, Jesse Barnes, Ulf Hansson,
Tejun Heo, Linux PM mailing list, stable
In-Reply-To: <CACRpkdY8zAw_pdwn2yUEJPdfYPS+VCcLrpucZzhCyNsVPYehdg@mail.gmail.com>
On Wed, Nov 02, 2011 at 09:06:02PM +0100, Linus Walleij wrote:
> On Wed, Nov 2, 2011 at 8:14 PM, Greg KH <greg@kroah.com> wrote:
>
> >> We have also backported:
> >> PM: Introduce generic "noirq" callback routines for subsystems (v2)
> >> PM / Runtime: Update documentation of interactions with system sleep
> >> PM / Runtime: Add new helper function: pm_runtime_status_suspended()
> >>
> >> And now it seems to be sufficient to get this thing going.
> >
> > So, what specific git commits do you want to see in the 3.0-stable
> > tree, and in what order should they be applied in?
>
> So to my untrained eye it looks like it should be applied like this (top to
> bottom) using the reverse commit order from the mainline kernel:
>
> e529192 PM: Introduce generic "noirq" callback routines for subsystems (v2)
> 455716e PM / Runtime: Update documentation of interactions with system sleep
> 1e2ef05 PM: Limit race conditions between runtime PM and system sleep (v2)
> f3393b6 PM / Runtime: Add new helper function: pm_runtime_status_suspended()
>
> So (2) documents the problem, (3) fixes it, whereas (1) and (4) makes
> it possible
> to write proper _noirq() code that does not race, IIRC.
But, after this, it's just adding new infrastructure that drivers will
then be able to use. As I'm not adding new drivers to 3.0, there will
not be any users of this code, so why add it in the first place? It
doesn't look like this follows the rules of the stable kernel tree at
all, does it?
confused,
greg k-h
^ permalink raw reply
* Re: [PATCH 4/6 v2] PM: Limit race conditions between runtime PM and system sleep (v2)
From: Linus Walleij @ 2011-11-02 20:06 UTC (permalink / raw)
To: Greg KH
Cc: Kevin Hilman, linux-scsi, LKML, Jesse Barnes, Ulf Hansson,
Tejun Heo, Linux PM mailing list, stable
In-Reply-To: <20111102191432.GC29355@kroah.com>
On Wed, Nov 2, 2011 at 8:14 PM, Greg KH <greg@kroah.com> wrote:
>> We have also backported:
>> PM: Introduce generic "noirq" callback routines for subsystems (v2)
>> PM / Runtime: Update documentation of interactions with system sleep
>> PM / Runtime: Add new helper function: pm_runtime_status_suspended()
>>
>> And now it seems to be sufficient to get this thing going.
>
> So, what specific git commits do you want to see in the 3.0-stable
> tree, and in what order should they be applied in?
So to my untrained eye it looks like it should be applied like this (top to
bottom) using the reverse commit order from the mainline kernel:
e529192 PM: Introduce generic "noirq" callback routines for subsystems (v2)
455716e PM / Runtime: Update documentation of interactions with system sleep
1e2ef05 PM: Limit race conditions between runtime PM and system sleep (v2)
f3393b6 PM / Runtime: Add new helper function: pm_runtime_status_suspended()
So (2) documents the problem, (3) fixes it, whereas (1) and (4) makes
it possible
to write proper _noirq() code that does not race, IIRC.
Yours,
Linus Walleij
^ permalink raw reply
* Re: [PATCH 4/6 v2] PM: Limit race conditions between runtime PM and system sleep (v2)
From: Greg KH @ 2011-11-02 19:14 UTC (permalink / raw)
To: Linus Walleij
Cc: Kevin Hilman, linux-scsi, LKML, Jesse Barnes, Tejun Heo,
Linux PM mailing list, stable
In-Reply-To: <CAKnu2Mru4c-2LBCZ_k3ZA+v47u0+um8igYU7EWt8=qOTVgcUjA@mail.gmail.com>
On Thu, Oct 27, 2011 at 03:54:20PM +0200, Linus Walleij wrote:
> 2011/6/29 Rafael J. Wysocki <rjw@sisk.pl>:
>
> > One of the roles of the PM core is to prevent different PM callbacks
> > executed for the same device object from racing with each other.
> > Unfortunately, after commit e8665002477f0278f84f898145b1f141ba26ee26
> > (PM: Allow pm_runtime_suspend() to succeed during system suspend)
> > runtime PM callbacks may be executed concurrently with system
> > suspend/resume callbacks for the same device.
> (...)
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
>
> A quick question: is there some specific reason why this patch should
> not go into the 3.0.y stable releases? We are trying to produce
> a runtime PM system of product quality based on 3.0.y and we've
> already had to backport this patch ourselves to get things stable.
>
> We have also backported:
> PM: Introduce generic "noirq" callback routines for subsystems (v2)
> PM / Runtime: Update documentation of interactions with system sleep
> PM / Runtime: Add new helper function: pm_runtime_status_suspended()
>
> And now it seems to be sufficient to get this thing going.
So, what specific git commits do you want to see in the 3.0-stable
tree, and in what order should they be applied in?
thanks,
greg k-h
^ permalink raw reply
* Re: [REGRESSION]: hibernate/sleep regression w/ bisection
From: Dmitry Torokhov @ 2011-11-02 17:08 UTC (permalink / raw)
To: Andrew Watts; +Cc: tj, linux-pm, linux-kernel
In-Reply-To: <20111102170058.GA2388@zeus>
On Wednesday, November 02, 2011 10:01:00 AM Andrew Watts wrote:
> On Wed, Nov 02, 2011 at 09:31:09AM -0700, Dmitry Torokhov wrote:
> > OK, it looks like you do have input, what you do not have is X
> > functioning.
>
> The extent of my tests on input were to notice that the LED for caps
> lock does not toggle when pressing "caps lock" after a bad resume. I
> have also tried to blindly type "reboot" and such in what should be an
> active xterm.
When in X it is X that is responsible for LED switching. Also input
processing beyond the kernel won't be happening if X is hosed.
--
Dmitry
^ permalink raw reply
* Re: [REGRESSION]: hibernate/sleep regression w/ bisection
From: Andrew Watts @ 2011-11-02 17:01 UTC (permalink / raw)
To: Dmitry Torokhov; +Cc: tj, linux-pm, linux-kernel
In-Reply-To: <20111102163109.GA29430@core.coreip.homeip.net>
On Wed, Nov 02, 2011 at 09:31:09AM -0700, Dmitry Torokhov wrote:
>
> OK, it looks like you do have input, what you do not have is X
> functioning.
The extent of my tests on input were to notice that the LED for caps
lock does not toggle when pressing "caps lock" after a bad resume. I
have also tried to blindly type "reboot" and such in what should be an
active xterm.
~ Andy
^ permalink raw reply
* PM / OPP: Fix build when CONFIG_PM_OPP is not set
From: Tony Lindgren @ 2011-11-02 17:00 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Kevin Hilman, linux-kernel, Kyungmin Park, MyungJoo Ham, linux-pm,
linux-omap
Commit 03ca370fbf7b76d6d002380dbdc2cdc2319f9c80 (PM / OPP: Add
OPP availability change notifier) does not compile if CONFIG_PM_OPP
is not set:
arch/arm/plat-omap/omap-pm-noop.o: In function `opp_get_notifier':
include/linux/opp.h:103: multiple definition of `opp_get_notifier'
include/linux/opp.h:103: first defined here
Also fix incorrect comment.
Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mike Turquette <mturquette@ti.com>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---
I'm seeing this with omap1_defconfig at least.
--- a/include/linux/opp.h
+++ b/include/linux/opp.h
@@ -97,11 +97,11 @@ static inline int opp_disable(struct device *dev, unsigned long freq)
return 0;
}
-struct srcu_notifier_head *opp_get_notifier(struct device *dev)
+static inline struct srcu_notifier_head *opp_get_notifier(struct device *dev)
{
return ERR_PTR(-EINVAL);
}
-#endif /* CONFIG_PM */
+#endif /* CONFIG_PM_OPP */
#if defined(CONFIG_CPU_FREQ) && defined(CONFIG_PM_OPP)
int opp_init_cpufreq_table(struct device *dev,
^ permalink raw reply
* Re: [REGRESSION]: hibernate/sleep regression w/ bisection
From: Dmitry Torokhov @ 2011-11-02 16:31 UTC (permalink / raw)
To: Andrew Watts; +Cc: tj, linux-pm, linux-kernel
In-Reply-To: <20111102160208.GA6657@zeus>
On Wed, Nov 02, 2011 at 11:02:09AM -0500, Andrew Watts wrote:
> On Tue, Nov 01, 2011 at 10:46:58PM -0700, Dmitry Torokhov wrote:
> >
> > Are you hibernating from text console or X? Can you try plugging in USB
> > keyboard and see if you are getting input? What about ssh into the box?
>
> Forgot to answer a few of your questions. Plugging in a USB keyboard does
> not help.
OK, it looks like you do have input, what you do not have is X
functioning.
I am not sure why radeon reports lockup with that patch; serio uses
system_long_wq to schedule its long-playing works and the only other
user of system_long_wq seems to be sata, but then concurrency management
should take care of them sharing the workqueue.
Tejun, do you have any ideas here?
Thanks.
--
Dmitry
^ permalink raw reply
* Re: [REGRESSION]: hibernate/sleep regression w/ bisection
From: Andrew Watts @ 2011-11-02 16:02 UTC (permalink / raw)
To: Dmitry Torokhov; +Cc: tj, linux-pm, linux-kernel
In-Reply-To: <20111102054658.GA29035@core.coreip.homeip.net>
On Tue, Nov 01, 2011 at 10:46:58PM -0700, Dmitry Torokhov wrote:
>
> Are you hibernating from text console or X? Can you try plugging in USB
> keyboard and see if you are getting input? What about ssh into the box?
Forgot to answer a few of your questions. Plugging in a USB keyboard does
not help. And for now I cannot test whether the sshd daemon is reponsive
though it would seem that it would indeed be OK.
~ Andy
^ permalink raw reply
* Re: hiberante hangs TCP Re: [EXAMPLE CODE] Parasite thread injection and TCP connection hijacking
From: Tejun Heo @ 2011-11-02 15:10 UTC (permalink / raw)
To: MyungJoo Ham; +Cc: netdev, linux-pm, David Fries, linux-kernel
In-Reply-To: <CAJ0PZbSRX6Rqn5NnnMkVGrPovtSuc2NWasysOYnjr3j9rON1VQ@mail.gmail.com>
Hello,
On Wed, Nov 02, 2011 at 06:44:31PM +0900, MyungJoo Ham wrote:
> > Hmmm... sounds like taking down network interfaces before starting
> > hibernation sequence should be enough, which shouldn't be too
> > difficult to implement from userland. Rafael, what do you think?
> >
> > Thanks.
>
> Um... it seems that the "thaw" callbacks of network interfaces or TCP
> should do something on this.
>
> Probably, the "thaw" callbacks should make sure that the TCP
> connections are closed?
I don't think it's a good idea to diddle with TCP connections from
that layer. From what I understand, it seem all we need is plugging
tx/rx while preparing for hibernation. That shouldn't be too
difficult.
Thanks.
--
tejun
^ permalink raw reply
* Re: [REGRESSION]: hibernate/sleep regression w/ bisection
From: Andrew Watts @ 2011-11-02 15:04 UTC (permalink / raw)
To: Dmitry Torokhov; +Cc: tj, linux-pm, airlied, linux-kernel
In-Reply-To: <20111102054658.GA29035@core.coreip.homeip.net>
On Tue, Nov 01, 2011 at 10:46:58PM -0700, Dmitry Torokhov wrote:
>
> I am not sure how this commit could cause your "no video" issue...
> Are you hibernating from text console or X? Can you try plugging in USB
> keyboard and see if you are getting input? What about ssh into the box?
>
> Thanks.
>
> --
> Dmitry
Hi Dmitry et al.
Yes, I was hibernating from X. I do not have problems when I hibernate from
console.
I conducted the following exercise: a script runs a 20-hibernate loop that
saves dmesg output to individual files and pings my httpd server (on lo)
after every resume.
>From X, with the vanilla 2.6.39.4 kernel, I had 5 bad resumes out of 20
(the script continues though). My httpd server is pinged after all 20
resumes and my dmesg is saved to a file. So, even after the bad resumes,
loopback networking and HD IO are functional.
The five problem resumes all have a similar warning in dmesg (see below).
I am copying Rafael Wysocki (since I can't seem to post to linux-pm@)
and David Airlie (given the 'GPU lockup CP stall' message). [Please see
thread https://lkml.org/lkml/2011/11/1/126 for complete history]
The same loop test conducted either a) from console or b) from X after
reversing commits 8ee294cd9def000 & 1d64b655dc083df encounters no problems
(all 20 resumes succeed) and there are no GPU lockup messages in my saved
dmesg output.
System specs:
P4 + ATI 9100IGP
~ Andy
PS. My messages do not arrive to linux-pm@lists.linux-foundation.org. Does
posting require subscription?
-----------------------------
[ 243.626231] radeon 0000:01:05.0: GPU lockup CP stall for more than 82318msec
[ 243.626236] ------------[ cut here ]------------
[ 243.626252] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:246 radeon_fence_wait+0x2c3/0x311()
[ 243.626256] Hardware name: Pavilion zv5000
[ 243.626259] GPU lockup (waiting for 0x00000F8C last fence id 0x00000F8A)
[ 243.626262] Modules linked in: fan snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss fuse pl2303 snd_atiixp snd_ac97_codec ac97_bus snd_pcm pcmcia snd_timer snd ohci_hcd shpchp processor yenta_socket pcmcia_rsrc ssb video thermal wmi battery button ac ehci_hcd thermal_sys pcmcia_core firewire_ohci snd_page_alloc
[ 243.626311] Pid: 1646, comm: X Tainted: G M 2.6.39.4 #19
[ 243.626314] Call Trace:
[ 243.626325] [<c1033164>] warn_slowpath_common+0x67/0x8e
[ 243.626331] [<c12c082a>] ? radeon_fence_wait+0x2c3/0x311
[ 243.626335] [<c12c082a>] ? radeon_fence_wait+0x2c3/0x311
[ 243.626339] [<c1033207>] warn_slowpath_fmt+0x2e/0x30
[ 243.626344] [<c12c082a>] radeon_fence_wait+0x2c3/0x311
[ 243.626352] [<c104b12a>] ? wake_up_bit+0x62/0x62
[ 243.626358] [<c12c0e37>] radeon_sync_obj_wait+0xc/0xe
[ 243.626363] [<c12908be>] ttm_bo_wait+0xa1/0x108
[ 243.626371] [<c12d6e7b>] radeon_gem_wait_idle_ioctl+0x76/0xc4
[ 243.626378] [<c127e62e>] drm_ioctl+0x1c2/0x42c
[ 243.626383] [<c12d6e05>] ? radeon_gem_set_tiling_ioctl+0x8e/0x8e
[ 243.626389] [<c109ad9a>] ? perf_pmu_enable+0x1a/0x21
[ 243.626395] [<c1026eed>] ? update_curr+0x164/0x24d
[ 243.626403] [<c11f5171>] ? rb_erase+0x16e/0x27a
[ 243.626411] [<c1007ba8>] ? __switch_to_xtra+0xf7/0x11d
[ 243.626417] [<c103015e>] ? set_next_entity+0xad/0xc1
[ 243.626422] [<c127e46c>] ? drm_version+0x8a/0x8a
[ 243.626428] [<c10e288e>] do_vfs_ioctl+0x79/0x54b
[ 243.626436] [<c158c95a>] ? schedule+0x29a/0x614
[ 243.626441] [<c10e2dcb>] sys_ioctl+0x6b/0x70
[ 243.626446] [<c1593813>] sysenter_do_call+0x12/0x22
[ 243.626449] ---[ end trace 17a119d9361222c9 ]---
[ 243.633615] radeon 0000:01:05.0: GPU reset succeed
[ 243.671128] radeon 0000:01:05.0: WB disabled
[ 243.671170] [drm] radeon: ring at 0x00000000D2001000
[ 243.671189] [drm] ring test succeeded in 0 usecs
[ 243.671200] [drm] ib test succeeded in 1 usecs
[ 243.704033] ohci_hcd 0000:00:13.1: auto-stop root hub
[ 243.704048] ohci_hcd 0000:00:13.0: auto-stop root hub
-----------------------------
^ permalink raw reply
* Re: hiberante hangs TCP Re: [EXAMPLE CODE] Parasite thread injection and TCP connection hijacking
From: MyungJoo Ham @ 2011-11-02 9:44 UTC (permalink / raw)
To: Tejun Heo; +Cc: netdev, linux-pm, David Fries, linux-kernel
In-Reply-To: <20111030201618.GA7696@google.com>
On Mon, Oct 31, 2011 at 5:16 AM, Tejun Heo <tj@kernel.org> wrote:
> (cc'ing Rafael and linux-pm)
>
> On Sat, Oct 29, 2011 at 11:48:21PM -0500, David Fries wrote:
>> I saw the write up on this on lwn.net, pretty creative by the way, and
>> it got me thinking about a different checkpoint/restart problem I've
>> been running into. Specifically in hibernating to disk. In the
>> hibernate case active TCP connections hang after resuming, while an
>> idle TCP connection will continue after the system is back up. My
>> observation is the kernel checkpoints itself to memory, enables
>> devices, writes out that checkpoint image to storage, then powers off.
>> The problem is if TCP packets are received while writing to storage,
>> the kernel will continue to queue and ack those TCP packets, but the
>> running kernel and it's network state is shortly lost. When the
>> computer resumes, those TCP byte sequences hang the TCP connection for
>> an extended period of time while the resumed computer refuses to
>> acknowledge the data that was received after checkpointing and the now
>> running kernel knew nothing about, and the other computer tries in
>> vain to resend any data that hadn't yet been acknowledged, which is
>> always after the data that was lost, until one of them eventually
>> gives up.
>>
>> I've been wondering if it was safe or possible to leave any network
>> interfaces down after the checkpoint, or what the right solution would
>> be. I didn't think marking every TCP connection with a ZOMBIE_KERNEL
>> bit just after the kernel checkpoint (for the kernel is walking dead
>> and won't remember anything that happens), and then prevent any TCP
>> acks from being sent for those connections would be the right
>> solution. I've taken to unplugging the physical lan cable,
>> hibernating to disk, and plugging it back in after the system is down,
>> to avoid the problem. Any ideas?
>
> Hmmm... sounds like taking down network interfaces before starting
> hibernation sequence should be enough, which shouldn't be too
> difficult to implement from userland. Rafael, what do you think?
>
> Thanks.
Um... it seems that the "thaw" callbacks of network interfaces or TCP
should do something on this.
Probably, the "thaw" callbacks should make sure that the TCP
connections are closed?
Cheers,
MyungJoo
>
> --
> tejun
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/linux-pm
>
--
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab, DMC Business, Samsung Electronics
^ permalink raw reply
* Re: Resume problem, reboot instead of resume after suspend on Zepto Nox A14
From: MyungJoo Ham @ 2011-11-02 9:39 UTC (permalink / raw)
To: Damien Thébault; +Cc: linux-pm
In-Reply-To: <CAEcrOvACd=zZ000t2niWM6DFgSuJOHyS1MEibwcA-=ouGfzU6w@mail.gmail.com>
On Tue, Nov 1, 2011 at 9:27 PM, Damien Thébault
<damien.thebault@gmail.com> wrote:
> Hello,
>
> I have a problem with suspend and resume on my laptop, I tried to
> solve the problem myself but nothing works.
>
> I was using kernel 2.6.36 in the past and suspend was working just out
> of the box (however, I had problem with sound and needed a more recent
> kernel for it).
> With the next kernel, suspend was not working anymore, but I thought
> it was just a temporary bug.
> Some month later, I retried with a more recent kernel, and it still
> failed. So I bisected the problem and found that commit
> bd25f4dd6972755579d0ea50d1a5ace2e9b00d1a was the first "bad" commit
> ("HID: hiddev: use usb_find_interface, get rid of BKL"). This commit
> made other laptops to fail suspend/resume too, and it was reverted
> later (commit 9c9e54a8df0be48aa359744f412377cc55c3b7d2 "HID: hiddev:
> fix memory corruption due to invalid intfdata").
9c9e54 does not revert bd25f4d. It provides "bugfix" for bd25f4d.
Thus, if this is due to a bug not addressed by 9c9e54 and there is a
bug that prevents S2RAM in bd25f4d, the problem will continue to
happen.
Have you tried to explicitly revert bd25f4d (and its successors of
hiddev.c) at 3.0.6?
> However, other changes must have been made elsewhere, and even after
> the revert, suspend didn't work anymore, and it's still not working
> now (3.0.6 currently).
>
> A lot of things changed in the kernel, and I don't expect to be able
> to solve my problem by looking at the git history, so I'm trying to
> solve the problem by using standard power-management debugging.
>
> First I'll describe the problem:
> - I launch suspend to ram with any tool (I used to run acpitool -s,
> but echo mem > /sys/power/state or s2ram works too)
> - The computer goes to sleep, the power led blinks slowly
> - I push a keyboard key (usually Ctrl), but any key or the power key
> are working too
> - Instead of resuming successfully, the computer reboots and the BIOS
> runs, like a cold power-up would do.
>
> I looked at Documentation/power/basic-pm-debugging.txt and tested
> modes of hibernations:
> - echo freezer > /sys/power/pm_test
> - echo mem > /sys/power/state
> This worked just fine, I then tested with "devices", "platform",
> "processors" and "core", and all worked.
> If I use "none" to use the normal suspend, it just reboots to the BIOS.
When core is used for pm_test, the platform_ops's enter() is not
called. Thus, you are not actually testing the real suspend-to-RAM,
you are testing everything suspend-related materials except for the
real suspend-resume mechanism for the CPU.
I'm not familiar with x86's ACPI; however, this looks like a
resume-related problem.
Probably, at the BIOS (or any other pre-BIOS stage?), the CPU cannot
read the "the system has been suspended-to-RAM" message?
or the resume routine of kernel is failing and the system resets or
jumps to BIOS?
, which are some issues we've experienced with ARM devices and its bootloaders.
You'll probably need to examine the ACPI's suspend-to-RAM related bits
or data structure at the platform_ops's enter() callback. And see if
anything is corrupted or mis-configured.
Anyway, having this hid device driver affecting suspend-to-RAM in such
a way seems very very odd...
Have you tried it with different compilers or different Kconfig options?
>
> I then followed the instructions at Documentation/power/s2ram.txt and
> used PM_TRACE:
> - echo 1 > /sys/power/pm_trace
> - echo mem > /sys/power/state
> On reboot, I can see the "Magic number" line, but no hash is matching.
>
> I then tried suspend to disk, everything works just fine there, it
> works perfectly.
>
> I searched on forums and mailing lists to see which quirks I could try:
> - "i8042.reset=1": same behaviour
> - "acpi_sleep=s3_beep": same behaviour, no beep at all
> - "acpi_sleep=nonvs": same behaviour
> - "no_console_suspend": same behaviour
>
> The DSDT does not contain any suspect windows-specific switches,
> there's some "windows 2006" checks, but nothing for other versions,
> and as I wrote previously linux was suspending well with the 2.6.36
> kernel.
> I'm not even sure that pressing a key does resume the kernel at all
> (since s3_beep doesn't do any sound and PM_TRACE doesn't show
> anything).
>
> What could I do to debug this further? Is there a list of parameters to try?
> What is the extra step done between "core" and "none" in /sys/power/pm_test?
>
> This is a Zepto Nox A14 laptop, Core 2 Duo, ICH9 chipset, iwlagn wifi,
> ... I can provide dmidecode output, ACPI dumps and so on.
>
> Regards,
> --
> Damien Thebault
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/linux-pm
>
--
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab, DMC Business, Samsung Electronics
^ permalink raw reply
* Re: [REGRESSION]: hibernate/sleep regression w/ bisection
From: Dmitry Torokhov @ 2011-11-02 5:46 UTC (permalink / raw)
To: Andrew Watts; +Cc: tj, linux-pm, linux-kernel
In-Reply-To: <20111101124759.GA1326@zeus>
Hi Andrew,
On Tue, Nov 01, 2011 at 07:48:19AM -0500, Andrew Watts wrote:
> Hi.
>
> Hibernate/sleep (echo disk/mem > /sys/power/state) has presented problems
> for me starting with 2.6.39. Kernel 2.6.37.6 was the last completely bug-free
> version I used (I skipped the 2.6.38 branch entirely).
>
> The symptoms are that upon resume (from sleep/hibernate) there is no video
> nor any keyboard input with the exception of sysrq.
If SysRq is working that means that i8042, atkbd and input core are
working properly.
>
> It has been a frustrating bug to hunt down because it is not easily
> reproduced; sometimes the bug doesn't pop up until after a long sequence of
> hibernate/sleep cycles.
>
> I successfully bisected the problem to: 8ee294cd9def000.
>
> =======
> Commit: 8ee294cd9def0004887da7f44b80563493b0a097
> Author: Dmitry Torokhov <dmitry.torokhov@gmail.com>
> Date: Mon Nov 15 01:39:57 2010 -0800
> Input: serio - convert to common workqueue instead of a thread
> =======
>
> Backing out 8ee294cd9def000 (which requires reversing part of
> 1d64b655dc083df also) fixes this particular problem on 2.6.39.4,
> 3.0.8, and 3.1.
I am not sure how this commit could cause your "no video" issue...
Are you hibernating from text console or X? Can you try plugging in USB
keyboard and see if you are getting input? What about ssh into the box?
Thanks.
--
Dmitry
^ permalink raw reply
* Re: [REGRESSION]: hibernate/sleep regression w/ bisection
From: Andrew Watts @ 2011-11-02 0:24 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-pm, dmitry.torokhov, linux-kernel
In-Reply-To: <20111101201902.GA1763@google.com>
On Tue, Nov 01, 2011 at 01:19:02PM -0700, Tejun Heo wrote:
>
> Odd... how about the following one? Thanks.
>
No, the 2nd patch doesn't work either. However, as I mentioned, reversing
8ee294cd9def000 (and part of 1d64b655dc083df) does fix it.
~ Andy
PS I noticed your posts make it to linux-pm@lists.linux-foundation.org.
Mine never arrive there. Is that list closed?
^ permalink raw reply
* [PATCH 10/10] cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
From: Tejun Heo @ 2011-11-01 23:46 UTC (permalink / raw)
To: paul, rjw, lizf
Cc: fweisbec, containers, linux-kernel, oleg, Tejun Heo, linux-pm,
akpm, kamezawa.hiroyu
In-Reply-To: <1320191193-8110-1-git-send-email-tj@kernel.org>
These three methods are no longer used. Kill them.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
---
Documentation/cgroups/cgroups.txt | 20 --------------
include/linux/cgroup.h | 3 --
kernel/cgroup.c | 53 +++---------------------------------
3 files changed, 5 insertions(+), 71 deletions(-)
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index bf5d6c9..eb1b609 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -615,13 +615,6 @@ fork. If this method returns 0 (success) then this should remain valid
while the caller holds cgroup_mutex and it is ensured that either
attach() or cancel_attach() will be called in future.
-int can_attach_task(struct cgroup *cgrp, struct task_struct *tsk);
-(cgroup_mutex held by caller)
-
-As can_attach, but for operations that must be run once per task to be
-attached (possibly many when using cgroup_attach_proc). Called after
-can_attach.
-
void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct cgroup_taskset *tset)
(cgroup_mutex held by caller)
@@ -632,12 +625,6 @@ function, so that the subsystem can implement a rollback. If not, not necessary.
This will be called only about subsystems whose can_attach() operation have
succeeded. The parameters are identical to can_attach().
-void pre_attach(struct cgroup *cgrp);
-(cgroup_mutex held by caller)
-
-For any non-per-thread attachment work that needs to happen before
-attach_task. Needed by cpuset.
-
void attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct cgroup_taskset *tset)
(cgroup_mutex held by caller)
@@ -646,13 +633,6 @@ Called after the task has been attached to the cgroup, to allow any
post-attachment activity that requires memory allocations or blocking.
The parameters are identical to can_attach().
-void attach_task(struct cgroup *cgrp, struct task_struct *tsk);
-(cgroup_mutex held by caller)
-
-As attach, but for operations that must be run once per task to be attached,
-like can_attach_task. Called before attach. Currently does not support any
-subsystem that might need the old_cgrp for every thread in the group.
-
void fork(struct cgroup_subsy *ss, struct task_struct *task)
Called when a task is forked into a cgroup.
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 2470c8e..5659d37 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -490,11 +490,8 @@ struct cgroup_subsys {
void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
int (*can_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct cgroup_taskset *tset);
- int (*can_attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
void (*cancel_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct cgroup_taskset *tset);
- void (*pre_attach)(struct cgroup *cgrp);
- void (*attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct cgroup_taskset *tset);
void (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 0bb7927..d2b3697 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1944,13 +1944,6 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
goto out;
}
}
- if (ss->can_attach_task) {
- retval = ss->can_attach_task(cgrp, tsk);
- if (retval) {
- failed_ss = ss;
- goto out;
- }
- }
}
retval = cgroup_task_migrate(cgrp, oldcgrp, tsk, false);
@@ -1958,10 +1951,6 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
goto out;
for_each_subsys(root, ss) {
- if (ss->pre_attach)
- ss->pre_attach(cgrp);
- if (ss->attach_task)
- ss->attach_task(cgrp, tsk);
if (ss->attach)
ss->attach(ss, cgrp, &tset);
}
@@ -2093,7 +2082,6 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
{
int retval, i, group_size, nr_migrating_tasks;
struct cgroup_subsys *ss, *failed_ss = NULL;
- bool cancel_failed_ss = false;
/* guaranteed to be initialized later, but the compiler needs this */
struct css_set *oldcg;
struct cgroupfs_root *root = cgrp->root;
@@ -2188,21 +2176,6 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
goto out_cancel_attach;
}
}
- /* a callback to be run on every thread in the threadgroup. */
- if (ss->can_attach_task) {
- /* run on each task in the threadgroup. */
- for (i = 0; i < group_size; i++) {
- tc = flex_array_get(group, i);
- if (tc->cgrp == cgrp)
- continue;
- retval = ss->can_attach_task(cgrp, tc->task);
- if (retval) {
- failed_ss = ss;
- cancel_failed_ss = true;
- goto out_cancel_attach;
- }
- }
- }
}
/*
@@ -2234,15 +2207,10 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
}
/*
- * step 3: now that we're guaranteed success wrt the css_sets, proceed
- * to move all tasks to the new cgroup, calling ss->attach_task for each
- * one along the way. there are no failure cases after here, so this is
- * the commit point.
+ * step 3: now that we're guaranteed success wrt the css_sets,
+ * proceed to move all tasks to the new cgroup. There are no
+ * failure cases after here, so this is the commit point.
*/
- for_each_subsys(root, ss) {
- if (ss->pre_attach)
- ss->pre_attach(cgrp);
- }
for (i = 0; i < group_size; i++) {
tc = flex_array_get(group, i);
/* leave current thread as it is if it's already there */
@@ -2251,19 +2219,11 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
retval = cgroup_task_migrate(cgrp, tc->cgrp, tc->task, true);
BUG_ON(retval != 0);
-
- /* attach each task to each subsystem */
- for_each_subsys(root, ss) {
- if (ss->attach_task)
- ss->attach_task(cgrp, tc->task);
- }
}
/* nothing is sensitive to fork() after this point. */
/*
- * step 4: do expensive, non-thread-specific subsystem callbacks.
- * TODO: if ever a subsystem needs to know the oldcgrp for each task
- * being moved, this call will need to be reworked to communicate that.
+ * step 4: do subsystem attach callbacks.
*/
for_each_subsys(root, ss) {
if (ss->attach)
@@ -2287,11 +2247,8 @@ out_cancel_attach:
/* same deal as in cgroup_attach_task */
if (retval) {
for_each_subsys(root, ss) {
- if (ss == failed_ss) {
- if (cancel_failed_ss && ss->cancel_attach)
- ss->cancel_attach(ss, cgrp, &tset);
+ if (ss == failed_ss)
break;
- }
if (ss->cancel_attach)
ss->cancel_attach(ss, cgrp, &tset);
}
--
1.7.3.1
^ permalink raw reply related
* [PATCH 09/10] cgroup, cpuset: don't use ss->pre_attach()
From: Tejun Heo @ 2011-11-01 23:46 UTC (permalink / raw)
To: paul, rjw, lizf
Cc: fweisbec, containers, linux-kernel, oleg, Tejun Heo, linux-pm,
akpm, kamezawa.hiroyu
In-Reply-To: <1320191193-8110-1-git-send-email-tj@kernel.org>
->pre_attach() is supposed to be called before migration, which is
observed during process migration but task migration does it the other
way around. The only ->pre_attach() user is cpuset which can do the
same operaitons in ->can_attach(). Collapse cpuset_pre_attach() into
cpuset_can_attach().
-v2: Patch contamination from later patch removed. Spotted by Paul
Menage.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
---
kernel/cpuset.c | 29 ++++++++++++-----------------
1 files changed, 12 insertions(+), 17 deletions(-)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 472ddd6..f0b8df3 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1367,6 +1367,15 @@ static int fmeter_getrate(struct fmeter *fmp)
return val;
}
+/*
+ * Protected by cgroup_lock. The nodemasks must be stored globally because
+ * dynamically allocating them is not allowed in can_attach, and they must
+ * persist until attach.
+ */
+static cpumask_var_t cpus_attach;
+static nodemask_t cpuset_attach_nodemask_from;
+static nodemask_t cpuset_attach_nodemask_to;
+
/* Called by cgroups to determine if a cpuset is usable; cgroup_mutex held */
static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct cgroup_taskset *tset)
@@ -1393,29 +1402,16 @@ static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
if ((ret = security_task_setscheduler(task)))
return ret;
}
- return 0;
-}
-
-/*
- * Protected by cgroup_lock. The nodemasks must be stored globally because
- * dynamically allocating them is not allowed in pre_attach, and they must
- * persist among pre_attach, and attach.
- */
-static cpumask_var_t cpus_attach;
-static nodemask_t cpuset_attach_nodemask_from;
-static nodemask_t cpuset_attach_nodemask_to;
-
-/* Set-up work for before attaching each task. */
-static void cpuset_pre_attach(struct cgroup *cont)
-{
- struct cpuset *cs = cgroup_cs(cont);
+ /* prepare for attach */
if (cs == &top_cpuset)
cpumask_copy(cpus_attach, cpu_possible_mask);
else
guarantee_online_cpus(cs, cpus_attach);
guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
+
+ return 0;
}
static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
@@ -1901,7 +1897,6 @@ struct cgroup_subsys cpuset_subsys = {
.create = cpuset_create,
.destroy = cpuset_destroy,
.can_attach = cpuset_can_attach,
- .pre_attach = cpuset_pre_attach,
.attach = cpuset_attach,
.populate = cpuset_populate,
.post_clone = cpuset_post_clone,
--
1.7.3.1
^ permalink raw reply related
* [PATCH 08/10] cgroup: don't use subsys->can_attach_task() or ->attach_task()
From: Tejun Heo @ 2011-11-01 23:46 UTC (permalink / raw)
To: paul, rjw, lizf
Cc: Peter Zijlstra, fweisbec, containers, Ingo Molnar,
Daisuke Nishimura, linux-kernel, oleg, Tejun Heo, linux-pm,
James Morris, akpm, KAMEZAWA Hiroyuki
In-Reply-To: <1320191193-8110-1-git-send-email-tj@kernel.org>
Now that subsys->can_attach() and attach() take @tset instead of
@task, they can handle per-task operations. Convert
->can_attach_task() and ->attach_task() users to use ->can_attach()
and attach() instead. Most converions are straight-forward.
Noteworthy changes are,
* In cgroup_freezer, remove unnecessary NULL assignments to unused
methods. It's useless and very prone to get out of sync, which
already happened.
* In cpuset, PF_THREAD_BOUND test is checked for each task. This
doesn't make any practical difference but is conceptually cleaner.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: James Morris <jmorris@namei.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
---
block/blk-cgroup.c | 45 +++++++++++++++++++-----------
kernel/cgroup_freezer.c | 25 ++++++-----------
kernel/cpuset.c | 70 +++++++++++++++++++++-------------------------
kernel/events/core.c | 13 +++++---
kernel/sched.c | 31 +++++++++++++--------
5 files changed, 96 insertions(+), 88 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index b596e54..b5e3fef 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -30,8 +30,10 @@ EXPORT_SYMBOL_GPL(blkio_root_cgroup);
static struct cgroup_subsys_state *blkiocg_create(struct cgroup_subsys *,
struct cgroup *);
-static int blkiocg_can_attach_task(struct cgroup *, struct task_struct *);
-static void blkiocg_attach_task(struct cgroup *, struct task_struct *);
+static int blkiocg_can_attach(struct cgroup_subsys *, struct cgroup *,
+ struct cgroup_taskset *);
+static void blkiocg_attach(struct cgroup_subsys *, struct cgroup *,
+ struct cgroup_taskset *);
static void blkiocg_destroy(struct cgroup_subsys *, struct cgroup *);
static int blkiocg_populate(struct cgroup_subsys *, struct cgroup *);
@@ -44,8 +46,8 @@ static int blkiocg_populate(struct cgroup_subsys *, struct cgroup *);
struct cgroup_subsys blkio_subsys = {
.name = "blkio",
.create = blkiocg_create,
- .can_attach_task = blkiocg_can_attach_task,
- .attach_task = blkiocg_attach_task,
+ .can_attach = blkiocg_can_attach,
+ .attach = blkiocg_attach,
.destroy = blkiocg_destroy,
.populate = blkiocg_populate,
#ifdef CONFIG_BLK_CGROUP
@@ -1609,30 +1611,39 @@ done:
* of the main cic data structures. For now we allow a task to change
* its cgroup only if it's the only owner of its ioc.
*/
-static int blkiocg_can_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
+static int blkiocg_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+ struct cgroup_taskset *tset)
{
+ struct task_struct *task;
struct io_context *ioc;
int ret = 0;
/* task_lock() is needed to avoid races with exit_io_context() */
- task_lock(tsk);
- ioc = tsk->io_context;
- if (ioc && atomic_read(&ioc->nr_tasks) > 1)
- ret = -EINVAL;
- task_unlock(tsk);
-
+ cgroup_taskset_for_each(task, cgrp, tset) {
+ task_lock(task);
+ ioc = task->io_context;
+ if (ioc && atomic_read(&ioc->nr_tasks) > 1)
+ ret = -EINVAL;
+ task_unlock(task);
+ if (ret)
+ break;
+ }
return ret;
}
-static void blkiocg_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
+static void blkiocg_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+ struct cgroup_taskset *tset)
{
+ struct task_struct *task;
struct io_context *ioc;
- task_lock(tsk);
- ioc = tsk->io_context;
- if (ioc)
- ioc->cgroup_changed = 1;
- task_unlock(tsk);
+ cgroup_taskset_for_each(task, cgrp, tset) {
+ task_lock(task);
+ ioc = task->io_context;
+ if (ioc)
+ ioc->cgroup_changed = 1;
+ task_unlock(task);
+ }
}
void blkio_policy_register(struct blkio_policy_type *blkiop)
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index a72d5fa..68ef861 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -163,10 +163,19 @@ static int freezer_can_attach(struct cgroup_subsys *ss,
struct cgroup_taskset *tset)
{
struct freezer *freezer;
+ struct task_struct *task;
/*
* Anything frozen can't move or be moved to/from.
*/
+ rcu_read_lock();
+ cgroup_taskset_for_each(task, new_cgroup, tset) {
+ if (__cgroup_freezing_or_frozen(task)) {
+ rcu_read_unlock();
+ return -EBUSY;
+ }
+ }
+ rcu_read_unlock();
freezer = cgroup_freezer(new_cgroup);
if (freezer->state != CGROUP_THAWED)
@@ -175,17 +184,6 @@ static int freezer_can_attach(struct cgroup_subsys *ss,
return 0;
}
-static int freezer_can_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
-{
- rcu_read_lock();
- if (__cgroup_freezing_or_frozen(tsk)) {
- rcu_read_unlock();
- return -EBUSY;
- }
- rcu_read_unlock();
- return 0;
-}
-
static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
{
struct freezer *freezer;
@@ -381,10 +379,5 @@ struct cgroup_subsys freezer_subsys = {
.populate = freezer_populate,
.subsys_id = freezer_subsys_id,
.can_attach = freezer_can_attach,
- .can_attach_task = freezer_can_attach_task,
- .pre_attach = NULL,
- .attach_task = NULL,
- .attach = NULL,
.fork = freezer_fork,
- .exit = NULL,
};
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 2e5825b..472ddd6 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1372,33 +1372,34 @@ static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct cgroup_taskset *tset)
{
struct cpuset *cs = cgroup_cs(cgrp);
+ struct task_struct *task;
+ int ret;
if (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed))
return -ENOSPC;
- /*
- * Kthreads bound to specific cpus cannot be moved to a new cpuset; we
- * cannot change their cpu affinity and isolating such threads by their
- * set of allowed nodes is unnecessary. Thus, cpusets are not
- * applicable for such threads. This prevents checking for success of
- * set_cpus_allowed_ptr() on all attached tasks before cpus_allowed may
- * be changed.
- */
- if (cgroup_taskset_first(tset)->flags & PF_THREAD_BOUND)
- return -EINVAL;
-
+ cgroup_taskset_for_each(task, cgrp, tset) {
+ /*
+ * Kthreads bound to specific cpus cannot be moved to a new
+ * cpuset; we cannot change their cpu affinity and
+ * isolating such threads by their set of allowed nodes is
+ * unnecessary. Thus, cpusets are not applicable for such
+ * threads. This prevents checking for success of
+ * set_cpus_allowed_ptr() on all attached tasks before
+ * cpus_allowed may be changed.
+ */
+ if (task->flags & PF_THREAD_BOUND)
+ return -EINVAL;
+ if ((ret = security_task_setscheduler(task)))
+ return ret;
+ }
return 0;
}
-static int cpuset_can_attach_task(struct cgroup *cgrp, struct task_struct *task)
-{
- return security_task_setscheduler(task);
-}
-
/*
* Protected by cgroup_lock. The nodemasks must be stored globally because
* dynamically allocating them is not allowed in pre_attach, and they must
- * persist among pre_attach, attach_task, and attach.
+ * persist among pre_attach, and attach.
*/
static cpumask_var_t cpus_attach;
static nodemask_t cpuset_attach_nodemask_from;
@@ -1417,39 +1418,34 @@ static void cpuset_pre_attach(struct cgroup *cont)
guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
}
-/* Per-thread attachment work. */
-static void cpuset_attach_task(struct cgroup *cont, struct task_struct *tsk)
-{
- int err;
- struct cpuset *cs = cgroup_cs(cont);
-
- /*
- * can_attach beforehand should guarantee that this doesn't fail.
- * TODO: have a better way to handle failure here
- */
- err = set_cpus_allowed_ptr(tsk, cpus_attach);
- WARN_ON_ONCE(err);
-
- cpuset_change_task_nodemask(tsk, &cpuset_attach_nodemask_to);
- cpuset_update_task_spread_flag(cs, tsk);
-}
-
static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct cgroup_taskset *tset)
{
struct mm_struct *mm;
- struct task_struct *tsk = cgroup_taskset_first(tset);
+ struct task_struct *task;
+ struct task_struct *leader = cgroup_taskset_first(tset);
struct cgroup *oldcgrp = cgroup_taskset_cur_cgroup(tset);
struct cpuset *cs = cgroup_cs(cgrp);
struct cpuset *oldcs = cgroup_cs(oldcgrp);
+ cgroup_taskset_for_each(task, cgrp, tset) {
+ /*
+ * can_attach beforehand should guarantee that this doesn't
+ * fail. TODO: have a better way to handle failure here
+ */
+ WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpus_attach));
+
+ cpuset_change_task_nodemask(task, &cpuset_attach_nodemask_to);
+ cpuset_update_task_spread_flag(cs, task);
+ }
+
/*
* Change mm, possibly for multiple threads in a threadgroup. This is
* expensive and may sleep.
*/
cpuset_attach_nodemask_from = oldcs->mems_allowed;
cpuset_attach_nodemask_to = cs->mems_allowed;
- mm = get_task_mm(tsk);
+ mm = get_task_mm(leader);
if (mm) {
mpol_rebind_mm(mm, &cpuset_attach_nodemask_to);
if (is_memory_migrate(cs))
@@ -1905,9 +1901,7 @@ struct cgroup_subsys cpuset_subsys = {
.create = cpuset_create,
.destroy = cpuset_destroy,
.can_attach = cpuset_can_attach,
- .can_attach_task = cpuset_can_attach_task,
.pre_attach = cpuset_pre_attach,
- .attach_task = cpuset_attach_task,
.attach = cpuset_attach,
.populate = cpuset_populate,
.post_clone = cpuset_post_clone,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d1a1bee..ed35211 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7136,10 +7136,13 @@ static int __perf_cgroup_move(void *info)
return 0;
}
-static void
-perf_cgroup_attach_task(struct cgroup *cgrp, struct task_struct *task)
+static void perf_cgroup_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+ struct cgroup_taskset *tset)
{
- task_function_call(task, __perf_cgroup_move, task);
+ struct task_struct *task;
+
+ cgroup_taskset_for_each(task, cgrp, tset)
+ task_function_call(task, __perf_cgroup_move, task);
}
static void perf_cgroup_exit(struct cgroup_subsys *ss, struct cgroup *cgrp,
@@ -7153,7 +7156,7 @@ static void perf_cgroup_exit(struct cgroup_subsys *ss, struct cgroup *cgrp,
if (!(task->flags & PF_EXITING))
return;
- perf_cgroup_attach_task(cgrp, task);
+ task_function_call(task, __perf_cgroup_move, task);
}
struct cgroup_subsys perf_subsys = {
@@ -7162,6 +7165,6 @@ struct cgroup_subsys perf_subsys = {
.create = perf_cgroup_create,
.destroy = perf_cgroup_destroy,
.exit = perf_cgroup_exit,
- .attach_task = perf_cgroup_attach_task,
+ .attach = perf_cgroup_attach,
};
#endif /* CONFIG_CGROUP_PERF */
diff --git a/kernel/sched.c b/kernel/sched.c
index d87c6e5..49e3605 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -9129,24 +9129,31 @@ cpu_cgroup_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp)
sched_destroy_group(tg);
}
-static int
-cpu_cgroup_can_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
+static int cpu_cgroup_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+ struct cgroup_taskset *tset)
{
+ struct task_struct *task;
+
+ cgroup_taskset_for_each(task, cgrp, tset) {
#ifdef CONFIG_RT_GROUP_SCHED
- if (!sched_rt_can_attach(cgroup_tg(cgrp), tsk))
- return -EINVAL;
+ if (!sched_rt_can_attach(cgroup_tg(cgrp), task))
+ return -EINVAL;
#else
- /* We don't support RT-tasks being in separate groups */
- if (tsk->sched_class != &fair_sched_class)
- return -EINVAL;
+ /* We don't support RT-tasks being in separate groups */
+ if (task->sched_class != &fair_sched_class)
+ return -EINVAL;
#endif
+ }
return 0;
}
-static void
-cpu_cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
+static void cpu_cgroup_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+ struct cgroup_taskset *tset)
{
- sched_move_task(tsk);
+ struct task_struct *task;
+
+ cgroup_taskset_for_each(task, cgrp, tset)
+ sched_move_task(task);
}
static void
@@ -9482,8 +9489,8 @@ struct cgroup_subsys cpu_cgroup_subsys = {
.name = "cpu",
.create = cpu_cgroup_create,
.destroy = cpu_cgroup_destroy,
- .can_attach_task = cpu_cgroup_can_attach_task,
- .attach_task = cpu_cgroup_attach_task,
+ .can_attach = cpu_cgroup_can_attach,
+ .attach = cpu_cgroup_attach,
.exit = cpu_cgroup_exit,
.populate = cpu_cgroup_populate,
.subsys_id = cpu_cgroup_subsys_id,
--
1.7.3.1
^ permalink raw reply related
* [PATCH 07/10] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
From: Tejun Heo @ 2011-11-01 23:46 UTC (permalink / raw)
To: paul, rjw, lizf
Cc: fweisbec, containers, Daisuke Nishimura, linux-kernel, oleg,
Tejun Heo, linux-pm, James Morris, akpm, KAMEZAWA Hiroyuki
In-Reply-To: <1320191193-8110-1-git-send-email-tj@kernel.org>
Currently, there's no way to pass multiple tasks to cgroup_subsys
methods necessitating the need for separate per-process and per-task
methods. This patch introduces cgroup_taskset which can be used to
pass multiple tasks and their associated cgroups to cgroup_subsys
methods.
Three methods - can_attach(), cancel_attach() and attach() - are
converted to use cgroup_taskset. This unifies passed parameters so
that all methods have access to all information. Conversions in this
patchset are identical and don't introduce any behavior change.
-v2: documentation updated as per Paul Menage's suggestion.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Menage <paul@paulmenage.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: James Morris <jmorris@namei.org>
---
Documentation/cgroups/cgroups.txt | 31 ++++++++----
include/linux/cgroup.h | 28 +++++++++-
kernel/cgroup.c | 99 +++++++++++++++++++++++++++++++++----
kernel/cgroup_freezer.c | 2 +-
kernel/cpuset.c | 18 ++++---
mm/memcontrol.c | 16 +++---
security/device_cgroup.c | 7 ++-
7 files changed, 158 insertions(+), 43 deletions(-)
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index cd67e90..bf5d6c9 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -594,15 +594,25 @@ rmdir() will fail with it. From this behavior, pre_destroy() can be
called multiple times against a cgroup.
int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
- struct task_struct *task)
+ struct cgroup_taskset *tset)
(cgroup_mutex held by caller)
-Called prior to moving a task into a cgroup; if the subsystem
-returns an error, this will abort the attach operation. If a NULL
-task is passed, then a successful result indicates that *any*
-unspecified task can be moved into the cgroup. Note that this isn't
-called on a fork. If this method returns 0 (success) then this should
-remain valid while the caller holds cgroup_mutex and it is ensured that either
+Called prior to moving one or more tasks into a cgroup; if the
+subsystem returns an error, this will abort the attach operation.
+@tset contains the tasks to be attached and is guaranteed to have at
+least one task in it.
+
+If there are multiple tasks in the taskset, then:
+ - it's guaranteed that all are from the same thread group
+ - @tset contains all tasks from the thread group whether or not
+ they're switching cgroups
+ - the first task is the leader
+
+Each @tset entry also contains the task's old cgroup and tasks which
+aren't switching cgroup can be skipped easily using the
+cgroup_taskset_for_each() iterator. Note that this isn't called on a
+fork. If this method returns 0 (success) then this should remain valid
+while the caller holds cgroup_mutex and it is ensured that either
attach() or cancel_attach() will be called in future.
int can_attach_task(struct cgroup *cgrp, struct task_struct *tsk);
@@ -613,14 +623,14 @@ attached (possibly many when using cgroup_attach_proc). Called after
can_attach.
void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
- struct task_struct *task, bool threadgroup)
+ struct cgroup_taskset *tset)
(cgroup_mutex held by caller)
Called when a task attach operation has failed after can_attach() has succeeded.
A subsystem whose can_attach() has some side-effects should provide this
function, so that the subsystem can implement a rollback. If not, not necessary.
This will be called only about subsystems whose can_attach() operation have
-succeeded.
+succeeded. The parameters are identical to can_attach().
void pre_attach(struct cgroup *cgrp);
(cgroup_mutex held by caller)
@@ -629,11 +639,12 @@ For any non-per-thread attachment work that needs to happen before
attach_task. Needed by cpuset.
void attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
- struct cgroup *old_cgrp, struct task_struct *task)
+ struct cgroup_taskset *tset)
(cgroup_mutex held by caller)
Called after the task has been attached to the cgroup, to allow any
post-attachment activity that requires memory allocations or blocking.
+The parameters are identical to can_attach().
void attach_task(struct cgroup *cgrp, struct task_struct *tsk);
(cgroup_mutex held by caller)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index da7e4bc..2470c8e 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -457,6 +457,28 @@ void cgroup_exclude_rmdir(struct cgroup_subsys_state *css);
void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css);
/*
+ * Control Group taskset, used to pass around set of tasks to cgroup_subsys
+ * methods.
+ */
+struct cgroup_taskset;
+struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset);
+struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset);
+struct cgroup *cgroup_taskset_cur_cgroup(struct cgroup_taskset *tset);
+int cgroup_taskset_size(struct cgroup_taskset *tset);
+
+/**
+ * cgroup_taskset_for_each - iterate cgroup_taskset
+ * @task: the loop cursor
+ * @skip_cgrp: skip if task's cgroup matches this, %NULL to iterate through all
+ * @tset: taskset to iterate
+ */
+#define cgroup_taskset_for_each(task, skip_cgrp, tset) \
+ for ((task) = cgroup_taskset_first((tset)); (task); \
+ (task) = cgroup_taskset_next((tset))) \
+ if (!(skip_cgrp) || \
+ cgroup_taskset_cur_cgroup((tset)) != (skip_cgrp))
+
+/*
* Control Group subsystem type.
* See Documentation/cgroups/cgroups.txt for details
*/
@@ -467,14 +489,14 @@ struct cgroup_subsys {
int (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
int (*can_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
- struct task_struct *tsk);
+ struct cgroup_taskset *tset);
int (*can_attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
void (*cancel_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
- struct task_struct *tsk);
+ struct cgroup_taskset *tset);
void (*pre_attach)(struct cgroup *cgrp);
void (*attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
- struct cgroup *old_cgrp, struct task_struct *tsk);
+ struct cgroup_taskset *tset);
void (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
void (*exit)(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct cgroup *old_cgrp, struct task_struct *task);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 19396d6..0bb7927 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1757,11 +1757,85 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
}
EXPORT_SYMBOL_GPL(cgroup_path);
+/*
+ * Control Group taskset
+ */
struct task_and_cgroup {
struct task_struct *task;
struct cgroup *cgrp;
};
+struct cgroup_taskset {
+ struct task_and_cgroup single;
+ struct flex_array *tc_array;
+ int tc_array_len;
+ int idx;
+ struct cgroup *cur_cgrp;
+};
+
+/**
+ * cgroup_taskset_first - reset taskset and return the first task
+ * @tset: taskset of interest
+ *
+ * @tset iteration is initialized and the first task is returned.
+ */
+struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset)
+{
+ if (tset->tc_array) {
+ tset->idx = 0;
+ return cgroup_taskset_next(tset);
+ } else {
+ tset->cur_cgrp = tset->single.cgrp;
+ return tset->single.task;
+ }
+}
+EXPORT_SYMBOL_GPL(cgroup_taskset_first);
+
+/**
+ * cgroup_taskset_next - iterate to the next task in taskset
+ * @tset: taskset of interest
+ *
+ * Return the next task in @tset. Iteration must have been initialized
+ * with cgroup_taskset_first().
+ */
+struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset)
+{
+ struct task_and_cgroup *tc;
+
+ if (!tset->tc_array || tset->idx >= tset->tc_array_len)
+ return NULL;
+
+ tc = flex_array_get(tset->tc_array, tset->idx++);
+ tset->cur_cgrp = tc->cgrp;
+ return tc->task;
+}
+EXPORT_SYMBOL_GPL(cgroup_taskset_next);
+
+/**
+ * cgroup_taskset_cur_cgroup - return the matching cgroup for the current task
+ * @tset: taskset of interest
+ *
+ * Return the cgroup for the current (last returned) task of @tset. This
+ * function must be preceded by either cgroup_taskset_first() or
+ * cgroup_taskset_next().
+ */
+struct cgroup *cgroup_taskset_cur_cgroup(struct cgroup_taskset *tset)
+{
+ return tset->cur_cgrp;
+}
+EXPORT_SYMBOL_GPL(cgroup_taskset_cur_cgroup);
+
+/**
+ * cgroup_taskset_size - return the number of tasks in taskset
+ * @tset: taskset of interest
+ */
+int cgroup_taskset_size(struct cgroup_taskset *tset)
+{
+ return tset->tc_array ? tset->tc_array_len : 1;
+}
+EXPORT_SYMBOL_GPL(cgroup_taskset_size);
+
+
/*
* cgroup_task_migrate - move a task from one cgroup to another.
*
@@ -1842,6 +1916,7 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
struct cgroup_subsys *ss, *failed_ss = NULL;
struct cgroup *oldcgrp;
struct cgroupfs_root *root = cgrp->root;
+ struct cgroup_taskset tset = { };
/* @tsk either already exited or can't exit until the end */
if (tsk->flags & PF_EXITING)
@@ -1852,9 +1927,12 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
if (cgrp == oldcgrp)
return 0;
+ tset.single.task = tsk;
+ tset.single.cgrp = oldcgrp;
+
for_each_subsys(root, ss) {
if (ss->can_attach) {
- retval = ss->can_attach(ss, cgrp, tsk);
+ retval = ss->can_attach(ss, cgrp, &tset);
if (retval) {
/*
* Remember on which subsystem the can_attach()
@@ -1885,7 +1963,7 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
if (ss->attach_task)
ss->attach_task(cgrp, tsk);
if (ss->attach)
- ss->attach(ss, cgrp, oldcgrp, tsk);
+ ss->attach(ss, cgrp, &tset);
}
synchronize_rcu();
@@ -1907,7 +1985,7 @@ out:
*/
break;
if (ss->cancel_attach)
- ss->cancel_attach(ss, cgrp, tsk);
+ ss->cancel_attach(ss, cgrp, &tset);
}
}
return retval;
@@ -2023,6 +2101,7 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
struct task_struct *tsk;
struct task_and_cgroup *tc;
struct flex_array *group;
+ struct cgroup_taskset tset = { };
/*
* we need to make sure we have css_sets for all the tasks we're
* going to move -before- we actually start moving them, so that in
@@ -2089,6 +2168,8 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
} while_each_thread(leader, tsk);
/* remember the number of threads in the array for later. */
group_size = i;
+ tset.tc_array = group;
+ tset.tc_array_len = group_size;
rcu_read_unlock();
/* methods shouldn't be called if no task is actually migrating */
@@ -2101,7 +2182,7 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
*/
for_each_subsys(root, ss) {
if (ss->can_attach) {
- retval = ss->can_attach(ss, cgrp, leader);
+ retval = ss->can_attach(ss, cgrp, &tset);
if (retval) {
failed_ss = ss;
goto out_cancel_attach;
@@ -2185,10 +2266,8 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
* being moved, this call will need to be reworked to communicate that.
*/
for_each_subsys(root, ss) {
- if (ss->attach) {
- tc = flex_array_get(group, 0);
- ss->attach(ss, cgrp, tc->cgrp, tc->task);
- }
+ if (ss->attach)
+ ss->attach(ss, cgrp, &tset);
}
/*
@@ -2210,11 +2289,11 @@ out_cancel_attach:
for_each_subsys(root, ss) {
if (ss == failed_ss) {
if (cancel_failed_ss && ss->cancel_attach)
- ss->cancel_attach(ss, cgrp, leader);
+ ss->cancel_attach(ss, cgrp, &tset);
break;
}
if (ss->cancel_attach)
- ss->cancel_attach(ss, cgrp, leader);
+ ss->cancel_attach(ss, cgrp, &tset);
}
}
out_put_tasks:
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index e691818..a72d5fa 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -160,7 +160,7 @@ static void freezer_destroy(struct cgroup_subsys *ss,
*/
static int freezer_can_attach(struct cgroup_subsys *ss,
struct cgroup *new_cgroup,
- struct task_struct *task)
+ struct cgroup_taskset *tset)
{
struct freezer *freezer;
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 10131fd..2e5825b 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1368,10 +1368,10 @@ static int fmeter_getrate(struct fmeter *fmp)
}
/* Called by cgroups to determine if a cpuset is usable; cgroup_mutex held */
-static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cont,
- struct task_struct *tsk)
+static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+ struct cgroup_taskset *tset)
{
- struct cpuset *cs = cgroup_cs(cont);
+ struct cpuset *cs = cgroup_cs(cgrp);
if (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed))
return -ENOSPC;
@@ -1384,7 +1384,7 @@ static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cont,
* set_cpus_allowed_ptr() on all attached tasks before cpus_allowed may
* be changed.
*/
- if (tsk->flags & PF_THREAD_BOUND)
+ if (cgroup_taskset_first(tset)->flags & PF_THREAD_BOUND)
return -EINVAL;
return 0;
@@ -1434,12 +1434,14 @@ static void cpuset_attach_task(struct cgroup *cont, struct task_struct *tsk)
cpuset_update_task_spread_flag(cs, tsk);
}
-static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cont,
- struct cgroup *oldcont, struct task_struct *tsk)
+static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+ struct cgroup_taskset *tset)
{
struct mm_struct *mm;
- struct cpuset *cs = cgroup_cs(cont);
- struct cpuset *oldcs = cgroup_cs(oldcont);
+ struct task_struct *tsk = cgroup_taskset_first(tset);
+ struct cgroup *oldcgrp = cgroup_taskset_cur_cgroup(tset);
+ struct cpuset *cs = cgroup_cs(cgrp);
+ struct cpuset *oldcs = cgroup_cs(oldcgrp);
/*
* Change mm, possibly for multiple threads in a threadgroup. This is
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3508777..5019787 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5288,8 +5288,9 @@ static void mem_cgroup_clear_mc(void)
static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
struct cgroup *cgroup,
- struct task_struct *p)
+ struct cgroup_taskset *tset)
{
+ struct task_struct *p = cgroup_taskset_first(tset);
int ret = 0;
struct mem_cgroup *mem = mem_cgroup_from_cont(cgroup);
@@ -5327,7 +5328,7 @@ static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
static void mem_cgroup_cancel_attach(struct cgroup_subsys *ss,
struct cgroup *cgroup,
- struct task_struct *p)
+ struct cgroup_taskset *tset)
{
mem_cgroup_clear_mc();
}
@@ -5444,9 +5445,9 @@ retry:
static void mem_cgroup_move_task(struct cgroup_subsys *ss,
struct cgroup *cont,
- struct cgroup *old_cont,
- struct task_struct *p)
+ struct cgroup_taskset *tset)
{
+ struct task_struct *p = cgroup_taskset_first(tset);
struct mm_struct *mm = get_task_mm(p);
if (mm) {
@@ -5461,19 +5462,18 @@ static void mem_cgroup_move_task(struct cgroup_subsys *ss,
#else /* !CONFIG_MMU */
static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
struct cgroup *cgroup,
- struct task_struct *p)
+ struct cgroup_taskset *tset)
{
return 0;
}
static void mem_cgroup_cancel_attach(struct cgroup_subsys *ss,
struct cgroup *cgroup,
- struct task_struct *p)
+ struct cgroup_taskset *tset)
{
}
static void mem_cgroup_move_task(struct cgroup_subsys *ss,
struct cgroup *cont,
- struct cgroup *old_cont,
- struct task_struct *p)
+ struct cgroup_taskset *tset)
{
}
#endif
diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 4450fbe..8b5b5d8 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -62,11 +62,12 @@ static inline struct dev_cgroup *task_devcgroup(struct task_struct *task)
struct cgroup_subsys devices_subsys;
static int devcgroup_can_attach(struct cgroup_subsys *ss,
- struct cgroup *new_cgroup, struct task_struct *task)
+ struct cgroup *new_cgrp, struct cgroup_taskset *set)
{
- if (current != task && !capable(CAP_SYS_ADMIN))
- return -EPERM;
+ struct task_struct *task = cgroup_taskset_first(set);
+ if (current != task && !capable(CAP_SYS_ADMIN))
+ return -EPERM;
return 0;
}
--
1.7.3.1
^ permalink raw reply related
* [PATCH 06/10] cgroup: improve old cgroup handling in cgroup_attach_proc()
From: Tejun Heo @ 2011-11-01 23:46 UTC (permalink / raw)
To: paul, rjw, lizf
Cc: fweisbec, containers, linux-kernel, oleg, Tejun Heo, linux-pm,
akpm, kamezawa.hiroyu
In-Reply-To: <1320191193-8110-1-git-send-email-tj@kernel.org>
cgroup_attach_proc() behaves differently from cgroup_attach_task() in
the following aspects.
* All hooks are invoked even if no task is actually being moved.
* ->can_attach_task() is called for all tasks in the group whether the
new cgrp is different from the current cgrp or not; however,
->attach_task() is skipped if new equals new. This makes the calls
asymmetric.
This patch improves old cgroup handling in cgroup_attach_proc() by
looking up the current cgroup at the head, recording it in the flex
array along with the task itself, and using it to remove the above two
differences. This will also ease further changes.
-v2: nr_todo renamed to nr_migrating_tasks as per Paul Menage's
suggestion.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Menage <paul@paulmenage.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
---
kernel/cgroup.c | 66 +++++++++++++++++++++++++++++++++++--------------------
1 files changed, 42 insertions(+), 24 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 3679fb6..19396d6 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1757,6 +1757,11 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
}
EXPORT_SYMBOL_GPL(cgroup_path);
+struct task_and_cgroup {
+ struct task_struct *task;
+ struct cgroup *cgrp;
+};
+
/*
* cgroup_task_migrate - move a task from one cgroup to another.
*
@@ -2008,15 +2013,15 @@ static int css_set_prefetch(struct cgroup *cgrp, struct css_set *cg,
*/
int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
{
- int retval, i, group_size;
+ int retval, i, group_size, nr_migrating_tasks;
struct cgroup_subsys *ss, *failed_ss = NULL;
bool cancel_failed_ss = false;
/* guaranteed to be initialized later, but the compiler needs this */
- struct cgroup *oldcgrp = NULL;
struct css_set *oldcg;
struct cgroupfs_root *root = cgrp->root;
/* threadgroup list cursor and array */
struct task_struct *tsk;
+ struct task_and_cgroup *tc;
struct flex_array *group;
/*
* we need to make sure we have css_sets for all the tasks we're
@@ -2035,8 +2040,7 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
*/
group_size = get_nr_threads(leader);
/* flex_array supports very large thread-groups better than kmalloc. */
- group = flex_array_alloc(sizeof(struct task_struct *), group_size,
- GFP_KERNEL);
+ group = flex_array_alloc(sizeof(*tc), group_size, GFP_KERNEL);
if (!group)
return -ENOMEM;
/* pre-allocate to guarantee space while iterating in rcu read-side. */
@@ -2060,8 +2064,10 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
}
/* take a reference on each task in the group to go in the array. */
tsk = leader;
- i = 0;
+ i = nr_migrating_tasks = 0;
do {
+ struct task_and_cgroup ent;
+
/* @tsk either already exited or can't exit until the end */
if (tsk->flags & PF_EXITING)
continue;
@@ -2073,14 +2079,23 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
* saying GFP_ATOMIC has no effect here because we did prealloc
* earlier, but it's good form to communicate our expectations.
*/
- retval = flex_array_put_ptr(group, i, tsk, GFP_ATOMIC);
+ ent.task = tsk;
+ ent.cgrp = task_cgroup_from_root(tsk, root);
+ retval = flex_array_put(group, i, &ent, GFP_ATOMIC);
BUG_ON(retval != 0);
i++;
+ if (ent.cgrp != cgrp)
+ nr_migrating_tasks++;
} while_each_thread(leader, tsk);
/* remember the number of threads in the array for later. */
group_size = i;
rcu_read_unlock();
+ /* methods shouldn't be called if no task is actually migrating */
+ retval = 0;
+ if (!nr_migrating_tasks)
+ goto out_put_tasks;
+
/*
* step 1: check that we can legitimately attach to the cgroup.
*/
@@ -2096,8 +2111,10 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
if (ss->can_attach_task) {
/* run on each task in the threadgroup. */
for (i = 0; i < group_size; i++) {
- tsk = flex_array_get_ptr(group, i);
- retval = ss->can_attach_task(cgrp, tsk);
+ tc = flex_array_get(group, i);
+ if (tc->cgrp == cgrp)
+ continue;
+ retval = ss->can_attach_task(cgrp, tc->task);
if (retval) {
failed_ss = ss;
cancel_failed_ss = true;
@@ -2113,18 +2130,17 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
*/
INIT_LIST_HEAD(&newcg_list);
for (i = 0; i < group_size; i++) {
- tsk = flex_array_get_ptr(group, i);
+ tc = flex_array_get(group, i);
/* nothing to do if this task is already in the cgroup */
- oldcgrp = task_cgroup_from_root(tsk, root);
- if (cgrp == oldcgrp)
+ if (tc->cgrp == cgrp)
continue;
/* get old css_set pointer */
- task_lock(tsk);
- oldcg = tsk->cgroups;
+ task_lock(tc->task);
+ oldcg = tc->task->cgroups;
get_css_set(oldcg);
- task_unlock(tsk);
+ task_unlock(tc->task);
/* see if the new one for us is already in the list? */
- if (css_set_check_fetched(cgrp, tsk, oldcg, &newcg_list)) {
+ if (css_set_check_fetched(cgrp, tc->task, oldcg, &newcg_list)) {
/* was already there, nothing to do. */
put_css_set(oldcg);
} else {
@@ -2147,19 +2163,18 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
ss->pre_attach(cgrp);
}
for (i = 0; i < group_size; i++) {
- tsk = flex_array_get_ptr(group, i);
+ tc = flex_array_get(group, i);
/* leave current thread as it is if it's already there */
- oldcgrp = task_cgroup_from_root(tsk, root);
- if (cgrp == oldcgrp)
+ if (tc->cgrp == cgrp)
continue;
- retval = cgroup_task_migrate(cgrp, oldcgrp, tsk, true);
+ retval = cgroup_task_migrate(cgrp, tc->cgrp, tc->task, true);
BUG_ON(retval != 0);
/* attach each task to each subsystem */
for_each_subsys(root, ss) {
if (ss->attach_task)
- ss->attach_task(cgrp, tsk);
+ ss->attach_task(cgrp, tc->task);
}
}
/* nothing is sensitive to fork() after this point. */
@@ -2170,8 +2185,10 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
* being moved, this call will need to be reworked to communicate that.
*/
for_each_subsys(root, ss) {
- if (ss->attach)
- ss->attach(ss, cgrp, oldcgrp, leader);
+ if (ss->attach) {
+ tc = flex_array_get(group, 0);
+ ss->attach(ss, cgrp, tc->cgrp, tc->task);
+ }
}
/*
@@ -2200,10 +2217,11 @@ out_cancel_attach:
ss->cancel_attach(ss, cgrp, leader);
}
}
+out_put_tasks:
/* clean up the array of referenced threads in the group. */
for (i = 0; i < group_size; i++) {
- tsk = flex_array_get_ptr(group, i);
- put_task_struct(tsk);
+ tc = flex_array_get(group, i);
+ put_task_struct(tc->task);
}
out_free_group_list:
flex_array_free(group);
--
1.7.3.1
^ permalink raw reply related
* [PATCH 05/10] cgroup: subsys->attach_task() should be called after migration
From: Tejun Heo @ 2011-11-01 23:46 UTC (permalink / raw)
To: paul, rjw, lizf
Cc: fweisbec, containers, linux-kernel, oleg, Tejun Heo, linux-pm,
akpm, kamezawa.hiroyu
In-Reply-To: <1320191193-8110-1-git-send-email-tj@kernel.org>
cgroup_attach_task() calls subsys->attach_task() after
cgroup_task_migrate(); however, cgroup_attach_proc() calls it before
migration. This actually affects some of the users. Update
cgroup_attach_proc() such that ->attach_task() is called after
migration.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@Jp.fujitsu.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
---
kernel/cgroup.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 83e10f9..3679fb6 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2152,13 +2152,15 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
oldcgrp = task_cgroup_from_root(tsk, root);
if (cgrp == oldcgrp)
continue;
+
+ retval = cgroup_task_migrate(cgrp, oldcgrp, tsk, true);
+ BUG_ON(retval != 0);
+
/* attach each task to each subsystem */
for_each_subsys(root, ss) {
if (ss->attach_task)
ss->attach_task(cgrp, tsk);
}
- retval = cgroup_task_migrate(cgrp, oldcgrp, tsk, true);
- BUG_ON(retval != 0);
}
/* nothing is sensitive to fork() after this point. */
--
1.7.3.1
^ permalink raw reply related
* [PATCH 04/10] cgroup: always lock threadgroup during migration
From: Tejun Heo @ 2011-11-01 23:46 UTC (permalink / raw)
To: paul, rjw, lizf
Cc: fweisbec, containers, linux-kernel, oleg, Tejun Heo, linux-pm,
akpm, kamezawa.hiroyu
In-Reply-To: <1320191193-8110-1-git-send-email-tj@kernel.org>
Update cgroup to take advantage of the fack that threadgroup_lock()
guarantees stable threadgroup.
* Lock threadgroup even if the target is a single task. This
guarantees that when the target tasks stay stable during migration
regardless of the target type.
* Remove PF_EXITING early exit optimization from attach_task_by_pid()
and check it in cgroup_task_migrate() instead. The optimization was
for rather cold path to begin with and PF_EXITING state can be
trusted throughout migration by checking it after locking
threadgroup.
* Don't add PF_EXITING tasks to target task array in
cgroup_attach_proc(). This ensures that task migration is performed
only for live tasks.
* Remove -ESRCH failure path from cgroup_task_migrate(). With the
above changes, it's guaranteed to be called only for live tasks.
After the changes, only live tasks are migrated and they're guaranteed
to stay alive until migration is complete. This removes problems
caused by exec and exit racing against cgroup migration including
symmetry among cgroup attach methods and different cgroup methods
racing each other.
v2: Oleg pointed out that one more PF_EXITING check can be removed
from cgroup_attach_proc(). Removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
---
kernel/cgroup.c | 51 +++++++++++++++++++++++----------------------------
1 files changed, 23 insertions(+), 28 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f0e099f..83e10f9 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1762,7 +1762,7 @@ EXPORT_SYMBOL_GPL(cgroup_path);
*
* 'guarantee' is set if the caller promises that a new css_set for the task
* will already exist. If not set, this function might sleep, and can fail with
- * -ENOMEM. Otherwise, it can only fail with -ESRCH.
+ * -ENOMEM. Must be called with cgroup_mutex and threadgroup locked.
*/
static int cgroup_task_migrate(struct cgroup *cgrp, struct cgroup *oldcgrp,
struct task_struct *tsk, bool guarantee)
@@ -1800,13 +1800,9 @@ static int cgroup_task_migrate(struct cgroup *cgrp, struct cgroup *oldcgrp,
}
put_css_set(oldcg);
- /* if PF_EXITING is set, the tsk->cgroups pointer is no longer safe. */
+ /* @tsk can't exit as its threadgroup is locked */
task_lock(tsk);
- if (tsk->flags & PF_EXITING) {
- task_unlock(tsk);
- put_css_set(newcg);
- return -ESRCH;
- }
+ WARN_ON_ONCE(tsk->flags & PF_EXITING);
rcu_assign_pointer(tsk->cgroups, newcg);
task_unlock(tsk);
@@ -1832,8 +1828,8 @@ static int cgroup_task_migrate(struct cgroup *cgrp, struct cgroup *oldcgrp,
* @cgrp: the cgroup the task is attaching to
* @tsk: the task to be attached
*
- * Call holding cgroup_mutex. May take task_lock of
- * the task 'tsk' during call.
+ * Call with cgroup_mutex and threadgroup locked. May take task_lock of
+ * @tsk during call.
*/
int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
{
@@ -1842,6 +1838,10 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
struct cgroup *oldcgrp;
struct cgroupfs_root *root = cgrp->root;
+ /* @tsk either already exited or can't exit until the end */
+ if (tsk->flags & PF_EXITING)
+ return -ESRCH;
+
/* Nothing to do if the task is already in that cgroup */
oldcgrp = task_cgroup_from_root(tsk, root);
if (cgrp == oldcgrp)
@@ -2062,6 +2062,10 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
tsk = leader;
i = 0;
do {
+ /* @tsk either already exited or can't exit until the end */
+ if (tsk->flags & PF_EXITING)
+ continue;
+
/* as per above, nr_threads may decrease, but not increase. */
BUG_ON(i >= group_size);
get_task_struct(tsk);
@@ -2116,11 +2120,6 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
continue;
/* get old css_set pointer */
task_lock(tsk);
- if (tsk->flags & PF_EXITING) {
- /* ignore this task if it's going away */
- task_unlock(tsk);
- continue;
- }
oldcg = tsk->cgroups;
get_css_set(oldcg);
task_unlock(tsk);
@@ -2158,9 +2157,8 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
if (ss->attach_task)
ss->attach_task(cgrp, tsk);
}
- /* if the thread is PF_EXITING, it can just get skipped. */
retval = cgroup_task_migrate(cgrp, oldcgrp, tsk, true);
- BUG_ON(retval != 0 && retval != -ESRCH);
+ BUG_ON(retval != 0);
}
/* nothing is sensitive to fork() after this point. */
@@ -2212,8 +2210,8 @@ out_free_group_list:
/*
* Find the task_struct of the task to attach by vpid and pass it along to the
- * function to attach either it or all tasks in its threadgroup. Will take
- * cgroup_mutex; may take task_lock of task.
+ * function to attach either it or all tasks in its threadgroup. Will lock
+ * cgroup_mutex and threadgroup; may take task_lock of task.
*/
static int attach_task_by_pid(struct cgroup *cgrp, u64 pid, bool threadgroup)
{
@@ -2240,11 +2238,6 @@ static int attach_task_by_pid(struct cgroup *cgrp, u64 pid, bool threadgroup)
* detect it later.
*/
tsk = tsk->group_leader;
- } else if (tsk->flags & PF_EXITING) {
- /* optimization for the single-task-only case */
- rcu_read_unlock();
- cgroup_unlock();
- return -ESRCH;
}
/*
* even if we're attaching all tasks in the thread group, we
@@ -2268,13 +2261,15 @@ static int attach_task_by_pid(struct cgroup *cgrp, u64 pid, bool threadgroup)
get_task_struct(tsk);
}
- if (threadgroup) {
- threadgroup_lock(tsk);
+ threadgroup_lock(tsk);
+
+ if (threadgroup)
ret = cgroup_attach_proc(cgrp, tsk);
- threadgroup_unlock(tsk);
- } else {
+ else
ret = cgroup_attach_task(cgrp, tsk);
- }
+
+ threadgroup_unlock(tsk);
+
put_task_struct(tsk);
cgroup_unlock();
return ret;
--
1.7.3.1
^ permalink raw reply related
* [PATCH 03/10] threadgroup: extend threadgroup_lock() to cover exit and exec
From: Tejun Heo @ 2011-11-01 23:46 UTC (permalink / raw)
To: paul, rjw, lizf
Cc: fweisbec, containers, linux-kernel, oleg, Tejun Heo, linux-pm,
akpm, kamezawa.hiroyu
In-Reply-To: <1320191193-8110-1-git-send-email-tj@kernel.org>
threadgroup_lock() protected only protected against new addition to
the threadgroup, which was inherently somewhat incomplete and
problematic for its only user cgroup. On-going migration could race
against exec and exit leading to interesting problems - the symmetry
between various attach methods, task exiting during method execution,
->exit() racing against attach methods, migrating task switching basic
properties during exec and so on.
This patch extends threadgroup_lock() such that it protects against
all three threadgroup altering operations - fork, exit and exec. For
exit, threadgroup_change_begin/end() calls are added to exit path.
For exec, threadgroup_[un]lock() are updated to also grab and release
cred_guard_mutex.
With this change, threadgroup_lock() guarantees that the target
threadgroup will remain stable - no new task will be added, no new
PF_EXITING will be set and exec won't happen.
The next patch will update cgroup so that it can take full advantage
of this change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
---
include/linux/sched.h | 36 ++++++++++++++++++++++++++++++------
kernel/exit.c | 17 +++++++++++++----
2 files changed, 43 insertions(+), 10 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index aa47d0f..6fb546e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -635,11 +635,12 @@ struct signal_struct {
#endif
#ifdef CONFIG_CGROUPS
/*
- * The group_rwsem prevents threads from forking with
- * CLONE_THREAD while held for writing. Use this for fork-sensitive
- * threadgroup-wide operations. It's taken for reading in fork.c in
- * copy_process().
- * Currently only needed write-side by cgroups.
+ * group_rwsem prevents new tasks from entering the threadgroup and
+ * member tasks from exiting. fork and exit paths are protected
+ * with this rwsem using threadgroup_change_begin/end(). Users
+ * which require threadgroup to remain stable should use
+ * threadgroup_[un]lock() which also takes care of exec path.
+ * Currently, cgroup is the only user.
*/
struct rw_semaphore group_rwsem;
#endif
@@ -2367,7 +2368,6 @@ static inline void unlock_task_sighand(struct task_struct *tsk,
spin_unlock_irqrestore(&tsk->sighand->siglock, *flags);
}
-/* See the declaration of group_rwsem in signal_struct. */
#ifdef CONFIG_CGROUPS
static inline void threadgroup_change_begin(struct task_struct *tsk)
{
@@ -2377,13 +2377,37 @@ static inline void threadgroup_change_done(struct task_struct *tsk)
{
up_read(&tsk->signal->group_rwsem);
}
+
+/**
+ * threadgroup_lock - lock threadgroup
+ * @tsk: member task of the threadgroup to lock
+ *
+ * Lock the threadgroup @tsk belongs to. No new task is allowed to enter
+ * and member tasks aren't allowed to exit (as indicated by PF_EXITING) or
+ * perform exec. This is useful for cases where the threadgroup needs to
+ * stay stable across blockable operations.
+ *
+ * fork and exit explicitly call threadgroup_change_{begin|end}() for
+ * synchronization. exec is already synchronized with cread_guard_mutex
+ * which we grab here.
+ */
static inline void threadgroup_lock(struct task_struct *tsk)
{
+ /* exec uses exit for de-threading, grab cred_guard_mutex first */
+ mutex_lock(&tsk->signal->cred_guard_mutex);
down_write(&tsk->signal->group_rwsem);
}
+
+/**
+ * threadgroup_unlock - unlock threadgroup
+ * @tsk: member task of the threadgroup to unlock
+ *
+ * Reverse threadgroup_lock().
+ */
static inline void threadgroup_unlock(struct task_struct *tsk)
{
up_write(&tsk->signal->group_rwsem);
+ mutex_unlock(&tsk->signal->cred_guard_mutex);
}
#else
static inline void threadgroup_change_begin(struct task_struct *tsk) {}
diff --git a/kernel/exit.c b/kernel/exit.c
index 2913b35..3565f00 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -938,6 +938,12 @@ NORET_TYPE void do_exit(long code)
schedule();
}
+ /*
+ * @tsk's threadgroup is going through changes - lock out users
+ * which expect stable threadgroup.
+ */
+ threadgroup_change_begin(tsk);
+
exit_irq_thread();
exit_signals(tsk); /* sets PF_EXITING */
@@ -1020,10 +1026,6 @@ NORET_TYPE void do_exit(long code)
kfree(current->pi_state_cache);
#endif
/*
- * Make sure we are holding no locks:
- */
- debug_check_no_locks_held(tsk);
- /*
* We can do this unlocked here. The futex code uses this flag
* just to verify whether the pi state cleanup has been done
* or not. In the worst case it loops once more.
@@ -1040,6 +1042,13 @@ NORET_TYPE void do_exit(long code)
preempt_disable();
exit_rcu();
+
+ /*
+ * Release threadgroup and make sure we are holding no locks.
+ */
+ threadgroup_change_done(tsk);
+ debug_check_no_locks_held(tsk);
+
/* causes final put_task_struct in finish_task_switch(). */
tsk->state = TASK_DEAD;
schedule();
--
1.7.3.1
^ permalink raw reply related
* [PATCH 02/10] threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem
From: Tejun Heo @ 2011-11-01 23:46 UTC (permalink / raw)
To: paul, rjw, lizf
Cc: fweisbec, containers, linux-kernel, oleg, Tejun Heo, linux-pm,
akpm, kamezawa.hiroyu
In-Reply-To: <1320191193-8110-1-git-send-email-tj@kernel.org>
Make the following renames to prepare for extension of threadgroup
locking.
* s/signal->threadgroup_fork_lock/signal->group_rwsem/
* s/threadgroup_fork_read_lock()/threadgroup_change_begin()/
* s/threadgroup_fork_read_unlock()/threadgroup_change_done()/
* s/threadgroup_fork_write_lock()/threadgroup_lock()/
* s/threadgroup_fork_write_unlock()/threadgroup_unlock()/
This patch doesn't cause any behavior change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
---
include/linux/init_task.h | 9 ++++-----
include/linux/sched.h | 30 +++++++++++++++---------------
kernel/cgroup.c | 13 ++++++-------
kernel/fork.c | 8 ++++----
4 files changed, 29 insertions(+), 31 deletions(-)
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 08ffab0..ef20cbe 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -23,11 +23,10 @@ extern struct files_struct init_files;
extern struct fs_struct init_fs;
#ifdef CONFIG_CGROUPS
-#define INIT_THREADGROUP_FORK_LOCK(sig) \
- .threadgroup_fork_lock = \
- __RWSEM_INITIALIZER(sig.threadgroup_fork_lock),
+#define INIT_GROUP_RWSEM(sig) \
+ .group_rwsem = __RWSEM_INITIALIZER(sig.group_rwsem),
#else
-#define INIT_THREADGROUP_FORK_LOCK(sig)
+#define INIT_GROUP_RWSEM(sig)
#endif
#define INIT_SIGNALS(sig) { \
@@ -46,7 +45,7 @@ extern struct fs_struct init_fs;
}, \
.cred_guard_mutex = \
__MUTEX_INITIALIZER(sig.cred_guard_mutex), \
- INIT_THREADGROUP_FORK_LOCK(sig) \
+ INIT_GROUP_RWSEM(sig) \
}
extern struct nsproxy init_nsproxy;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e8acce7..aa47d0f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -635,13 +635,13 @@ struct signal_struct {
#endif
#ifdef CONFIG_CGROUPS
/*
- * The threadgroup_fork_lock prevents threads from forking with
+ * The group_rwsem prevents threads from forking with
* CLONE_THREAD while held for writing. Use this for fork-sensitive
* threadgroup-wide operations. It's taken for reading in fork.c in
* copy_process().
* Currently only needed write-side by cgroups.
*/
- struct rw_semaphore threadgroup_fork_lock;
+ struct rw_semaphore group_rwsem;
#endif
int oom_adj; /* OOM kill score adjustment (bit shift) */
@@ -2367,29 +2367,29 @@ static inline void unlock_task_sighand(struct task_struct *tsk,
spin_unlock_irqrestore(&tsk->sighand->siglock, *flags);
}
-/* See the declaration of threadgroup_fork_lock in signal_struct. */
+/* See the declaration of group_rwsem in signal_struct. */
#ifdef CONFIG_CGROUPS
-static inline void threadgroup_fork_read_lock(struct task_struct *tsk)
+static inline void threadgroup_change_begin(struct task_struct *tsk)
{
- down_read(&tsk->signal->threadgroup_fork_lock);
+ down_read(&tsk->signal->group_rwsem);
}
-static inline void threadgroup_fork_read_unlock(struct task_struct *tsk)
+static inline void threadgroup_change_done(struct task_struct *tsk)
{
- up_read(&tsk->signal->threadgroup_fork_lock);
+ up_read(&tsk->signal->group_rwsem);
}
-static inline void threadgroup_fork_write_lock(struct task_struct *tsk)
+static inline void threadgroup_lock(struct task_struct *tsk)
{
- down_write(&tsk->signal->threadgroup_fork_lock);
+ down_write(&tsk->signal->group_rwsem);
}
-static inline void threadgroup_fork_write_unlock(struct task_struct *tsk)
+static inline void threadgroup_unlock(struct task_struct *tsk)
{
- up_write(&tsk->signal->threadgroup_fork_lock);
+ up_write(&tsk->signal->group_rwsem);
}
#else
-static inline void threadgroup_fork_read_lock(struct task_struct *tsk) {}
-static inline void threadgroup_fork_read_unlock(struct task_struct *tsk) {}
-static inline void threadgroup_fork_write_lock(struct task_struct *tsk) {}
-static inline void threadgroup_fork_write_unlock(struct task_struct *tsk) {}
+static inline void threadgroup_change_begin(struct task_struct *tsk) {}
+static inline void threadgroup_change_done(struct task_struct *tsk) {}
+static inline void threadgroup_lock(struct task_struct *tsk) {}
+static inline void threadgroup_unlock(struct task_struct *tsk) {}
#endif
#ifndef __HAVE_THREAD_FUNCTIONS
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index efa5886..f0e099f 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2003,8 +2003,8 @@ static int css_set_prefetch(struct cgroup *cgrp, struct css_set *cg,
* @cgrp: the cgroup to attach to
* @leader: the threadgroup leader task_struct of the group to be attached
*
- * Call holding cgroup_mutex and the threadgroup_fork_lock of the leader. Will
- * take task_lock of each thread in leader's threadgroup individually in turn.
+ * Call holding cgroup_mutex and the group_rwsem of the leader. Will take
+ * task_lock of each thread in leader's threadgroup individually in turn.
*/
int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
{
@@ -2030,8 +2030,8 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
* step 0: in order to do expensive, possibly blocking operations for
* every thread, we cannot iterate the thread group list, since it needs
* rcu or tasklist locked. instead, build an array of all threads in the
- * group - threadgroup_fork_lock prevents new threads from appearing,
- * and if threads exit, this will just be an over-estimate.
+ * group - group_rwsem prevents new threads from appearing, and if
+ * threads exit, this will just be an over-estimate.
*/
group_size = get_nr_threads(leader);
/* flex_array supports very large thread-groups better than kmalloc. */
@@ -2246,7 +2246,6 @@ static int attach_task_by_pid(struct cgroup *cgrp, u64 pid, bool threadgroup)
cgroup_unlock();
return -ESRCH;
}
-
/*
* even if we're attaching all tasks in the thread group, we
* only need to check permissions on one of them.
@@ -2270,9 +2269,9 @@ static int attach_task_by_pid(struct cgroup *cgrp, u64 pid, bool threadgroup)
}
if (threadgroup) {
- threadgroup_fork_write_lock(tsk);
+ threadgroup_lock(tsk);
ret = cgroup_attach_proc(cgrp, tsk);
- threadgroup_fork_write_unlock(tsk);
+ threadgroup_unlock(tsk);
} else {
ret = cgroup_attach_task(cgrp, tsk);
}
diff --git a/kernel/fork.c b/kernel/fork.c
index 8e6b6f4..c2af839 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -980,7 +980,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
sched_autogroup_fork(sig);
#ifdef CONFIG_CGROUPS
- init_rwsem(&sig->threadgroup_fork_lock);
+ init_rwsem(&sig->group_rwsem);
#endif
sig->oom_adj = current->signal->oom_adj;
@@ -1166,7 +1166,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
p->io_context = NULL;
p->audit_context = NULL;
if (clone_flags & CLONE_THREAD)
- threadgroup_fork_read_lock(current);
+ threadgroup_change_begin(current);
cgroup_fork(p);
#ifdef CONFIG_NUMA
p->mempolicy = mpol_dup(p->mempolicy);
@@ -1378,7 +1378,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
proc_fork_connector(p);
cgroup_post_fork(p);
if (clone_flags & CLONE_THREAD)
- threadgroup_fork_read_unlock(current);
+ threadgroup_change_done(current);
perf_event_fork(p);
return p;
@@ -1418,7 +1418,7 @@ bad_fork_cleanup_policy:
bad_fork_cleanup_cgroup:
#endif
if (clone_flags & CLONE_THREAD)
- threadgroup_fork_read_unlock(current);
+ threadgroup_change_done(current);
cgroup_exit(p, cgroup_callbacks_done);
delayacct_tsk_free(p);
module_put(task_thread_info(p)->exec_domain->module);
--
1.7.3.1
^ permalink raw reply related
* [PATCH 01/10] cgroup: add cgroup_root_mutex
From: Tejun Heo @ 2011-11-01 23:46 UTC (permalink / raw)
To: paul, rjw, lizf
Cc: fweisbec, containers, linux-kernel, oleg, Tejun Heo, linux-pm,
akpm, kamezawa.hiroyu
In-Reply-To: <1320191193-8110-1-git-send-email-tj@kernel.org>
cgroup wants to make threadgroup stable while modifying cgroup
hierarchies which will introduce locking dependency on
cred_guard_mutex from cgroup_mutex. This unfortunately completes
circular dependency.
A. cgroup_mutex -> cred_guard_mutex -> s_type->i_mutex_key -> namespace_sem
B. namespace_sem -> cgroup_mutex
B is from cgroup_show_options() and this patch breaks it by
introducing another mutex cgroup_root_mutex which nests inside
cgroup_mutex and protects cgroupfs_root.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
---
kernel/cgroup.c | 64 ++++++++++++++++++++++++++++++++++++-------------------
1 files changed, 42 insertions(+), 22 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 453100a..efa5886 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -63,7 +63,24 @@
#include <linux/atomic.h>
+/*
+ * cgroup_mutex is the master lock. Any modification to cgroup or its
+ * hierarchy must be performed while holding it.
+ *
+ * cgroup_root_mutex nests inside cgroup_mutex and should be held to modify
+ * cgroupfs_root of any cgroup hierarchy - subsys list, flags,
+ * release_agent_path and so on. Modifying requires both cgroup_mutex and
+ * cgroup_root_mutex. Readers can acquire either of the two. This is to
+ * break the following locking order cycle.
+ *
+ * A. cgroup_mutex -> cred_guard_mutex -> s_type->i_mutex_key -> namespace_sem
+ * B. namespace_sem -> cgroup_mutex
+ *
+ * B happens only through cgroup_show_options() and using cgroup_root_mutex
+ * breaks it.
+ */
static DEFINE_MUTEX(cgroup_mutex);
+static DEFINE_MUTEX(cgroup_root_mutex);
/*
* Generate an array of cgroup subsystem pointers. At boot time, this is
@@ -953,6 +970,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
int i;
BUG_ON(!mutex_is_locked(&cgroup_mutex));
+ BUG_ON(!mutex_is_locked(&cgroup_root_mutex));
removed_bits = root->actual_subsys_bits & ~final_bits;
added_bits = final_bits & ~root->actual_subsys_bits;
@@ -1043,7 +1061,7 @@ static int cgroup_show_options(struct seq_file *seq, struct vfsmount *vfs)
struct cgroupfs_root *root = vfs->mnt_sb->s_fs_info;
struct cgroup_subsys *ss;
- mutex_lock(&cgroup_mutex);
+ mutex_lock(&cgroup_root_mutex);
for_each_subsys(root, ss)
seq_printf(seq, ",%s", ss->name);
if (test_bit(ROOT_NOPREFIX, &root->flags))
@@ -1054,7 +1072,7 @@ static int cgroup_show_options(struct seq_file *seq, struct vfsmount *vfs)
seq_puts(seq, ",clone_children");
if (strlen(root->name))
seq_printf(seq, ",name=%s", root->name);
- mutex_unlock(&cgroup_mutex);
+ mutex_unlock(&cgroup_root_mutex);
return 0;
}
@@ -1269,6 +1287,7 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data)
mutex_lock(&cgrp->dentry->d_inode->i_mutex);
mutex_lock(&cgroup_mutex);
+ mutex_lock(&cgroup_root_mutex);
/* See what subsystems are wanted */
ret = parse_cgroupfs_options(data, &opts);
@@ -1297,6 +1316,7 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data)
out_unlock:
kfree(opts.release_agent);
kfree(opts.name);
+ mutex_unlock(&cgroup_root_mutex);
mutex_unlock(&cgroup_mutex);
mutex_unlock(&cgrp->dentry->d_inode->i_mutex);
return ret;
@@ -1481,6 +1501,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
int ret = 0;
struct super_block *sb;
struct cgroupfs_root *new_root;
+ struct inode *inode;
/* First find the desired set of subsystems */
mutex_lock(&cgroup_mutex);
@@ -1514,7 +1535,6 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
/* We used the new root structure, so this is a new hierarchy */
struct list_head tmp_cg_links;
struct cgroup *root_cgrp = &root->top_cgroup;
- struct inode *inode;
struct cgroupfs_root *existing_root;
const struct cred *cred;
int i;
@@ -1528,18 +1548,14 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
mutex_lock(&inode->i_mutex);
mutex_lock(&cgroup_mutex);
+ mutex_lock(&cgroup_root_mutex);
- if (strlen(root->name)) {
- /* Check for name clashes with existing mounts */
- for_each_active_root(existing_root) {
- if (!strcmp(existing_root->name, root->name)) {
- ret = -EBUSY;
- mutex_unlock(&cgroup_mutex);
- mutex_unlock(&inode->i_mutex);
- goto drop_new_super;
- }
- }
- }
+ /* Check for name clashes with existing mounts */
+ ret = -EBUSY;
+ if (strlen(root->name))
+ for_each_active_root(existing_root)
+ if (!strcmp(existing_root->name, root->name))
+ goto unlock_drop;
/*
* We're accessing css_set_count without locking
@@ -1549,18 +1565,13 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
* have some link structures left over
*/
ret = allocate_cg_links(css_set_count, &tmp_cg_links);
- if (ret) {
- mutex_unlock(&cgroup_mutex);
- mutex_unlock(&inode->i_mutex);
- goto drop_new_super;
- }
+ if (ret)
+ goto unlock_drop;
ret = rebind_subsystems(root, root->subsys_bits);
if (ret == -EBUSY) {
- mutex_unlock(&cgroup_mutex);
- mutex_unlock(&inode->i_mutex);
free_cg_links(&tmp_cg_links);
- goto drop_new_super;
+ goto unlock_drop;
}
/*
* There must be no failure case after here, since rebinding
@@ -1599,6 +1610,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
cred = override_creds(&init_cred);
cgroup_populate_dir(root_cgrp);
revert_creds(cred);
+ mutex_unlock(&cgroup_root_mutex);
mutex_unlock(&cgroup_mutex);
mutex_unlock(&inode->i_mutex);
} else {
@@ -1615,6 +1627,10 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
kfree(opts.name);
return dget(sb->s_root);
+ unlock_drop:
+ mutex_unlock(&cgroup_root_mutex);
+ mutex_unlock(&cgroup_mutex);
+ mutex_unlock(&inode->i_mutex);
drop_new_super:
deactivate_locked_super(sb);
drop_modules:
@@ -1639,6 +1655,7 @@ static void cgroup_kill_sb(struct super_block *sb) {
BUG_ON(!list_empty(&cgrp->sibling));
mutex_lock(&cgroup_mutex);
+ mutex_lock(&cgroup_root_mutex);
/* Rebind all subsystems back to the default hierarchy */
ret = rebind_subsystems(root, 0);
@@ -1664,6 +1681,7 @@ static void cgroup_kill_sb(struct super_block *sb) {
root_count--;
}
+ mutex_unlock(&cgroup_root_mutex);
mutex_unlock(&cgroup_mutex);
kill_litter_super(sb);
@@ -2308,7 +2326,9 @@ static int cgroup_release_agent_write(struct cgroup *cgrp, struct cftype *cft,
return -EINVAL;
if (!cgroup_lock_live_group(cgrp))
return -ENODEV;
+ mutex_lock(&cgroup_root_mutex);
strcpy(cgrp->root->release_agent_path, buffer);
+ mutex_unlock(&cgroup_root_mutex);
cgroup_unlock();
return 0;
}
--
1.7.3.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox