* Re: [PATCH v13 0/7] cgroup-aware OOM killer
From: Michal Hocko @ 2018-06-05 11:47 UTC
To: Roman Gushchin
Cc: linux-mm, Vladimir Davydov, Johannes Weiner, Tetsuo Handa,
  David Rientjes, Andrew Morton, Tejun Heo, kernel-team, cgroups,
  linux-doc, linux-kernel

It seems that this is still in limbo, mostly because of David's concerns.
So let me reiterate them and provide my POV once more (and for the last
time) just to help Andrew make a decision:

1) Comparing the root memcg with tail memcgs during OOM kill selection
is not fair because we are comparing tasks with memcgs.

This is true, but I do not think this matters much for workloads which
are going to use the feature. Why? Because the main consumers of the new
feature seem to be containers, which really need some fairness when
comparing _workloads_ rather than processes. Those are unlikely to
contain any significant memory consumers in the root memcg; that would
be mostly common infrastructure.

Is this fixable? Yes, we would need to account in the root memcg.
Why are we not doing that now? Because it has some negligible
performance overhead. Are there other ways? Yes, we can approximate root
memcg memory consumption, but I would rather wait for somebody who sees
this as a real problem than add hacks now without a strong reason.

2) Evading the oom killer by attaching processes to child cgroups, which
basically means that a task can split up the workload into smaller
memcgs to hide its real memory consumption.

Again true, but not really anything new. Processes can already fork and
split up the memory consumption. Moreover, it doesn't even require any
special privileges to do so, unlike creating a sub memcg. Is this
fixable?
Yes, untrusted workloads can have group oom evaluation set up at the
delegation layer so all subgroups are considered together.

3) Userspace has zero control over oom kill selection in leaf mem
cgroups.

Again true, but this is something that needs a careful evaluation so we
do not end up in the fiasco we have seen with oom_score*. Current users
demanding this feature can live without any prioritization, so blocking
the whole feature seems unreasonable.

4) Future extensibility needs to be backward compatible.

David is wrong here IMHO. Any prioritization or oom selection policy
controls added in the future are orthogonal to the oom_group concept
added by this patchset. Allowing a memcg to be an oom entity is
something that we really want long term. Global CGRP_GROUP_OOM is the
most restrictive semantic, and softening it will be possible by adding
a new knob to tell whether a memcg/hierarchy is a workload or a set of
tasks.
--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
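As a concrete sketch of the delegation-layer setup mentioned above: with
cgroup v2, treating a delegated subtree as one OOM unit comes down to a
single knob write. The paths below are illustrative, and the knob name
follows the memory.oom.group interface that eventually landed upstream
(the patchset under discussion used a memcg-level group-kill knob with
slightly different selection semantics), so treat this as a hedged
sketch rather than the exact v13 behavior:

```shell
# Illustrative paths; requires root and the cgroup v2 memory controller.
CG=/sys/fs/cgroup/untrusted-workload   # delegation boundary for the workload

# Treat the delegated subtree as an indivisible workload: if the OOM
# killer picks a victim inside it, the whole group is killed together,
# so splitting into sub-memcgs no longer hides memory consumption.
echo 1 > "$CG/memory.oom.group"

cat "$CG/memory.oom.group"   # 1 => group OOM behavior is enabled
```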
* Re: [PATCH v13 0/7] cgroup-aware OOM killer
From: Michal Hocko @ 2018-06-05 12:13 UTC
To: Roman Gushchin
Cc: linux-mm, Vladimir Davydov, Johannes Weiner, Tetsuo Handa,
  David Rientjes, Andrew Morton, Tejun Heo, kernel-team, cgroups,
  linux-doc, linux-kernel

On Tue 05-06-18 13:47:29, Michal Hocko wrote:
> It seems that this is still in limbo, mostly because of David's
> concerns. So let me reiterate them and provide my POV once more (and
> for the last time) just to help Andrew make a decision:

Sorry, I forgot to add a reference to the email with David's full
reasoning. Here it is:
http://lkml.kernel.org/r/alpine.DEB.2.10.1801091556490.173445@chino.kir.corp.google.com

> 1) Comparing the root memcg with tail memcgs during OOM kill selection
> is not fair because we are comparing tasks with memcgs.
>
> This is true, but I do not think this matters much for workloads which
> are going to use the feature. Why? Because the main consumers of the
> new feature seem to be containers, which really need some fairness
> when comparing _workloads_ rather than processes. Those are unlikely
> to contain any significant memory consumers in the root memcg; that
> would be mostly common infrastructure.
>
> Is this fixable? Yes, we would need to account in the root memcg.
> Why are we not doing that now? Because it has some negligible
> performance overhead. Are there other ways? Yes, we can approximate
> root memcg memory consumption, but I would rather wait for somebody
> who sees this as a real problem than add hacks now without a strong
> reason.
>
> 2) Evading the oom killer by attaching processes to child cgroups,
> which basically means that a task can split up the workload into
> smaller memcgs to hide its real memory consumption.
>
> Again true, but not really anything new.
> Processes can already fork and split up the memory consumption.
> Moreover, it doesn't even require any special privileges to do so,
> unlike creating a sub memcg. Is this fixable? Yes, untrusted workloads
> can have group oom evaluation set up at the delegation layer so all
> subgroups are considered together.
>
> 3) Userspace has zero control over oom kill selection in leaf mem
> cgroups.
>
> Again true, but this is something that needs a careful evaluation so
> we do not end up in the fiasco we have seen with oom_score*. Current
> users demanding this feature can live without any prioritization, so
> blocking the whole feature seems unreasonable.
>
> 4) Future extensibility needs to be backward compatible.
>
> David is wrong here IMHO. Any prioritization or oom selection policy
> controls added in the future are orthogonal to the oom_group concept
> added by this patchset. Allowing a memcg to be an oom entity is
> something that we really want long term. Global CGRP_GROUP_OOM is the
> most restrictive semantic, and softening it will be possible by adding
> a new knob to tell whether a memcg/hierarchy is a workload or a set of
> tasks.
> --
> Michal Hocko
> SUSE Labs
--
Michal Hocko
SUSE Labs
* Re: [PATCH v13 0/7] cgroup-aware OOM killer
From: David Rientjes @ 2018-07-13 21:59 UTC
To: Michal Hocko
Cc: Roman Gushchin, linux-mm, Vladimir Davydov, Johannes Weiner,
  Tetsuo Handa, Andrew Morton, Tejun Heo, kernel-team, cgroups,
  linux-doc, linux-kernel

On Tue, 5 Jun 2018, Michal Hocko wrote:

> 1) Comparing the root memcg with tail memcgs during OOM kill selection
> is not fair because we are comparing tasks with memcgs.
>
> This is true, but I do not think this matters much for workloads which
> are going to use the feature. Why? Because the main consumers of the
> new feature seem to be containers, which really need some fairness
> when comparing _workloads_ rather than processes. Those are unlikely
> to contain any significant memory consumers in the root memcg; that
> would be mostly common infrastructure.
>

There are users (us) who want to use the feature, and not all processes
are attached to leaf mem cgroups. The functionality can be provided in a
generally useful way that doesn't require any specific hierarchy, and I
implemented this in my patch series at
https://marc.info/?l=linux-mm&m=152175563004458&w=2. That proposal fixes
*all* of my concerns with the cgroup-aware oom killer as it sits in -mm;
in its entirety, it only extends it so it is generally useful and does
not remove any functionality. I'm not sure why we are discussing ways of
moving forward when that patchset has been waiting for review for almost
four months and, to date, I haven't seen an objection to it. I don't
know why we cannot agree on making solutions generally useful, nor why
that patchset has not been merged into -mm so that the whole feature can
be merged. It's baffling.
This is the first time I've encountered a perceived stalemate where
there is a patchset sitting, unreviewed, that fixes all of the concerns
raised about the implementation sitting in -mm.

This isn't a criticism just of comparing processes attached to root
differently than leaf mem cgroups; it's also how oom_score_adj
influences that decision. It's trivial for a very small consumer (not a
"significant" memory consumer, as you put it) to cause an oom kill from
root instead of a leaf mem cgroup. I show in
https://marc.info/?l=linux-mm&m=152175564104468&w=2 that changing the
oom_score_adj of my bash shell attached to the root mem cgroup makes it
considered equal to a 95GB leaf mem cgroup with the current
implementation.

> Is this fixable? Yes, we would need to account in the root memcg.
> Why are we not doing that now? Because it has some negligible
> performance overhead. Are there other ways? Yes, we can approximate
> root memcg memory consumption, but I would rather wait for somebody
> who sees this as a real problem than add hacks now without a strong
> reason.
>

I fixed this in https://marc.info/?t=152175564500007&r=1&w=2, and from
what I remember Roman actually liked it.

> 2) Evading the oom killer by attaching processes to child cgroups,
> which basically means that a task can split up the workload into
> smaller memcgs to hide its real memory consumption.
>
> Again true, but not really anything new. Processes can already fork
> and split up the memory consumption. Moreover, it doesn't even require
> any special privileges to do so, unlike creating a sub memcg. Is this
> fixable? Yes, untrusted workloads can have group oom evaluation set up
> at the delegation layer so all subgroups are considered together.
>

Processes being able to fork to split up memory consumption is also
fixed by https://marc.info/?l=linux-mm&m=152175564104467, just as
creating subcontainers to intentionally or unintentionally subvert the
oom policy is fixed. It solves both problems.
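For readers less familiar with the per-process knob being argued about,
oom_score_adj is the existing /proc interface that biases OOM victim
selection; a minimal illustration of its mechanics (the value 500 is
arbitrary, chosen only to show a bias toward being killed):

```shell
# Inspect the calling shell's OOM badness adjustment.
# Range is -1000 (never kill) to +1000 (kill first); the default is 0.
cat /proc/self/oom_score_adj

# Any process may raise its own value (make itself more killable);
# lowering it below its current value requires CAP_SYS_RESOURCE.
echo 500 > /proc/self/oom_score_adj
cat /proc/self/oom_score_adj   # prints 500
```

David's complaint above is precisely that this per-process value is
weighed against whole-memcg usage in the -mm implementation, so a small
process with a raised adjustment can rival a many-gigabyte cgroup.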
> 3) Userspace has zero control over oom kill selection in leaf mem
> cgroups.
>
> Again true, but this is something that needs a careful evaluation so
> we do not end up in the fiasco we have seen with oom_score*. Current
> users demanding this feature can live without any prioritization, so
> blocking the whole feature seems unreasonable.
>

One objection here is that the oom_score_adj of a process means
something or doesn't mean something depending on what cgroup it is
attached to. The cgroup-aware oom killer is cgroup aware; oom_score_adj
should play no part. I fixed this with
https://marc.info/?t=152175564500007&r=1&w=2.

The other objection is that users do have cgroups that shouldn't be oom
killed because they are important, either because they are required to
provide a service for a smaller cgroup or because of business goals. We
have cgroups that use more than half of system memory, and they are
allowed to do so because they are important. We would love to be able
to bias against such a cgroup to prefer others, or to prefer cgroups
for oom kill because they are less important. This was done for
processes with oom_score_adj; we need it for a cgroup-aware oom killer
for the same reason. But notice even in
https://marc.info/?l=linux-mm&m=152175563004458&w=2 that I said priority
or adjustment can be added on top of the feature after it's merged. This
itself is not precluding anything from being merged.

> 4) Future extensibility needs to be backward compatible.
>
> David is wrong here IMHO. Any prioritization or oom selection policy
> controls added in the future are orthogonal to the oom_group concept
> added by this patchset. Allowing a memcg to be an oom entity is
> something that we really want long term. Global CGRP_GROUP_OOM is the
> most restrictive semantic, and softening it will be possible by adding
> a new knob to tell whether a memcg/hierarchy is a workload or a set of
> tasks.

I've always said that the mechanism and policy in this patchset should
be separated.
I do that exact thing in
https://marc.info/?l=linux-mm&m=152175564304469&w=2. I suggest that
different subtrees will want (or the admin will require) different
behaviors with regard to the mechanism.

I've stated the problems (and there are others wrt mempolicy selection)
that the current implementation has and given a full solution at
https://marc.info/?l=linux-mm&m=152175563004458&w=2 that has not been
reviewed. I would love feedback from anybody on this thread on that.

I'm not trying to preclude the cgroup-aware oom killer from being
merged; I'm the only person actively trying to get it merged. Thanks.
* Re: [PATCH v13 0/7] cgroup-aware OOM killer
From: Tetsuo Handa @ 2018-07-14 1:55 UTC
To: David Rientjes, Michal Hocko
Cc: Roman Gushchin, linux-mm, Vladimir Davydov, Johannes Weiner,
  Andrew Morton, Tejun Heo, kernel-team, cgroups, linux-doc,
  linux-kernel

On 2018/07/14 6:59, David Rientjes wrote:
> I'm not trying to preclude the cgroup-aware oom killer from being
> merged; I'm the only person actively trying to get it merged.

Before merging the cgroup-aware oom killer, can we merge the OOM lockup
fixes and my cleanup first? The gap between linux.git and linux-next.git
keeps us from using an agreed-upon baseline.
* Re: [PATCH v13 0/7] cgroup-aware OOM killer
From: Tetsuo Handa @ 2018-07-16 21:13 UTC
To: Andrew Morton
Cc: David Rientjes, Michal Hocko, Roman Gushchin, linux-mm,
  Vladimir Davydov, Johannes Weiner, Tejun Heo, kernel-team, cgroups,
  linux-doc, linux-kernel

No response from Roman and David...

Andrew, will you drop Roman's cgroup-aware OOM killer and David's
patches for now? Roman's series has a bug which I mentioned and which
can be avoided by my patch. David's patch is using MMF_UNSTABLE
incorrectly such that it might start selecting the next OOM victim
without trying to reclaim any memory.

Since they are not responding to my mail, I suggest dropping them from
linux-next for now.

https://www.spinics.net/lists/linux-mm/msg153212.html
https://lore.kernel.org/lkml/201807130620.w6D6KiAJ093010@www262.sakura.ne.jp/T/#u

On 2018/07/14 10:55, Tetsuo Handa wrote:
> On 2018/07/14 6:59, David Rientjes wrote:
>> I'm not trying to preclude the cgroup-aware oom killer from being
>> merged; I'm the only person actively trying to get it merged.
>
> Before merging the cgroup-aware oom killer, can we merge the OOM
> lockup fixes and my cleanup first? The gap between linux.git and
> linux-next.git keeps us from using an agreed-upon baseline.
* Re: [PATCH v13 0/7] cgroup-aware OOM killer
From: Roman Gushchin @ 2018-07-16 22:09 UTC
To: Tetsuo Handa
Cc: Andrew Morton, David Rientjes, Michal Hocko, linux-mm,
  Vladimir Davydov, Johannes Weiner, Tejun Heo, kernel-team, cgroups,
  linux-doc, linux-kernel

On Tue, Jul 17, 2018 at 06:13:47AM +0900, Tetsuo Handa wrote:
> No response from Roman and David...
>
> Andrew, will you drop Roman's cgroup-aware OOM killer and David's
> patches for now? Roman's series has a bug which I mentioned and which
> can be avoided by my patch. David's patch is using MMF_UNSTABLE
> incorrectly such that it might start selecting the next OOM victim
> without trying to reclaim any memory.
>
> Since they are not responding to my mail, I suggest dropping them from
> linux-next for now.

I was in cc, and didn't think that you were expecting something from me.

I don't get why it's necessary to drop the cgroup oom killer to merge
your fix. I'm happy to help with rebasing and everything else.

Thanks,
Roman
* Re: [PATCH v13 0/7] cgroup-aware OOM killer
From: Tetsuo Handa @ 2018-07-17 0:55 UTC
To: Roman Gushchin
Cc: Andrew Morton, David Rientjes, Michal Hocko, linux-mm,
  Vladimir Davydov, Johannes Weiner, Tejun Heo, kernel-team, cgroups,
  linux-doc, linux-kernel

Roman Gushchin wrote:
> On Tue, Jul 17, 2018 at 06:13:47AM +0900, Tetsuo Handa wrote:
> > No response from Roman and David...
> >
> > Andrew, will you drop Roman's cgroup-aware OOM killer and David's
> > patches for now? Roman's series has a bug which I mentioned and
> > which can be avoided by my patch. David's patch is using
> > MMF_UNSTABLE incorrectly such that it might start selecting the next
> > OOM victim without trying to reclaim any memory.
> >
> > Since they are not responding to my mail, I suggest dropping them
> > from linux-next for now.
>
> I was in cc, and didn't think that you were expecting something from
> me.

Oops. I was waiting for your response. ;-)

But Roman, my patch conflicts with your "mm, oom: cgroup-aware OOM
killer" patch in linux-next. And it seems to me that your patch contains
a bug which leads to premature memory allocation failure, explained
below.

Can we apply my patch prior to your "mm, oom: cgroup-aware OOM killer"
patch (which eliminates "delay" and "out:" from your patch) so that
people can easily backport my patch? Or do you want to apply a fix
(which eliminates "delay" and "out:" from linux-next) prior to my patch?

> I don't get why it's necessary to drop the cgroup oom killer to merge
> your fix. I'm happy to help with rebasing and everything else.

Yes, I wish you would rebase your series on top of the OOM lockup
(CVE-2016-10723) mitigation patch
( https://marc.info/?l=linux-mm&m=153112243424285&w=4 ). It is a trivial
change and easy to cleanly backport (if applied before your series).
Also, I expect you to check whether my cleanup patch which removes the
"abort" path ( [PATCH 1/2] at
https://marc.info/?l=linux-mm&m=153119509215026&w=4 ) helps simplify
your series. I don't know the detailed behavior of your series, but I
assume that your series does not kill threads whose MMF_OOM_SKIP the
current thread should not wait for.
* Re: [PATCH v13 0/7] cgroup-aware OOM killer
From: Tetsuo Handa @ 2018-07-31 14:14 UTC
To: Roman Gushchin, Andrew Morton
Cc: David Rientjes, Michal Hocko, linux-mm, Vladimir Davydov,
  Johannes Weiner, Tejun Heo, kernel-team, cgroups, linux-doc,
  linux-kernel

On 2018/07/17 9:55, Tetsuo Handa wrote:
>> I don't get why it's necessary to drop the cgroup oom killer to merge
>> your fix. I'm happy to help with rebasing and everything else.
>
> Yes, I wish you would rebase your series on top of the OOM lockup
> (CVE-2016-10723) mitigation patch
> ( https://marc.info/?l=linux-mm&m=153112243424285&w=4 ). It is a
> trivial change and easy to cleanly backport (if applied before your
> series).
>
> Also, I expect you to check whether my cleanup patch which removes the
> "abort" path ( [PATCH 1/2] at
> https://marc.info/?l=linux-mm&m=153119509215026&w=4 ) helps simplify
> your series. I don't know the detailed behavior of your series, but I
> assume that your series does not kill threads whose MMF_OOM_SKIP the
> current thread should not wait for.

syzbot is hitting WARN(1) due to mem_cgroup_out_of_memory() == false.
https://syzkaller.appspot.com/bug?id=ea8c7912757d253537375e981b61749b2da69258

I can't tell what change is triggering this race. Maybe the removal of
oom_lock from the oom reaper made it more likely to hit. But anyway, I
suspect that

  static bool oom_kill_memcg_victim(struct oom_control *oc)
  {
          if (oc->chosen_memcg == NULL || oc->chosen_memcg == INFLIGHT_VICTIM)
                  return oc->chosen_memcg; // <= This line is still broken

because

          /* We have one or more terminating processes at this point. */
          oc->chosen_task = INFLIGHT_VICTIM;

is not called.

Also, that patch is causing confusion by reviving
schedule_timeout_killable(1) with oom_lock held.

Can we temporarily drop the cgroup-aware OOM killer from linux-next.git
and apply my cleanup patch?
Since the merge window is approaching, I really want to see what the
next -rc1 will look like...
* Re: [PATCH v13 0/7] cgroup-aware OOM killer
From: Roman Gushchin @ 2018-08-01 16:37 UTC
To: Tetsuo Handa
Cc: Andrew Morton, David Rientjes, Michal Hocko, linux-mm,
  Vladimir Davydov, Johannes Weiner, Tejun Heo, kernel-team, cgroups,
  linux-doc, linux-kernel

On Tue, Jul 31, 2018 at 11:14:01PM +0900, Tetsuo Handa wrote:
> On 2018/07/17 9:55, Tetsuo Handa wrote:
> >> I don't get why it's necessary to drop the cgroup oom killer to
> >> merge your fix. I'm happy to help with rebasing and everything
> >> else.
> >
> > Yes, I wish you would rebase your series on top of the OOM lockup
> > (CVE-2016-10723) mitigation patch
> > ( https://marc.info/?l=linux-mm&m=153112243424285&w=4 ). It is a
> > trivial change and easy to cleanly backport (if applied before your
> > series).
> >
> > Also, I expect you to check whether my cleanup patch which removes
> > the "abort" path ( [PATCH 1/2] at
> > https://marc.info/?l=linux-mm&m=153119509215026&w=4 ) helps simplify
> > your series. I don't know the detailed behavior of your series, but
> > I assume that your series does not kill threads whose MMF_OOM_SKIP
> > the current thread should not wait for.
>
> syzbot is hitting WARN(1) due to mem_cgroup_out_of_memory() == false.
> https://syzkaller.appspot.com/bug?id=ea8c7912757d253537375e981b61749b2da69258
>
> I can't tell what change is triggering this race. Maybe the removal of
> oom_lock from the oom reaper made it more likely to hit.
> But anyway, I suspect that
>
>   static bool oom_kill_memcg_victim(struct oom_control *oc)
>   {
>           if (oc->chosen_memcg == NULL || oc->chosen_memcg == INFLIGHT_VICTIM)
>                   return oc->chosen_memcg; // <= This line is still broken
>
> because
>
>           /* We have one or more terminating processes at this point. */
>           oc->chosen_task = INFLIGHT_VICTIM;
>
> is not called.
>
> Also, that patch is causing confusion by reviving
> schedule_timeout_killable(1) with oom_lock held.
>
> Can we temporarily drop the cgroup-aware OOM killer from
> linux-next.git and apply my cleanup patch? Since the merge window is
> approaching, I really want to see what the next -rc1 will look like...

Hi Tetsuo!

Has this cleanup patch been acked by somebody? Which problem does it
solve? Dropping patches for the sake of a cleanup (if it's a cleanup)
sounds a bit strange.

Anyway, there is a good chance that the current cgroup-aware OOM killer
implementation will be replaced by a lightweight version
(memory.oom.group). Please take a look at it; your cleanup will probably
not conflict with it at all.

Thanks!
* Re: [PATCH v13 0/7] cgroup-aware OOM killer
From: Tetsuo Handa @ 2018-08-01 22:01 UTC
To: Roman Gushchin, Andrew Morton
Cc: David Rientjes, Michal Hocko, Vladimir Davydov, Johannes Weiner,
  Tejun Heo, kernel-team, cgroups, linux-doc, linux-kernel, linux-mm

On 2018/08/02 1:37, Roman Gushchin wrote:
> On Tue, Jul 31, 2018 at 11:14:01PM +0900, Tetsuo Handa wrote:
>> Can we temporarily drop the cgroup-aware OOM killer from
>> linux-next.git and apply my cleanup patch? Since the merge window is
>> approaching, I really want to see what the next -rc1 will look
>> like...
>
> Hi Tetsuo!
>
> Has this cleanup patch been acked by somebody?

Not yet. But since Michal considers this cleanup "a nice shortcut"
( https://marc.info/?i=20180607112836.GN32433@dhcp22.suse.cz ), I assume
that I will get an ACK for this cleanup.

> Which problem does it solve?

It simplifies the tricky out_of_memory() return value decision, and it
also fixes a bug in your series which syzbot is pointing out.

> Dropping patches for the sake of a cleanup (if it's a cleanup) sounds
> a bit strange.

What I need is a git tree which I can use as a baseline for making this
cleanup. linux.git is not suitable because it does not include Michal's
fix, but linux-next.git is not suitable either because Michal's fix is
overwritten by your series. I want a git tree which includes Michal's
fix and does not include your series.

> Anyway, there is a good chance that the current cgroup-aware OOM
> killer implementation will be replaced by a lightweight version
> (memory.oom.group). Please take a look at it; your cleanup will
> probably not conflict with it at all.

Then, please drop the current cgroup-aware OOM killer implementation
from linux-next.git.
I want to see what the next -rc1 will look like (for testing purposes)
and want to use linux-next.git as a baseline (for making this cleanup).

> Thanks!
* Re: [PATCH v13 0/7] cgroup-aware OOM killer
From: Roman Gushchin @ 2018-08-01 22:55 UTC
To: Tetsuo Handa
Cc: Andrew Morton, David Rientjes, Michal Hocko, Vladimir Davydov,
  Johannes Weiner, Tejun Heo, kernel-team, cgroups, linux-doc,
  linux-kernel, linux-mm

On Thu, Aug 02, 2018 at 07:01:28AM +0900, Tetsuo Handa wrote:
> On 2018/08/02 1:37, Roman Gushchin wrote:
> > On Tue, Jul 31, 2018 at 11:14:01PM +0900, Tetsuo Handa wrote:
> >> Can we temporarily drop the cgroup-aware OOM killer from
> >> linux-next.git and apply my cleanup patch? Since the merge window
> >> is approaching, I really want to see what the next -rc1 will look
> >> like...
> >
> > Hi Tetsuo!
> >
> > Has this cleanup patch been acked by somebody?
>
> Not yet. But since Michal considers this cleanup "a nice shortcut"
> ( https://marc.info/?i=20180607112836.GN32433@dhcp22.suse.cz ), I
> assume that I will get an ACK for this cleanup.
>
> > Which problem does it solve?
>
> It simplifies the tricky out_of_memory() return value decision, and it
> also fixes a bug in your series which syzbot is pointing out.
>
> > Dropping patches for the sake of a cleanup (if it's a cleanup)
> > sounds a bit strange.
>
> What I need is a git tree which I can use as a baseline for making
> this cleanup. linux.git is not suitable because it does not include
> Michal's fix, but linux-next.git is not suitable either because
> Michal's fix is overwritten by your series. I want a git tree which
> includes Michal's fix and does not include your series.
>
> > Anyway, there is a good chance that the current cgroup-aware OOM
> > killer implementation will be replaced by a lightweight version
> > (memory.oom.group). Please take a look at it; your cleanup will
> > probably not conflict with it at all.
>
> Then, please drop the current cgroup-aware OOM killer implementation
> from linux-next.git.
> I want to see what the next -rc1 will look like (for testing purposes)
> and want to use linux-next.git as a baseline (for making this
> cleanup).

I'll post memory.oom.group v2 later today, and if there are no
objections, I'll ask Andrew to drop the current memcg-aware OOM killer
and replace it with the lightweight memory.oom.group. These changes will
be picked up by linux-next in a few days.

Thanks!
* Re: [PATCH v13 0/7] cgroup-aware OOM killer
From: Michal Hocko @ 2018-07-16 9:36 UTC
To: David Rientjes
Cc: Roman Gushchin, linux-mm, Vladimir Davydov, Johannes Weiner,
  Tetsuo Handa, Andrew Morton, Tejun Heo, kernel-team, cgroups,
  linux-doc, linux-kernel

On Fri 13-07-18 14:59:59, David Rientjes wrote:
> On Tue, 5 Jun 2018, Michal Hocko wrote:
>
> > 1) Comparing the root memcg with tail memcgs during OOM kill
> > selection is not fair because we are comparing tasks with memcgs.
> >
> > This is true, but I do not think this matters much for workloads
> > which are going to use the feature. Why? Because the main consumers
> > of the new feature seem to be containers, which really need some
> > fairness when comparing _workloads_ rather than processes. Those are
> > unlikely to contain any significant memory consumers in the root
> > memcg; that would be mostly common infrastructure.
> >
>
> There are users (us) who want to use the feature, and not all
> processes are attached to leaf mem cgroups. The functionality can be
> provided in a generally useful way that doesn't require any specific
> hierarchy, and I implemented this in my patch series at
> https://marc.info/?l=linux-mm&m=152175563004458&w=2. That proposal
> fixes *all* of my concerns with the cgroup-aware oom killer as it sits
> in -mm; in its entirety, it only extends it so it is generally useful
> and does not remove any functionality. I'm not sure why we are
> discussing ways of moving forward when that patchset has been waiting
> for review for almost four months and, to date, I haven't seen an
> objection to it.

Well, I didn't really get to your patches yet. The last time I checked,
I had some pretty serious concerns about the consistency of your
proposal.
Those might have been fixed in the lastest version of your patchset I haven't seen. But I still strongly suspect that you are largerly underestimating the complexity of more generic oom policies which you are heading to. Considering user API failures from the past (oom_*adj fiasco for example) suggests that we should start with smaller steps and only provide a clear and simple API. oom_group is such a simple and semantically consistent thing which is the reason I am OK with it much more than your "we can be more generic" approach. I simply do not trust we can agree on sane and consistent api in a reasonable time. And it is quite mind boggling that a simpler approach has been basically blocked for months because there are some concerns for workloads which are not really asking for the feature. Sure your usecase might need to handle root memcg differently. That is a fair point but that shouldn't really block containers users who can use the proposed solution without any further changes. If we ever decide to handle root memcg differently we are free to do so because the oom selection policy is not carved in stone by any api. [...] -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v13 0/7] cgroup-aware OOM killer
  2018-07-16  9:36 ` Michal Hocko
@ 2018-07-17  3:59   ` David Rientjes
  0 siblings, 0 replies; 13+ messages in thread
From: David Rientjes @ 2018-07-17  3:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Roman Gushchin, linux-mm, Vladimir Davydov, Johannes Weiner,
      Tetsuo Handa, Andrew Morton, Tejun Heo, kernel-team, cgroups,
      linux-doc, linux-kernel, linux-mm

On Mon, 16 Jul 2018, Michal Hocko wrote:

> Well, I didn't really get to your patches yet. The last time I checked,
> I had some pretty serious concerns about the consistency of your
> proposal. Those might have been fixed in the latest version of your
> patchset, which I haven't seen. But I still strongly suspect that you
> are largely underestimating the complexity of the more generic oom
> policies you are heading toward.
>

I don't believe it's underestimated, since it's in use. It's perfectly
valid to lock an entire hierarchy or individual subtrees into a single
policy if that's what is preferred. Any use of a different policy at a
subtree root is a conscious decision made by the owner of that subtree.
If they prefer to kill the largest process, the largest descendant
cgroup, or the largest subtree, it is up to them. All three have valid
usecases. The goal is not to lock the entire hierarchy into a single
policy: that introduces the ability for users to subvert the selection
policy, either intentionally or unintentionally, because they are using
a unified single hierarchy with cgroup v2 and controllers other than
mem cgroup.

> Considering user API failures from the past (the oom_*adj fiasco, for
> example), we should start with smaller steps and provide only a clear
> and simple API. oom_group is such a simple and semantically consistent
> thing, which is the reason I am much more OK with it than with your
> "we can be more generic" approach. I simply do not trust that we can
> agree on a sane and consistent API in a reasonable time.
>
> And it is quite mind-boggling that a simpler approach has been
> basically blocked for months because of concerns about workloads which
> are not really asking for the feature. Sure, your usecase might need
> to handle the root memcg differently. That is a fair point, but it
> shouldn't block container users, who can use the proposed solution
> without any further changes. If we ever decide to handle the root
> memcg differently, we are free to do so, because the oom selection
> policy is not carved in stone by any API.
>

Please respond directly to the patchset, which clearly enumerates the
problems with the current implementation in -mm.
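For concreteness, the per-subtree policy selection David describes
appeared in his posted series as a per-memcg memory.oom_policy file; the
sketch below follows that posting, but the file name and its values
should be treated as assumptions about an out-of-tree proposal, not as
-mm or mainline API:

```shell
# Illustrative sketch of David's proposed per-subtree policy knob.
# memory.oom_policy and its values follow his series as posted to
# linux-mm; none of this exists in -mm or mainline, and the cgroup
# paths are made up.
echo cgroup > /sys/fs/cgroup/userspace/memory.oom_policy  # compare child memcgs
echo tree   > /sys/fs/cgroup/jobs/memory.oom_policy       # compare whole subtrees
echo none   > /sys/fs/cgroup/legacy/memory.oom_policy     # pick the largest process
```

The point of contention in the thread is exactly this: whether such a
per-subtree choice is worth the API surface, or whether the single
oom_group flag should go in first.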
Thread overview: 13+ messages
[not found] <20171130152824.1591-1-guro@fb.com>
2018-06-05 11:47 ` [PATCH v13 0/7] cgroup-aware OOM killer Michal Hocko
2018-06-05 12:13 ` Michal Hocko
2018-07-13 21:59 ` David Rientjes
2018-07-14 1:55 ` Tetsuo Handa
2018-07-16 21:13 ` Tetsuo Handa
2018-07-16 22:09 ` Roman Gushchin
2018-07-17 0:55 ` Tetsuo Handa
2018-07-31 14:14 ` Tetsuo Handa
2018-08-01 16:37 ` Roman Gushchin
2018-08-01 22:01 ` Tetsuo Handa
2018-08-01 22:55 ` Roman Gushchin
2018-07-16 9:36 ` Michal Hocko
2018-07-17 3:59 ` David Rientjes