All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Balbir Singh <balbir@linux.vnet.ibm.com>, Jay Lan <jlan@sgi.com>,
	Jiri Pirko <jpirko@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting
Date: Thu, 4 Dec 2008 14:52:50 +0100	[thread overview]
Message-ID: <20081204135250.GA5448@redhat.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0812031957570.19174@blonde.anvils>

On 12/03, Hugh Dickins wrote:
>
> On Wed, 3 Dec 2008, Oleg Nesterov wrote:
>
> > Unless we are going to decrease rss/vm there is no point to call the
> > (racy) update_hiwater_xxx() helpers. Still do_exit() does this, and
>
> I'm puzzled by this comment.  exit() _is_ about to decrease rss/vm,
> so isn't it right to be calling update_hiwater_xxx()?

Do you mean exit_mm()->...->exit_mmap() ? But this doesn't matter, this
->mm is going away. Nobody can read these counters when ->mm_users == 0,
no?

> There is a question of who's going to be able to see the result from
> this point on: I forget whether I was doing it for my own satisfaction,
> or for a real observer.  Even if there isn't a real observer today,
> I think I'd prefer do_exit() to continue to update_hiwater_xxx(),
> in case an observer is added tomorrow - unless you feel it's
> unjustifiably adding code to and slowing down process exit.

Please see below,

> You say "(racy)": in my view, it was only as racy as whatever might
> cause it to be racy.  By that, I mean that if the numbers ended up
> slightly wrong, you could reasonably imagine that the races happened
> in a different sequence which would have ended up with the numbers
> seen.  Have you noticed something more serious we need to fix?

But the difference can be huge. Let's suppose the process has 2 threads
T1 and T2.

T1 exits, calls update_hiwater_vm(), notices that mm->hiwater_vm must be
updated, and preempted right before mm->hiwater_vm = new_value.

T2 does free(malloc(A_LOT)) and exits. It sets the correct value for
->hiwater_vm which takes A_LOT into account and disappears.

T1 resumes, and "reverts" ->hiwater_vm to the "new_value" which was
calculated before.

Now, since T1 is the last exiting thread, the whole thread group
exits and we report the wrong ->hiwater_vm to the userspace.

Yes, this race is unlikely. But there is another reason (perhaps
not very good) why I tried to remove update_hiwater_xxx() from
do_exit().

Imho this code looks as if: from now it is "safe" to use ->hiwater_xxx
directly because we already updated it. But it is not. Unless we are
the last thread, we can't trust mm->hiwater_xxx anyway, we should
re-check get_mm_rss/total_vm.

> > Introduce get_mm_hiwater_rss() and get_mm_hiwater_vm() to use instead,
> > and kill the "if (tsk->mm) {}" code in do_exit().
>
> If you're going to add special helper macros (I don't care myself),
> wouldn't it be better to convert fs/proc/task_mmu.c (the original
> consumer) to use them too?

Yes, I was going to convert task_mem(), but noticed that it has
to read get_mm_rss() and ->total_vm anyway. Still, perhaps it
would be more clean to use the new macros anyway, even if this
wiil (unlikely) need a couple of extra cpu ticks.

> And, as I say, I'd _prefer_ that block to remain in do_exit(),
> but don't have strong evidence why it should.

I think that update_hiwater_xxx() should be "private" for vm code which
does zap/unmap, and any observer should use get_mm_hiwater_xxx(). Nobody
should use mm->hiwater_xxx directly.

But I don't (and of course can't) have a strong opinion on that,
will wait for your verdict and re-send.

The patch need the update anyway, I just noticed that if we remove
update_hiwater_xxx() from do_exit(), we should change the comment
in exit_mmap().

> > The first helper will
> > be also used to actually fill/report rusage->ru_maxrss.
>
> Oh, yes, I noticed a mail yesterday in which you claimed to Cc me,
> but didn't (like we all claim to be attaching missing patches ;)

Yes sorry ;) The only problem with mutt is that it is not possible
to change CC while editing the text (unless edit_headers is set).

Oleg.


  reply	other threads:[~2008-12-04 14:00 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-03 18:12 [PATCH] introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting Oleg Nesterov
2008-12-03 20:24 ` Hugh Dickins
2008-12-04 13:52   ` Oleg Nesterov [this message]
2008-12-05 12:54     ` Hugh Dickins
2008-12-05 18:41       ` Jay Lan
2008-12-06  8:42         ` Balbir Singh
2008-12-06  9:56           ` Hugh Dickins
2008-12-06 10:39             ` Balbir Singh
2008-12-07 16:17               ` Oleg Nesterov
2008-12-07 16:58                 ` Balbir Singh
2008-12-07 17:28                   ` Oleg Nesterov
2008-12-07 17:39                     ` Balbir Singh
2008-12-06 15:26     ` KOSAKI Motohiro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081204135250.GA5448@redhat.com \
    --to=oleg@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=hugh@veritas.com \
    --cc=jlan@sgi.com \
    --cc=jpirko@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.