All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, Vladimir Davydov <vdavydov@parallels.com>,
	Greg Thelen <gthelen@google.com>, Dave Hansen <dave@sr71.net>,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 1/3] mm: memcontrol: lockless page counters
Date: Thu, 2 Oct 2014 15:52:14 -0400	[thread overview]
Message-ID: <20141002195214.GA2705@cmpxchg.org> (raw)
In-Reply-To: <20141002150135.GA1394@cmpxchg.org>

On Thu, Oct 02, 2014 at 11:01:35AM -0400, Johannes Weiner wrote:
> On Tue, Sep 30, 2014 at 01:06:22PM +0200, Michal Hocko wrote:
> > > +/**
> > > + * page_counter_limit - limit the number of pages allowed
> > > + * @counter: counter
> > > + * @limit: limit to set
> > > + *
> > > + * Returns 0 on success, -EBUSY if the current number of pages on the
> > > + * counter already exceeds the specified limit.
> > > + *
> > > + * The caller must serialize invocations on the same counter.
> > > + */
> > > +int page_counter_limit(struct page_counter *counter, unsigned long limit)
> > > +{
> > > +	for (;;) {
> > > +		unsigned long old;
> > > +		long count;
> > > +
> > > +		count = atomic_long_read(&counter->count);
> > > +
> > > +		old = xchg(&counter->limit, limit);
> > > +
> > > +		if (atomic_long_read(&counter->count) != count) {
> > > +			counter->limit = old;
> > > +			continue;
> > > +		}
> > > +
> > > +		if (count > limit) {
> > > +			counter->limit = old;
> > > +			return -EBUSY;
> > > +		}
> > 
> > Ordering doesn't make much sense to me here. Say you really want to set
> > limit < count. You are effectively pushing all concurrent charges to
> > the reclaim even though you would revert your change and return with
> > EBUSY later on.
> >
> > Wouldn't (count > limit) check make more sense right after the first
> > atomic_long_read?
> > Also the second count check should be sufficient to check > count and
> > retry only when the count has increased.
> > Finally continuous flow of charges can keep this loop running for quite
> > some time and trigger lockup detector. cond_resched before continue
> > would handle that. Something like the following:
> > 
> > 	for (;;) {
> > 		unsigned long old;
> > 		long count;
> > 
> > 		count = atomic_long_read(&counter->count);
> > 		if (count > limit)
> > 			return -EBUSY;
> > 
> > 		old = xchg(&counter->limit, limit);
> > 
> > 		/* Recheck for concurrent charges */
> > 		if (atomic_long_read(&counter->count) > count) {
> > 			counter->limit = old;
> > 			cond_resched();
> > 			continue;
> > 		}
> > 
> > 		return 0;
> > 	}
> 
> This is susceptible to spurious -EBUSY during races with speculative
> charges and uncharges.  My code avoids that by retrying until we set
> the limit without any concurrent counter operations first, before
> moving on to implementing policy and rollback.
> 
> Some reclaim activity caused by a limit that the user is trying to set
> anyway should be okay.  I'd rather have a reliable syscall.
> 
> But the cond_resched() is a good idea, I'll add that, thanks.

Thinking more about it, my code doesn't really avoid that if the
speculative charges persist over the two reads, it just widens the
window a bit.  But your suggestion seems indeed more readable,
although I'd invert the second branch.

How about this delta on top?

diff --git a/mm/page_counter.c b/mm/page_counter.c
index 4bdab1c7a057..7eb17135d4a4 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -19,8 +19,8 @@ int page_counter_cancel(struct page_counter *counter, unsigned long nr_pages)
 
 	new = atomic_long_sub_return(nr_pages, &counter->count);
 
-	if (WARN_ON_ONCE(new < 0))
-		atomic_long_add(nr_pages, &counter->count);
+	/* More uncharges than charges? */
+	WARN_ON_ONCE(new < 0);
 
 	return new > 0;
 }
@@ -146,29 +146,29 @@ int page_counter_limit(struct page_counter *counter, unsigned long limit)
 		unsigned long old;
 		long count;
 
-		count = atomic_long_read(&counter->count);
 		/*
+		 * Update the limit while making sure that it's not
+		 * below the (concurrently changing) counter value.
+		 *
 		 * The xchg implies two full memory barriers before
 		 * and after, so the read-swap-read is ordered and
 		 * ensures coherency with page_counter_try_charge():
 		 * that function modifies the count before checking
 		 * the limit, so if it sees the old limit, we see the
-		 * modified counter and retry.  This guarantees we
-		 * never successfully set a limit below the counter.
+		 * modified counter and retry.
 		 */
-		old = xchg(&counter->limit, limit);
-
-		if (atomic_long_read(&counter->count) != count) {
-			counter->limit = old;
-			continue;
-		}
+		count = atomic_long_read(&counter->count);
 
-		if (count > limit) {
-			counter->limit = old;
+		if (count > limit)
 			return -EBUSY;
-		}
 
-		return 0;
+		old = xchg(&counter->limit, limit);
+
+		if (atomic_long_read(&counter->count) <= count)
+			return 0;
+
+		counter->limit = old;
+		cond_resched();
 	}
 }
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, Vladimir Davydov <vdavydov@parallels.com>,
	Greg Thelen <gthelen@google.com>, Dave Hansen <dave@sr71.net>,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 1/3] mm: memcontrol: lockless page counters
Date: Thu, 2 Oct 2014 15:52:14 -0400	[thread overview]
Message-ID: <20141002195214.GA2705@cmpxchg.org> (raw)
In-Reply-To: <20141002150135.GA1394@cmpxchg.org>

On Thu, Oct 02, 2014 at 11:01:35AM -0400, Johannes Weiner wrote:
> On Tue, Sep 30, 2014 at 01:06:22PM +0200, Michal Hocko wrote:
> > > +/**
> > > + * page_counter_limit - limit the number of pages allowed
> > > + * @counter: counter
> > > + * @limit: limit to set
> > > + *
> > > + * Returns 0 on success, -EBUSY if the current number of pages on the
> > > + * counter already exceeds the specified limit.
> > > + *
> > > + * The caller must serialize invocations on the same counter.
> > > + */
> > > +int page_counter_limit(struct page_counter *counter, unsigned long limit)
> > > +{
> > > +	for (;;) {
> > > +		unsigned long old;
> > > +		long count;
> > > +
> > > +		count = atomic_long_read(&counter->count);
> > > +
> > > +		old = xchg(&counter->limit, limit);
> > > +
> > > +		if (atomic_long_read(&counter->count) != count) {
> > > +			counter->limit = old;
> > > +			continue;
> > > +		}
> > > +
> > > +		if (count > limit) {
> > > +			counter->limit = old;
> > > +			return -EBUSY;
> > > +		}
> > 
> > Ordering doesn't make much sense to me here. Say you really want to set
> > limit < count. You are effectively pushing all concurrent charges to
> > the reclaim even though you would revert your change and return with
> > EBUSY later on.
> >
> > Wouldn't (count > limit) check make more sense right after the first
> > atomic_long_read?
> > Also the second count check should be sufficient to check > count and
> > retry only when the count has increased.
> > Finally continuous flow of charges can keep this loop running for quite
> > some time and trigger lockup detector. cond_resched before continue
> > would handle that. Something like the following:
> > 
> > 	for (;;) {
> > 		unsigned long old;
> > 		long count;
> > 
> > 		count = atomic_long_read(&counter->count);
> > 		if (count > limit)
> > 			return -EBUSY;
> > 
> > 		old = xchg(&counter->limit, limit);
> > 
> > 		/* Recheck for concurrent charges */
> > 		if (atomic_long_read(&counter->count) > count) {
> > 			counter->limit = old;
> > 			cond_resched();
> > 			continue;
> > 		}
> > 
> > 		return 0;
> > 	}
> 
> This is susceptible to spurious -EBUSY during races with speculative
> charges and uncharges.  My code avoids that by retrying until we set
> the limit without any concurrent counter operations first, before
> moving on to implementing policy and rollback.
> 
> Some reclaim activity caused by a limit that the user is trying to set
> anyway should be okay.  I'd rather have a reliable syscall.
> 
> But the cond_resched() is a good idea, I'll add that, thanks.

Thinking more about it, my code doesn't really avoid that if the
speculative charges persist over the two reads, it just widens the
window a bit.  But your suggestion seems indeed more readable,
although I'd invert the second branch.

How about this delta on top?

diff --git a/mm/page_counter.c b/mm/page_counter.c
index 4bdab1c7a057..7eb17135d4a4 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -19,8 +19,8 @@ int page_counter_cancel(struct page_counter *counter, unsigned long nr_pages)
 
 	new = atomic_long_sub_return(nr_pages, &counter->count);
 
-	if (WARN_ON_ONCE(new < 0))
-		atomic_long_add(nr_pages, &counter->count);
+	/* More uncharges than charges? */
+	WARN_ON_ONCE(new < 0);
 
 	return new > 0;
 }
@@ -146,29 +146,29 @@ int page_counter_limit(struct page_counter *counter, unsigned long limit)
 		unsigned long old;
 		long count;
 
-		count = atomic_long_read(&counter->count);
 		/*
+		 * Update the limit while making sure that it's not
+		 * below the (concurrently changing) counter value.
+		 *
 		 * The xchg implies two full memory barriers before
 		 * and after, so the read-swap-read is ordered and
 		 * ensures coherency with page_counter_try_charge():
 		 * that function modifies the count before checking
 		 * the limit, so if it sees the old limit, we see the
-		 * modified counter and retry.  This guarantees we
-		 * never successfully set a limit below the counter.
+		 * modified counter and retry.
 		 */
-		old = xchg(&counter->limit, limit);
-
-		if (atomic_long_read(&counter->count) != count) {
-			counter->limit = old;
-			continue;
-		}
+		count = atomic_long_read(&counter->count);
 
-		if (count > limit) {
-			counter->limit = old;
+		if (count > limit)
 			return -EBUSY;
-		}
 
-		return 0;
+		old = xchg(&counter->limit, limit);
+
+		if (atomic_long_read(&counter->count) <= count)
+			return 0;
+
+		counter->limit = old;
+		cond_resched();
 	}
 }
 

  reply	other threads:[~2014-10-02 19:52 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-24 15:43 [patch 0/3] mm: memcontrol: lockless page counters v2 Johannes Weiner
2014-09-24 15:43 ` Johannes Weiner
2014-09-24 15:43 ` [patch 1/3] mm: memcontrol: lockless page counters Johannes Weiner
2014-09-24 15:43   ` Johannes Weiner
2014-09-26 10:31   ` Vladimir Davydov
2014-09-26 10:31     ` Vladimir Davydov
2014-10-02 12:07     ` Johannes Weiner
2014-10-02 12:07       ` Johannes Weiner
     [not found]       ` <20141002120748.GA1359-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2014-10-03 15:36         ` Vladimir Davydov
2014-10-03 15:36           ` Vladimir Davydov
2014-10-03 15:36           ` Vladimir Davydov
2014-10-03 15:41           ` Michal Hocko
2014-10-03 15:41             ` Michal Hocko
2014-10-06  6:38             ` Vladimir Davydov
2014-10-06  6:38               ` Vladimir Davydov
2014-09-30 11:06   ` Michal Hocko
2014-09-30 11:06     ` Michal Hocko
2014-10-02 15:01     ` Johannes Weiner
2014-10-02 15:01       ` Johannes Weiner
2014-10-02 19:52       ` Johannes Weiner [this message]
2014-10-02 19:52         ` Johannes Weiner
     [not found]         ` <20141002195214.GA2705-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2014-10-03 15:44           ` Michal Hocko
2014-10-03 15:44             ` Michal Hocko
2014-10-03 15:44             ` Michal Hocko
2014-10-03 14:50       ` Michal Hocko
2014-10-03 14:50         ` Michal Hocko
2014-10-07 15:15   ` Michal Hocko
2014-10-07 15:15     ` Michal Hocko
2014-10-08 12:31     ` Johannes Weiner
2014-10-08 12:31       ` Johannes Weiner
2014-09-24 15:43 ` [patch 2/3] mm: hugetlb_controller: convert to " Johannes Weiner
2014-09-24 15:43   ` Johannes Weiner
     [not found]   ` <1411573390-9601-3-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2014-09-26 11:25     ` Vladimir Davydov
2014-09-26 11:25       ` Vladimir Davydov
2014-09-26 11:25       ` Vladimir Davydov
2014-10-07 15:21   ` Michal Hocko
2014-10-07 15:21     ` Michal Hocko
2014-10-08 12:39     ` Johannes Weiner
2014-10-08 12:39       ` Johannes Weiner
2014-09-24 15:43 ` [patch 3/3] kernel: res_counter: remove the unused API Johannes Weiner
2014-09-24 15:43   ` Johannes Weiner
     [not found]   ` <1411573390-9601-4-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2014-09-26 11:27     ` Vladimir Davydov
2014-09-26 11:27       ` Vladimir Davydov
2014-09-26 11:27       ` Vladimir Davydov
2014-10-07 15:26     ` Michal Hocko
2014-10-07 15:26       ` Michal Hocko
2014-10-07 15:26       ` Michal Hocko
  -- strict thread matches above, loose matches on Subject: below --
2014-10-14  1:46 [patch 0/3] mm: memcontrol: lockless page counters v3 Johannes Weiner
2014-10-14  1:46 ` [patch 1/3] mm: memcontrol: lockless page counters Johannes Weiner
2014-10-14  1:46   ` Johannes Weiner
     [not found]   ` <1413251163-8517-2-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2014-10-14 15:56     ` Michal Hocko
2014-10-14 15:56       ` Michal Hocko
2014-10-14 15:56       ` Michal Hocko
     [not found]       ` <20141014155647.GA6414-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2014-10-14 16:33         ` Johannes Weiner
2014-10-14 16:33           ` Johannes Weiner
2014-10-14 16:33           ` Johannes Weiner
     [not found]           ` <20141014163354.GA23911-HTCKtW7iVlxqnrmGgq4/JMIURNUf+fel@public.gmane.org>
2014-10-15  9:40             ` Michal Hocko
2014-10-15  9:40               ` Michal Hocko
2014-10-15  9:40               ` Michal Hocko
2014-10-17  7:47   ` Vladimir Davydov
2014-10-17  7:47     ` Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141002195214.GA2705@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=cgroups@vger.kernel.org \
    --cc=dave@sr71.net \
    --cc=gthelen@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=vdavydov@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.