Re: [PATCH net-next v3 2/3] udp: implement memory accounting helpers

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Paolo Abeni <pabeni@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	James Morris <jmorris@namei.org>,
	Trond Myklebust <trond.myklebust@primarydata.com>,
	Alexander Duyck <aduyck@mirantis.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Eric Dumazet <edumazet@google.com>,
	Tom Herbert <tom@herbertland.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	Edward Cree <ecree@solarflare.com>,
	linux-nfs@vger.kernel.org
Subject: Re: [PATCH net-next v3 2/3] udp: implement memory accounting helpers
Date: Thu, 29 Sep 2016 16:01:14 +0200	[thread overview]
Message-ID: <1475157674.4676.52.camel@redhat.com> (raw)
In-Reply-To: <1475155472.28155.164.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, 2016-09-29 at 06:24 -0700, Eric Dumazet wrote:
> On Thu, 2016-09-29 at 11:31 +0200, Paolo Abeni wrote:
> > On Wed, 2016-09-28 at 18:42 -0700, Eric Dumazet wrote:
> > > On Wed, 2016-09-28 at 12:52 +0200, Paolo Abeni wrote:
> > > 
> > > > +static void udp_rmem_release(struct sock *sk, int partial)
> > > > +{
> > > > +	struct udp_sock *up = udp_sk(sk);
> > > > +	int fwd, amt;
> > > > +
> > > > +	if (partial && !udp_under_memory_pressure(sk))
> > > > +		return;
> > > > +
> > > > +	/* we can have concurrent release; if we catch any conflict
> > > > +	 * we let only one of them do the work
> > > > +	 */
> > > > +	if (atomic_dec_if_positive(&up->can_reclaim) < 0)
> > > > +		return;
> > > > +
> > > > +	fwd = __udp_forward(up, atomic_read(&sk->sk_rmem_alloc));
> > > > +	if (fwd < SK_MEM_QUANTUM + partial) {
> > > > +		atomic_inc(&up->can_reclaim);
> > > > +		return;
> > > > +	}
> > > > +
> > > > +	amt = (fwd - partial) & ~(SK_MEM_QUANTUM - 1);
> > > > +	atomic_sub(amt, &up->mem_allocated);
> > > > +	atomic_inc(&up->can_reclaim);
> > > > +
> > > > +	__sk_mem_reduce_allocated(sk, amt >> SK_MEM_QUANTUM_SHIFT);
> > > > +	sk->sk_forward_alloc = fwd - amt;
> > > > +}
> > > 
> > > 
> > > This is racy... all these atomics make me nervous...
> > 
> > Ah, perhaps I got it: if we have a concurrent memory scheduling, we
> > could end up with a value of mem_allocated below the real need. 
> > 
> > That mismatch will not drift: at worst we can end up with mem_allocated
> > being single SK_MEM_QUANTUM below what is strictly needed.
> > 
> > A possible alternative could be:
> > 
> > static void udp_rmem_release(struct sock *sk, int partial)
> > {
> > 	struct udp_sock *up = udp_sk(sk);
> > 	int fwd, amt, alloc_old, alloc;
> > 
> > 	if (partial && !udp_under_memory_pressure(sk))
> > 		return;
> > 
> > 	alloc = atomic_read(&up->mem_allocated);
> > 	fwd = alloc - atomic_read(&sk->sk_rmem_alloc);
> > 	if (fwd < SK_MEM_QUANTUM + partial)
> > 		return;
> > 
> > 	amt = (fwd - partial) & ~(SK_MEM_QUANTUM - 1);
> > 	alloc_old = atomic_cmpxchg(&up->mem_allocated, alloc, alloc - amt);
> > 	/* if a concurrent update is detected, just do nothing; if said update
> > 	 * is due to another memory release, that release take care of
> > 	 * reclaiming the memory for us, too.
> > 	 * Otherwise we will be able to release on later dequeue, since
> > 	 * we will eventually stop colliding with the writer when it will
> > 	 * consume all the fwd allocated memory
> > 	 */
> > 	if (alloc_old != alloc)
> > 		return;
> > 
> > 	__sk_mem_reduce_allocated(sk, amt >> SK_MEM_QUANTUM_SHIFT);
> > 	sk->sk_forward_alloc = fwd - amt;
> 
> Can still be done from multiple cpus.
> 
> Add some ndelay() or udelay() before to simulate fact that current cpu
> could be interrupted by an NMI handler (perf for example)... or hard IRQ
> handler...
> 
> Then make sure your tests involve 16 concurrent cpus dealing with one
> udp socket...

Thank you again reviewing this.

I'm working to this sort of tests right now.

> 
> > }
> > 
> > which is even more lazy in reclaiming but should never underestimate the
> > needed forward allocation, and under pressure should eventually free the
> > needed memory.
> 
> 
> If this code is rarely used, why don't you simply use a real spinlock,
> so that we do not have to worry about all this ?
> 
> A spinlock  acquisition/release is a _single_ locked operation.
> Faster than the 3 atomic you got in last version.
> spinlock code (ticket or MCS) avoids starvation.

I'd like to avoid adding a lock, if possible, to avoid any possible
source of contention.

> Then, you can safely update multiple fields in the socket.
> 
> And you get nice lockdep support as a bonus.
> 
> cmpxchg() is fine when a single field need an exclusion. But there you
> have multiple fields to update at once :
> 
> sk_memory_allocated_add() and sk_memory_allocated_sub() can work using 
> atomic_long_add_return() and atomic_long_sub() because their caller owns
> the socket lock and can safely update sk->sk_forward_alloc without
> additional locking, but UDP wont have this luxury after your patches.

When we reach __sk_mem_reduce_allocated() we are sure we can free the
specified amount of memory, so we only need to ensure consistent
sk_prot->memory_allocated updates. The current atomic operation suffices
to this.

Paolo

WARNING: multiple messages have this Message-ID (diff)

From: Paolo Abeni <pabeni-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	"David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>,
	James Morris <jmorris-gx6/JNMH7DfYtjvyW6yDsg@public.gmane.org>,
	Trond Myklebust
	<trond.myklebust-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>,
	Alexander Duyck <aduyck-nYU0QVwCCFFWk0Htik3J/w@public.gmane.org>,
	Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>,
	Eric Dumazet <edumazet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Tom Herbert <tom-BjP2VixgY4xUbtYUoyoikg@public.gmane.org>,
	Hannes Frederic Sowa
	<hannes-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>,
	Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH net-next v3 2/3] udp: implement memory accounting helpers
Date: Thu, 29 Sep 2016 16:01:14 +0200	[thread overview]
Message-ID: <1475157674.4676.52.camel@redhat.com> (raw)
In-Reply-To: <1475155472.28155.164.camel-XN9IlZ5yJG9HTL0Zs8A6p+yfmBU6pStAUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>

On Thu, 2016-09-29 at 06:24 -0700, Eric Dumazet wrote:
> On Thu, 2016-09-29 at 11:31 +0200, Paolo Abeni wrote:
> > On Wed, 2016-09-28 at 18:42 -0700, Eric Dumazet wrote:
> > > On Wed, 2016-09-28 at 12:52 +0200, Paolo Abeni wrote:
> > > 
> > > > +static void udp_rmem_release(struct sock *sk, int partial)
> > > > +{
> > > > +	struct udp_sock *up = udp_sk(sk);
> > > > +	int fwd, amt;
> > > > +
> > > > +	if (partial && !udp_under_memory_pressure(sk))
> > > > +		return;
> > > > +
> > > > +	/* we can have concurrent release; if we catch any conflict
> > > > +	 * we let only one of them do the work
> > > > +	 */
> > > > +	if (atomic_dec_if_positive(&up->can_reclaim) < 0)
> > > > +		return;
> > > > +
> > > > +	fwd = __udp_forward(up, atomic_read(&sk->sk_rmem_alloc));
> > > > +	if (fwd < SK_MEM_QUANTUM + partial) {
> > > > +		atomic_inc(&up->can_reclaim);
> > > > +		return;
> > > > +	}
> > > > +
> > > > +	amt = (fwd - partial) & ~(SK_MEM_QUANTUM - 1);
> > > > +	atomic_sub(amt, &up->mem_allocated);
> > > > +	atomic_inc(&up->can_reclaim);
> > > > +
> > > > +	__sk_mem_reduce_allocated(sk, amt >> SK_MEM_QUANTUM_SHIFT);
> > > > +	sk->sk_forward_alloc = fwd - amt;
> > > > +}
> > > 
> > > 
> > > This is racy... all these atomics make me nervous...
> > 
> > Ah, perhaps I got it: if we have a concurrent memory scheduling, we
> > could end up with a value of mem_allocated below the real need. 
> > 
> > That mismatch will not drift: at worst we can end up with mem_allocated
> > being single SK_MEM_QUANTUM below what is strictly needed.
> > 
> > A possible alternative could be:
> > 
> > static void udp_rmem_release(struct sock *sk, int partial)
> > {
> > 	struct udp_sock *up = udp_sk(sk);
> > 	int fwd, amt, alloc_old, alloc;
> > 
> > 	if (partial && !udp_under_memory_pressure(sk))
> > 		return;
> > 
> > 	alloc = atomic_read(&up->mem_allocated);
> > 	fwd = alloc - atomic_read(&sk->sk_rmem_alloc);
> > 	if (fwd < SK_MEM_QUANTUM + partial)
> > 		return;
> > 
> > 	amt = (fwd - partial) & ~(SK_MEM_QUANTUM - 1);
> > 	alloc_old = atomic_cmpxchg(&up->mem_allocated, alloc, alloc - amt);
> > 	/* if a concurrent update is detected, just do nothing; if said update
> > 	 * is due to another memory release, that release take care of
> > 	 * reclaiming the memory for us, too.
> > 	 * Otherwise we will be able to release on later dequeue, since
> > 	 * we will eventually stop colliding with the writer when it will
> > 	 * consume all the fwd allocated memory
> > 	 */
> > 	if (alloc_old != alloc)
> > 		return;
> > 
> > 	__sk_mem_reduce_allocated(sk, amt >> SK_MEM_QUANTUM_SHIFT);
> > 	sk->sk_forward_alloc = fwd - amt;
> 
> Can still be done from multiple cpus.
> 
> Add some ndelay() or udelay() before to simulate fact that current cpu
> could be interrupted by an NMI handler (perf for example)... or hard IRQ
> handler...
> 
> Then make sure your tests involve 16 concurrent cpus dealing with one
> udp socket...

Thank you again reviewing this.

I'm working to this sort of tests right now.

> 
> > }
> > 
> > which is even more lazy in reclaiming but should never underestimate the
> > needed forward allocation, and under pressure should eventually free the
> > needed memory.
> 
> 
> If this code is rarely used, why don't you simply use a real spinlock,
> so that we do not have to worry about all this ?
> 
> A spinlock  acquisition/release is a _single_ locked operation.
> Faster than the 3 atomic you got in last version.
> spinlock code (ticket or MCS) avoids starvation.

I'd like to avoid adding a lock, if possible, to avoid any possible
source of contention.

> Then, you can safely update multiple fields in the socket.
> 
> And you get nice lockdep support as a bonus.
> 
> cmpxchg() is fine when a single field need an exclusion. But there you
> have multiple fields to update at once :
> 
> sk_memory_allocated_add() and sk_memory_allocated_sub() can work using 
> atomic_long_add_return() and atomic_long_sub() because their caller owns
> the socket lock and can safely update sk->sk_forward_alloc without
> additional locking, but UDP wont have this luxury after your patches.

When we reach __sk_mem_reduce_allocated() we are sure we can free the
specified amount of memory, so we only need to ensure consistent
sk_prot->memory_allocated updates. The current atomic operation suffices
to this.

Paolo


--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

next prev parent reply	other threads:[~2016-09-29 14:01 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-28 10:52 [PATCH net-next v3 0/3] udp: refactor memory accounting Paolo Abeni
2016-09-28 10:52 ` Paolo Abeni
2016-09-28 10:52 ` [PATCH net-next v3 1/3] net/socket: factor out helpers for memory and queue manipulation Paolo Abeni
2016-09-28 10:52 ` [PATCH net-next v3 2/3] udp: implement memory accounting helpers Paolo Abeni
2016-09-29  1:42   ` Eric Dumazet
2016-09-29  7:34     ` Paolo Abeni
2016-09-29  7:34       ` Paolo Abeni
2016-09-29  9:31     ` Paolo Abeni
2016-09-29  9:31       ` Paolo Abeni
2016-09-29 13:24       ` Eric Dumazet
2016-09-29 13:24         ` Eric Dumazet
2016-09-29 14:01         ` Paolo Abeni [this message]
2016-09-29 14:01           ` Paolo Abeni
2016-09-29 14:13           ` Eric Dumazet
2016-09-29 14:13             ` Eric Dumazet
2016-09-29 14:34             ` Paolo Abeni
2016-09-29 14:49               ` Eric Dumazet
2016-09-29 14:59                 ` Paolo Abeni
2016-09-28 10:52 ` [PATCH net-next v3 3/3] udp: use it's own memory accounting schema Paolo Abeni
2016-09-28 10:52   ` Paolo Abeni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1475157674.4676.52.camel@redhat.com \
    --to=pabeni@redhat.com \
    --cc=aduyck@mirantis.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=ecree@solarflare.com \
    --cc=edumazet@google.com \
    --cc=eric.dumazet@gmail.com \
    --cc=hannes@stressinduktion.org \
    --cc=jmorris@namei.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=tom@herbertland.com \
    --cc=trond.myklebust@primarydata.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.