From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
linux-mm@kvack.org, David Miller <davem@davemloft.net>
Subject: Re: [PATCH 9/9] net: vm deadlock avoidance core
Date: Thu, 18 Jan 2007 13:41:44 +0300
Message-ID: <20070118104144.GA20925@2ka.mipt.ru>
In-Reply-To: <1169024848.22935.109.camel@twins>
On Wed, Jan 17, 2007 at 10:07:28AM +0100, Peter Zijlstra (a.p.zijlstra@chello.nl) wrote:
> > You operate with 'current' in different contexts without any locks which
> > looks racy and even is not allowed. What will be 'current' for
> > netif_rx() case, which schedules softirq from hard irq context -
> > ksoftirqd, why do you want to set its flags?
>
> I don't touch current in hardirq context, do I (if I did, that is indeed
> a mistake)?
>
> In all other contexts, current is valid.
Well, if you think that setting the PF_MEMALLOC flag for keventd and
ksoftirqd is valid, then probably yes...
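To make the concern concrete: if packet processing sets PF_MEMALLOC on whatever task happens to be current (ksoftirqd or keventd), it must at minimum restore that task's previous state afterwards. Otherwise unrelated work run later by the same kernel thread inherits access to the reserves. The userspace model below shows that save/restore discipline; `struct task`, `MODEL_PF_MEMALLOC` (an arbitrary bit, not the kernel's value) and the function names are hypothetical stand-ins, not kernel API, and the model says nothing about the locking question raised above.

```c
#include <stdbool.h>

/* Hypothetical model of a task; not the kernel's struct task_struct. */
struct task {
	const char *comm;
	unsigned long flags;
};

/* Stand-in bit; the value is arbitrary, not the kernel's PF_MEMALLOC. */
#define MODEL_PF_MEMALLOC 0x1UL

/* Hypothetical: true if an allocation made now may dip into reserves. */
static bool may_use_reserves(const struct task *t)
{
	return (t->flags & MODEL_PF_MEMALLOC) != 0;
}

/*
 * Process one packet on a borrowed context (e.g. ksoftirqd).
 * The task's original state of the bit is saved and restored, so
 * reserve access is scoped to this one packet.
 */
static int process_packet_with_reserves(struct task *cur,
					int (*handler)(struct task *))
{
	unsigned long saved = cur->flags & MODEL_PF_MEMALLOC;
	int ret;

	cur->flags |= MODEL_PF_MEMALLOC;
	ret = handler(cur);
	/* Restore only the bit we touched; keep any other flag changes. */
	cur->flags = (cur->flags & ~MODEL_PF_MEMALLOC) | saved;
	return ret;
}
```

A task that already had the bit set keeps it; one that did not (such as ksoftirqd) loses it again once the packet handler returns.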
> > > > I meant that you can just mark process which created such socket as
> > > > PF_MEMALLOC, and clone that flag on forks and other related calls without
> > > > all that checks for 'current' in different places.
> > >
> > > Ah, that's the wrong level to think here, these processes never reach
> > > user-space - nor should these sockets.
> >
> > You limit this just to send an ack?
> > What about 'level-7' ack as you described in introduction?
>
> Take NFS, it does full data traffic in kernel.
The NFS case is exactly the situation where you only need to generate an ACK.
> > > Also, I only want the processing of the actual network packet to be able
> > > to eat the reserves, not any other thing that might happen in that
> > > context.
> > >
> > > And since network processing is mostly done in softirq context I must
> > > mark these sections like I did.
> >
> > You artificially limit system to just add a reserve to generate one ack.
> > For that purpose you do not need to have all those flags - just reserve
> > some data in network core and use it when system is in OOM (or reclaim)
> > for critical data paths.
>
> How would that end up being different, I would have to replace all
> allocations done in the full network processing path.
>
> This seems a much less invasive method, all the (allocation) code can
> stay the way it is and use the normal allocation functions.
ACKs are generated in only one place in TCP.
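Because there is a single generation point, the reserve suggested earlier (memory set aside in the network core and used only under OOM or reclaim for the critical path) can be modelled as a small fixed pool of preallocated ACK buffers that is consulted only when the normal allocator fails. This is a hypothetical userspace sketch: `RESERVE_COUNT`, `ACK_BUF_SIZE` (a placeholder for a MAX_TCP_HEADER-sized buffer) and the function names are invented, and the `normal_ok` argument simulates allocator failure.

```c
#include <stdbool.h>
#include <stdlib.h>

#define RESERVE_COUNT 4    /* hypothetical number of reserved ACK buffers */
#define ACK_BUF_SIZE  256  /* placeholder for a MAX_TCP_HEADER-sized buffer */

struct ack_reserve {
	void *slot[RESERVE_COUNT];  /* buffers filled at startup */
	int free;                   /* number of buffers currently available */
};

/* Preallocate the reserve at startup, while memory is still plentiful. */
static bool ack_reserve_init(struct ack_reserve *r)
{
	for (r->free = 0; r->free < RESERVE_COUNT; r->free++) {
		r->slot[r->free] = malloc(ACK_BUF_SIZE);
		if (!r->slot[r->free])
			return false;
	}
	return true;
}

/*
 * Allocate an ACK buffer: try the normal allocator first (its failure is
 * simulated by normal_ok == false) and fall back to the reserve only then.
 */
static void *ack_alloc(struct ack_reserve *r, bool normal_ok, bool *from_reserve)
{
	void *p = normal_ok ? malloc(ACK_BUF_SIZE) : NULL;

	*from_reserve = false;
	if (p)
		return p;
	if (r->free == 0)
		return NULL;   /* reserve exhausted: behave as without it */
	*from_reserve = true;
	return r->slot[--r->free];
}

/* Reserve buffers go back to the pool; normal ones go back to malloc. */
static void ack_free(struct ack_reserve *r, void *p, bool from_reserve)
{
	if (from_reserve)
		r->slot[r->free++] = p;
	else
		free(p);
}
```

The normal allocation path is untouched; only the one place that builds the ACK needs to know the reserve exists.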
Actually, we are starting to talk about a different approach: a separate
allocator for the network stack, which would be switched on under OOM (or
reclaim, or at any other time). If you do not mind, I would like to revive
the discussion about the network tree allocator. It uses its own pool of
pages, performs self-defragmentation of its memory, and is very SMP
friendly: like slab, it is per-CPU and never frees objects on a different
CPU, so they always stay in the same cache. Among other goodies, it allows
full zero-copy sending and receiving.
Here is a link:
http://tservice.net.ru/~s0mbre/old/?section=projects&item=nta
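The per-CPU property mentioned above (an object is never freed into another CPU's pool) can be illustrated with a small model in which every object records its owning CPU, and a free from any CPU returns it to the owner's list. This is a hypothetical sketch of the idea only, not the network tree allocator's code (see the link for that); `NR_CPUS_MODEL`, `struct obj` and the function names are invented, and the lock or lockless queue a real cross-CPU free would need is deliberately left out.

```c
#include <stddef.h>

#define NR_CPUS_MODEL 2   /* hypothetical CPU count */
#define OBJS_PER_CPU  8   /* hypothetical pool size per CPU */

struct obj {
	int owner;            /* CPU whose pool this object belongs to */
	struct obj *next;     /* free-list link */
	char data[64];
};

struct cpu_pool {
	struct obj objs[OBJS_PER_CPU];
	struct obj *free_list;
	int nr_free;
};

static struct cpu_pool pools[NR_CPUS_MODEL];

static void pools_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		struct cpu_pool *p = &pools[cpu];

		p->free_list = NULL;
		p->nr_free = 0;
		for (int i = 0; i < OBJS_PER_CPU; i++) {
			p->objs[i].owner = cpu;
			p->objs[i].next = p->free_list;
			p->free_list = &p->objs[i];
			p->nr_free++;
		}
	}
}

/* Allocate only from the local CPU's pool. */
static struct obj *obj_alloc(int cpu)
{
	struct cpu_pool *p = &pools[cpu];
	struct obj *o = p->free_list;

	if (o) {
		p->free_list = o->next;
		p->nr_free--;
	}
	return o;
}

/*
 * Free from any CPU: the object always goes back to its owner's pool,
 * never into the freeing CPU's pool. A real allocator would need
 * synchronisation here when cpu != o->owner; this model has none.
 */
static void obj_free(int cpu, struct obj *o)
{
	struct cpu_pool *p = &pools[o->owner];

	(void)cpu;  /* which CPU frees the object deliberately does not matter */
	o->next = p->free_list;
	p->free_list = o;
	p->nr_free++;
}
```

Because objects always return to the pool they came from, each pool's memory keeps being reused by the same CPU, which is what keeps it warm in that CPU's cache.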
> > > > > > > + /*
> > > > > > > + decrease window size..
> > > > > > > + tcp_enter_quickack_mode(sk);
> > > > > > > + */
> > > > > >
> > > > > > How does this decrease window size?
> > > > > > Maybe ack scheduling would be better handled by inet_csk_schedule_ack()
> > > > > > or just directly send an ack, which in turn requires allocation, which
> > > > > > can be bound to this received frame processing...
> > > > >
> > > > > It doesn't, I thought that it might be a good idea doing that, but never
> > > > > got around to actually figuring out how to do it.
> > > >
> > > > tcp_send_ack()?
> > > >
> > >
> > > does that shrink the window automagically?
> >
> > Yes, it updates window, but having ack generated in that place is
> > actually very wrong. In that place system has not processed incoming
> > packet yet, so it can not generate correct ACK for received frame at
> > all. And it seems that the only purpose of the whole patchset is to
> > generate that poor ack - reserve 2007 ack packets (MAX_TCP_HEADER)
> > in system startup and reuse them when you are under memory pressure.
>
> Right, I suspected something like that; hence I wanted to just shrink
> the window. Anyway, this is not a very important issue.
tcp_enter_quickack_mode() does not update the window; it only allows an ACK
to be sent immediately after the packet has been processed. The window can
then be changed in whatever way the TCP state machine and congestion control
want.
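The distinction can be made explicit with a simplified model: entering quick-ACK mode only arms a count of segments to be acknowledged immediately after processing, while the advertised window is chosen in a separate step from receive-buffer space. The structure, field names and the trivial window rule below are hypothetical simplifications, not the kernel's connection state or its window selection logic.

```c
#include <stdbool.h>

struct conn_model {
	unsigned int quick;       /* segments left to ACK immediately */
	bool pingpong;            /* interactive (delayed-ACK) mode */
	unsigned int rcv_space;   /* free receive-buffer space, bytes */
	unsigned int rcv_wnd;     /* window we would advertise, bytes */
};

/* Model of entering quick-ACK mode: only ACK timing state changes. */
static void enter_quickack_mode(struct conn_model *c, unsigned int segs)
{
	c->quick = segs;
	c->pingpong = false;
	/* rcv_wnd is deliberately not touched here. */
}

/* Called after a segment has been processed: ACK right away? */
static bool ack_now_after_processing(struct conn_model *c)
{
	if (c->quick > 0) {
		c->quick--;
		return true;
	}
	return false;   /* otherwise delayed-ACK logic would decide */
}

/* Window selection is a separate step; here it simply follows free space. */
static void select_window(struct conn_model *c)
{
	c->rcv_wnd = c->rcv_space;
}
```

In this model, shrinking the window under memory pressure would be a matter for the window-selection step, not for quick-ACK mode.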
--
Evgeniy Polyakov