From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Indan Zupancic <indan@nul.nu>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, Evgeniy Polyakov <johnpol@2ka.mipt.ru>,
Daniel Phillips <phillips@google.com>,
Rik van Riel <riel@redhat.com>,
David Miller <davem@davemloft.net>
Subject: Re: [PATCH 1/4] net: VM deadlock avoidance framework
Date: Mon, 28 Aug 2006 19:32:24 +0200 [thread overview]
Message-ID: <1156786344.23000.47.camel@twins> (raw)
In-Reply-To: <3720.81.207.0.53.1156780999.squirrel@81.207.0.53>
On Mon, 2006-08-28 at 18:03 +0200, Indan Zupancic wrote:
> On Mon, August 28, 2006 12:22, Peter Zijlstra said:
> >> > @@ -391,6 +391,7 @@ enum sock_flags {
> >> > SOCK_RCVTSTAMP, /* %SO_TIMESTAMP setting */
> >> > SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
> >> > SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
> >> > + SOCK_VMIO, /* promise to never block on receive */
> >>
> >> It might be used for IO related to the VM, but that doesn't tell _what_ it does.
> >> It also does much more than just not blocking on receive, so overal, aren't
> >> both the vmio name and the comment slightly misleading?
> >
> > I'm so having trouble with this name; I had SOCK_NONBLOCKING for a
> > while, but that is a very bad name because nonblocking has this well
> > defined meaning when talking about sockets, and this is not that.
> >
> > Hence I came up with the VMIO, because that is the only selecting
> > criteria for being special. - I'll fix up the comment.
>
> It's nice and short, but it might be weird if someone after a while finds another way
> of using this stuff. And it's relation to 'emergency' looks unclear. So maybe calling
> both the same makes most sense, no matter how you name it.
I've tried to come up with another use-case, but failed (of course that
doesn't mean there is no). Also, I'm really past caring what the thing
is called ;-) But if ppl object I guess its easy enough to run yet
another sed command over the patches.
> >> > @@ -82,6 +82,7 @@ EXPORT_SYMBOL(zone_table);
> >> >
> >> > static char *zone_names[MAX_NR_ZONES] = { "DMA", "DMA32", "Normal", "HighMem" };
> >> > int min_free_kbytes = 1024;
> >> > +int var_free_kbytes;
> >>
> >> Using var_free_pages makes the code slightly simpler, as all that needless
> >> convertion isn't needed anymore. Perhaps the same is true for min_free_kbytes...
> >
> > 't seems I'm a bit puzzled as to what you mean here.
>
> I mean to store the variable reserve in pages instead of kilobytes. Currently you're
> converting from the one to the other both when setting and when using the value. That
> doesn't make much sense and can be avoided by storing the value in pages from the start.
right, will have a peek.
> void kfree_skbmem(struct sk_buff *skb)
> {
> struct sk_buff *other;
> atomic_t *fclone_ref;
> struct kmem_cache *cache = skbuff_head_cache;
> struct sk_buff *free = skb;
>
> skb_release_data(skb);
> switch (skb->fclone) {
> case SKB_FCLONE_UNAVAILABLE:
> goto free;
>
> case SKB_FCLONE_ORIG:
> fclone_ref = (atomic_t *) (skb + 2);
> if (atomic_dec_and_test(fclone_ref)){
> cache = skbuff_fclone_cache;
> goto free;
> }
> break;
>
> case SKB_FCLONE_CLONE:
> fclone_ref = (atomic_t *) (skb + 1);
> other = skb - 1;
>
> /* The clone portion is available for
> * fast-cloning again.
> */
> skb->fclone = SKB_FCLONE_UNAVAILABLE;
>
> if (atomic_dec_and_test(fclone_ref)){
> cache = skbuff_fclone_cache;
> free = other;
> goto free;
> }
> break;
> };
> return;
> free:
> if (!skb->emergency)
> kmem_cache_free(cache, free);
> else
> emergency_rx_free(free, kmem_cache_size(cache));
> }
Ah, like so, sure, that looks good.
> >> You can get rid of the memalloc_reserve and vmio_request_queues variables
> >> if you want, they aren't really needed for anything. If using them reduces
> >> the total code size I'd keep them though.
> >
> > I find my version easier to read, but that might just be the way my
> > brain works.
>
> Maybe true, but I believe my version is more natural in the sense that it makes
> more clear what the code is doing. Less bookkeeping, more real work, so to speak.
Ok, I'll have another look at it, perhaps my gray matter has shifted ;-)
> But after another look things seem a bit shaky, in the locking corner anyway.
>
> sk_adjust_memalloc() calls adjust_memalloc_reserve(), which changes var_free_kbytes
> and then calls setup_per_zone_pages_min(), which does the real work. But it reads
> min_free_kbytes without holding any locks. In mainline that's fine as the function
> is only called by the proc handler and in obscure memory hotplug stuff. But with
> your code it can also be called at any moment when a VMIO socket is made, which now
> races with the proc callback. More a theoretical than a real problem, but still
> slightly messy.
Knew about that, hadn't made up my mind on a fix yet. Good spot never
the less. Time to actually fix it I guess.
> adjust_memalloc_reserve() has no locking at all, while it might be called concurrently
> from different sources. Luckily sk_adjust_memalloc() is the only user, and which uses
> its own spinlock for synchronization, so things go well by accident now. It seems
> cleaner to move that spinlock so that it protects var|min_free_kbytes instead.
Ah, no accident there, I'm fully aware that there would need to be a
spinlock in adjust_memalloc_reserve() if there were another caller.
(I even had it there for some time) - added comment.
> +int adjust_memalloc_reserve(int pages)
> +{
> + int kbytes;
> + int err = 0;
> +
> + kbytes = var_free_kbytes + (pages << (PAGE_SHIFT - 10));
> + if (kbytes < 0) {
> + err = -EINVAL;
> + goto out;
> + }
>
> Shouldn't that be a BUG_ON instead?
Yeah, might as well be.
WARNING: multiple messages have this Message-ID (diff)
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Indan Zupancic <indan@nul.nu>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, Evgeniy Polyakov <johnpol@2ka.mipt.ru>,
Daniel Phillips <phillips@google.com>,
Rik van Riel <riel@redhat.com>,
David Miller <davem@davemloft.net>
Subject: Re: [PATCH 1/4] net: VM deadlock avoidance framework
Date: Mon, 28 Aug 2006 19:32:24 +0200 [thread overview]
Message-ID: <1156786344.23000.47.camel@twins> (raw)
In-Reply-To: <3720.81.207.0.53.1156780999.squirrel@81.207.0.53>
On Mon, 2006-08-28 at 18:03 +0200, Indan Zupancic wrote:
> On Mon, August 28, 2006 12:22, Peter Zijlstra said:
> >> > @@ -391,6 +391,7 @@ enum sock_flags {
> >> > SOCK_RCVTSTAMP, /* %SO_TIMESTAMP setting */
> >> > SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
> >> > SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
> >> > + SOCK_VMIO, /* promise to never block on receive */
> >>
> >> It might be used for IO related to the VM, but that doesn't tell _what_ it does.
> >> It also does much more than just not blocking on receive, so overal, aren't
> >> both the vmio name and the comment slightly misleading?
> >
> > I'm so having trouble with this name; I had SOCK_NONBLOCKING for a
> > while, but that is a very bad name because nonblocking has this well
> > defined meaning when talking about sockets, and this is not that.
> >
> > Hence I came up with the VMIO, because that is the only selecting
> > criteria for being special. - I'll fix up the comment.
>
> It's nice and short, but it might be weird if someone after a while finds another way
> of using this stuff. And it's relation to 'emergency' looks unclear. So maybe calling
> both the same makes most sense, no matter how you name it.
I've tried to come up with another use-case, but failed (of course that
doesn't mean there is no). Also, I'm really past caring what the thing
is called ;-) But if ppl object I guess its easy enough to run yet
another sed command over the patches.
> >> > @@ -82,6 +82,7 @@ EXPORT_SYMBOL(zone_table);
> >> >
> >> > static char *zone_names[MAX_NR_ZONES] = { "DMA", "DMA32", "Normal", "HighMem" };
> >> > int min_free_kbytes = 1024;
> >> > +int var_free_kbytes;
> >>
> >> Using var_free_pages makes the code slightly simpler, as all that needless
> >> convertion isn't needed anymore. Perhaps the same is true for min_free_kbytes...
> >
> > 't seems I'm a bit puzzled as to what you mean here.
>
> I mean to store the variable reserve in pages instead of kilobytes. Currently you're
> converting from the one to the other both when setting and when using the value. That
> doesn't make much sense and can be avoided by storing the value in pages from the start.
right, will have a peek.
> void kfree_skbmem(struct sk_buff *skb)
> {
> struct sk_buff *other;
> atomic_t *fclone_ref;
> struct kmem_cache *cache = skbuff_head_cache;
> struct sk_buff *free = skb;
>
> skb_release_data(skb);
> switch (skb->fclone) {
> case SKB_FCLONE_UNAVAILABLE:
> goto free;
>
> case SKB_FCLONE_ORIG:
> fclone_ref = (atomic_t *) (skb + 2);
> if (atomic_dec_and_test(fclone_ref)){
> cache = skbuff_fclone_cache;
> goto free;
> }
> break;
>
> case SKB_FCLONE_CLONE:
> fclone_ref = (atomic_t *) (skb + 1);
> other = skb - 1;
>
> /* The clone portion is available for
> * fast-cloning again.
> */
> skb->fclone = SKB_FCLONE_UNAVAILABLE;
>
> if (atomic_dec_and_test(fclone_ref)){
> cache = skbuff_fclone_cache;
> free = other;
> goto free;
> }
> break;
> };
> return;
> free:
> if (!skb->emergency)
> kmem_cache_free(cache, free);
> else
> emergency_rx_free(free, kmem_cache_size(cache));
> }
Ah, like so, sure, that looks good.
> >> You can get rid of the memalloc_reserve and vmio_request_queues variables
> >> if you want, they aren't really needed for anything. If using them reduces
> >> the total code size I'd keep them though.
> >
> > I find my version easier to read, but that might just be the way my
> > brain works.
>
> Maybe true, but I believe my version is more natural in the sense that it makes
> more clear what the code is doing. Less bookkeeping, more real work, so to speak.
Ok, I'll have another look at it, perhaps my gray matter has shifted ;-)
> But after another look things seem a bit shaky, in the locking corner anyway.
>
> sk_adjust_memalloc() calls adjust_memalloc_reserve(), which changes var_free_kbytes
> and then calls setup_per_zone_pages_min(), which does the real work. But it reads
> min_free_kbytes without holding any locks. In mainline that's fine as the function
> is only called by the proc handler and in obscure memory hotplug stuff. But with
> your code it can also be called at any moment when a VMIO socket is made, which now
> races with the proc callback. More a theoretical than a real problem, but still
> slightly messy.
Knew about that, hadn't made up my mind on a fix yet. Good spot never
the less. Time to actually fix it I guess.
> adjust_memalloc_reserve() has no locking at all, while it might be called concurrently
> from different sources. Luckily sk_adjust_memalloc() is the only user, and which uses
> its own spinlock for synchronization, so things go well by accident now. It seems
> cleaner to move that spinlock so that it protects var|min_free_kbytes instead.
Ah, no accident there, I'm fully aware that there would need to be a
spinlock in adjust_memalloc_reserve() if there were another caller.
(I even had it there for some time) - added comment.
> +int adjust_memalloc_reserve(int pages)
> +{
> + int kbytes;
> + int err = 0;
> +
> + kbytes = var_free_kbytes + (pages << (PAGE_SHIFT - 10));
> + if (kbytes < 0) {
> + err = -EINVAL;
> + goto out;
> + }
>
> Shouldn't that be a BUG_ON instead?
Yeah, might as well be.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2006-08-28 17:38 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-08-25 15:39 [PATCH 0/4] VM deadlock prevention -v5 Peter Zijlstra
2006-08-25 15:39 ` Peter Zijlstra
2006-08-25 15:39 ` [PATCH 1/4] net: VM deadlock avoidance framework Peter Zijlstra
2006-08-25 15:39 ` Peter Zijlstra
2006-08-26 2:37 ` Indan Zupancic
2006-08-26 2:37 ` Indan Zupancic
2006-08-28 10:22 ` Peter Zijlstra
2006-08-28 10:22 ` Peter Zijlstra
2006-08-28 16:03 ` Indan Zupancic
2006-08-28 16:03 ` Indan Zupancic
2006-08-28 17:32 ` Peter Zijlstra [this message]
2006-08-28 17:32 ` Peter Zijlstra
2006-08-29 0:01 ` Indan Zupancic
2006-08-29 0:01 ` Indan Zupancic
2006-08-29 9:49 ` Peter Zijlstra
2006-08-29 9:49 ` Peter Zijlstra
2006-08-29 19:53 ` Indan Zupancic
2006-08-29 19:53 ` Indan Zupancic
2006-08-25 15:40 ` [PATCH 2/4] blkdev: iosched selection for queue creation Peter Zijlstra
2006-08-25 15:40 ` Peter Zijlstra
2006-08-25 15:40 ` [PATCH 3/4] nbd: deadlock prevention for NBD Peter Zijlstra
2006-08-25 15:40 ` Peter Zijlstra
2006-08-25 15:40 ` [PATCH 4/4] nfs: deadlock prevention for NFS Peter Zijlstra
2006-08-25 15:40 ` Peter Zijlstra
2006-08-25 20:14 ` Trond Myklebust
2006-08-25 20:14 ` Trond Myklebust
2006-08-25 20:36 ` Peter Zijlstra
2006-08-25 20:36 ` Peter Zijlstra
2006-08-26 3:05 ` Indan Zupancic
2006-08-26 3:05 ` Indan Zupancic
2006-08-25 15:51 ` [PATCH 0/4] VM deadlock prevention -v5 Christoph Lameter
2006-08-25 15:51 ` Christoph Lameter
2006-08-25 15:52 ` Peter Zijlstra
2006-08-25 15:52 ` Peter Zijlstra
2006-08-25 16:04 ` Rik van Riel
2006-08-25 16:04 ` Rik van Riel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1156786344.23000.47.camel@twins \
--to=a.p.zijlstra@chello.nl \
--cc=davem@davemloft.net \
--cc=indan@nul.nu \
--cc=johnpol@2ka.mipt.ru \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=netdev@vger.kernel.org \
--cc=phillips@google.com \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.