Re: [PATCH]updated ipc lock patch

* Re: [PATCH]updated ipc lock patch
       [not found] <Pine.LNX.4.44.0210270748560.1704-100000@localhost.localdomain>
@ 2002-10-28  1:06 ` Rusty Russell
  2002-10-28 14:21   ` Hugh Dickins
  2002-10-28 20:00   ` Dipankar Sarma
  0 siblings, 2 replies; 11+ messages in thread
From: Rusty Russell @ 2002-10-28  1:06 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: mingming cao, Andrew Morton, linux-kernel

In message <Pine.LNX.4.44.0210270748560.1704-100000@localhost.localdomain> you 
write:
> On Sun, 27 Oct 2002, Rusty Russell wrote:
> > 
> > You can't do that.  It's the price you pay.  It's nonsensical to fail
> > to destroy an shm or sem.
> 
> Ironic, but not nonsensical.

Yes, nonsensical.  Firstly, it's in violation of the standard to fail
IPC_RMID under random circumstances.  Secondly, failing to clean up is
an unhandlable error, since you're quite possibly in the failure path
of the code already.  This is a well known issue.

> > Using a mempool is putting your head in the sand, because its use is
> > not bounded.  Might as well just ignore kmalloc failures and leak
> > memory, which is *really* dumb, because if we get killed by the
> > oom-killer because we're out of memory, and that results in IPC trying
> > to free.
> 
> Bounded in what sense?  The mempool is dedicated to ipc freeing, it's
> not being drawn on by other kinds of use.  In the OOM-kill case of
> actually getting down to using the reserved pool, each reserved item
> will be returned when RCU (and, in the vfree case, the additional
> scheduled work) has done with it.  Unbounded in that we cannot say
> how many milliseconds that will take, but so what?

Two oom kills.  Three oom kills.  Four oom kills.  Where's the bound
here?

Our allocator behavior for GFP_KERNEL has changed several times.  Are
you sure that it won't *ever* fail under other circumstances?

> Okay (I expect, didn't review it) for just the ids arrays, but too much
> memory waste if we have to allocate for each msq, sema, shm: if there's
> a better solution available.  mempool looks better to us.

It's a hacky, fragile and incorrect solution.  It's completely
tasteless.

There are *three* things this patch does, and it's not clear which
ones helped the dbt1 benchmark (and I can't find the source).  But
let's assume they're all good optimizations (*cough*).

In order to save 12 bytes, you've added dozens of lines of subtle,
fragile, incorrect code.  You honestly think this is a worthwhile
tradeoff?

> > Hope that helps,
> 
> Not yet.  You seem to have had a bad experience with something like
> this (the mempools or the RCU or the combination), and you're warning
> us away without actually telling us what you found.

I *wrote* the RCU interface (though the implementation in 2.5 isn't
mine).  I thought it was pretty clear how it was supposed to be used.
Obviously, I was wrong.

> I suspect that whatever it was that you found, is not relevant to
> this IPC case.

Clear code vs. bad code?  Definitely relevant here.

Patch below is against Mingming's mm4 release.  Compiles, untested.
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Commentary: There are several cases where ipc_alloc is *always*
followed by ipc_free, so the extra allocation is gratuitous, but these
are temporary allocations and the extra size is not critical.
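
As a rough user-space illustration of the scheme in the patch below (one
allocation that carries a header in front of the caller's data), here is
a minimal sketch.  malloc stands in for kmalloc, and struct fake_rcu_head,
hdr_malloc, hdr_of and hdr_free are hypothetical names chosen for this
example, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct rcu_head. */
struct fake_rcu_head {
	struct fake_rcu_head *next;
	void (*func)(void *arg);
	void *arg;
};

/* Header placed in front of every allocation, like ipc_rcu_kmalloc. */
struct hdr_alloc {
	struct fake_rcu_head rcu;
	/* "void *" keeps the payload pointer-aligned. */
	void *data[];
};

/* Allocate header + payload in one call and return the payload. */
static void *hdr_malloc(size_t size)
{
	struct hdr_alloc *h = malloc(sizeof(*h) + size);

	if (!h)
		return NULL;
	memset(&h->rcu, 0, sizeof(h->rcu));
	return h->data;
}

/* Step back from the payload to the header that owns it. */
static struct hdr_alloc *hdr_of(void *payload)
{
	assert(payload != NULL);
	return (struct hdr_alloc *)((char *)payload -
				    offsetof(struct hdr_alloc, data));
}

/* Immediate free: nothing has to be allocated on this path. */
static void hdr_free(void *payload)
{
	if (payload)
		free(hdr_of(payload));
}
```

The sketch uses offsetof and char pointers rather than arithmetic on
"void *", which is a gcc extension.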

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal working-2.5.44-mm4-ipc-rcu/include/linux/msg.h working-2.5.44-mm4-ipc-rcu-fix/include/linux/msg.h
--- working-2.5.44-mm4-ipc-rcu/include/linux/msg.h	2002-07-21 17:43:10.000000000 +1000
+++ working-2.5.44-mm4-ipc-rcu-fix/include/linux/msg.h	2002-10-28 11:12:54.000000000 +1100
@@ -2,6 +2,7 @@
 #define _LINUX_MSG_H
 
 #include <linux/ipc.h>
+#include <linux/rcupdate.h>
 
 /* ipcs ctl commands */
 #define MSG_STAT 11
@@ -90,6 +91,8 @@ struct msg_queue {
 	struct list_head q_messages;
 	struct list_head q_receivers;
 	struct list_head q_senders;
+
+	struct rcu_head q_free;		/* Used to free this via rcu */
 };
 
 asmlinkage long sys_msgget (key_t key, int msgflg);
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal working-2.5.44-mm4-ipc-rcu/include/linux/sem.h working-2.5.44-mm4-ipc-rcu-fix/include/linux/sem.h
--- working-2.5.44-mm4-ipc-rcu/include/linux/sem.h	2002-06-06 21:38:40.000000000 +1000
+++ working-2.5.44-mm4-ipc-rcu-fix/include/linux/sem.h	2002-10-28 11:24:39.000000000 +1100
@@ -2,6 +2,7 @@
 #define _LINUX_SEM_H
 
 #include <linux/ipc.h>
+#include <linux/rcupdate.h>
 
 /* semop flags */
 #define SEM_UNDO        0x1000  /* undo the operation on exit */
@@ -94,6 +95,7 @@ struct sem_array {
 	struct sem_queue	**sem_pending_last; /* last pending operation */
 	struct sem_undo		*undo;		/* undo requests on this array */
 	unsigned long		sem_nsems;	/* no. of semaphores in array */
+	struct rcu_head		sem_free;	/* used to rcu free array. */
 };
 
 /* One queue for each sleeping process in the system. */
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal working-2.5.44-mm4-ipc-rcu/ipc/util.c working-2.5.44-mm4-ipc-rcu-fix/ipc/util.c
--- working-2.5.44-mm4-ipc-rcu/ipc/util.c	2002-10-28 11:08:35.000000000 +1100
+++ working-2.5.44-mm4-ipc-rcu-fix/ipc/util.c	2002-10-28 12:01:09.000000000 +1100
@@ -213,21 +213,49 @@ struct kern_ipc_perm* ipc_rmid(struct ip
 	return p;
 }
 
+struct ipc_rcu_kmalloc
+{
+	struct rcu_head rcu;
+	/* "void *" makes sure alignment of following data is sane. */
+	void *data[0];
+};
+
+struct ipc_rcu_vmalloc
+{
+	struct rcu_head rcu;
+	struct work_struct work;
+	/* "void *" makes sure alignment of following data is sane. */
+	void *data[0];
+};
+
+static inline int use_vmalloc(int size)
+{
+	/* Too big for a single page? */
+	if (sizeof(struct ipc_rcu_kmalloc) + size > PAGE_SIZE)
+		return 1;
+	return 0;
+}
+
 /**
  *	ipc_alloc	-	allocate ipc space
  *	@size: size desired
  *
  *	Allocate memory from the appropriate pools and return a pointer to it.
- *	NULL is returned if the allocation fails
+ *	NULL is returned if the allocation fails.  This can be freed with
+ *	ipc_free (to free immediately) or ipc_rcu_free (to free once safe).
  */
- 
 void* ipc_alloc(int size)
 {
 	void* out;
-	if(size > PAGE_SIZE)
-		out = vmalloc(size);
-	else
-		out = kmalloc(size, GFP_KERNEL);
+	/* We prepend the allocation with the rcu struct, and
+           workqueue if necessary (for vmalloc). */
+	if (use_vmalloc(size)) {
+		out = vmalloc(sizeof(struct ipc_rcu_vmalloc) + size);
+		if (out) out += sizeof(struct ipc_rcu_vmalloc);
+	} else {
+		out = kmalloc(sizeof(struct ipc_rcu_kmalloc)+size, GFP_KERNEL);
+		if (out) out += sizeof(struct ipc_rcu_kmalloc);
+	}
 	return out;
 }
 
@@ -242,48 +270,36 @@ void* ipc_alloc(int size)
  
 void ipc_free(void* ptr, int size)
 {
-	if(size > PAGE_SIZE)
-		vfree(ptr);
+	if (use_vmalloc(size))
+		vfree(ptr - sizeof(struct ipc_rcu_vmalloc));
 	else
-		kfree(ptr);
+		kfree(ptr - sizeof(struct ipc_rcu_kmalloc));
 }
 
 /* 
  * Since RCU callback function is called in bh,
  * we need to defer the vfree to schedule_work
  */
-static void ipc_free_scheduled(void* arg)
+static void ipc_schedule_free(void *arg)
 {
-	struct rcu_ipc_free *a = (struct rcu_ipc_free *)arg;
-	vfree(a->ptr);
-	kfree(a);
-}
+	struct ipc_rcu_vmalloc *free = arg;
 
-static void ipc_free_callback(void* arg)
-{
-	struct rcu_ipc_free *a = (struct rcu_ipc_free *)arg;
-	/* 
-	 * if data is vmalloced, then we need to delay the free
-	 */
-	if (a->size > PAGE_SIZE) {
-		INIT_WORK(&a->work, ipc_free_scheduled, arg);
-		schedule_work(&a->work);
-	} else {
-		kfree(a->ptr);
-		kfree(a);
-	}
+	INIT_WORK(&free->work, vfree, free);
+	schedule_work(&free->work);
 }
 
 void ipc_rcu_free(void* ptr, int size)
 {
-	struct rcu_ipc_free* arg;
-
-	arg = (struct rcu_ipc_free *) kmalloc(sizeof(*arg), GFP_KERNEL);
-	if (arg == NULL)
-		return;
-	arg->ptr = ptr;
-	arg->size = size;
-	call_rcu(&arg->rcu_head, ipc_free_callback, arg);
+	if (use_vmalloc(size)) {
+		struct ipc_rcu_vmalloc *free;
+		free = ptr - sizeof(*free);
+		call_rcu(&free->rcu, ipc_schedule_free, free);
+	} else {
+		struct ipc_rcu_kmalloc *free;
+		free = ptr - sizeof(*free);
+		/* kfree takes a "const void *" so gcc warns.  So we cast. */
+		call_rcu(&free->rcu, (void (*)(void *))kfree, free);
+	}
 }
 
 /**

* Re: [PATCH]updated ipc lock patch
  2002-10-28  1:06 ` [PATCH]updated ipc lock patch Rusty Russell
@ 2002-10-28 14:21   ` Hugh Dickins
  2002-10-28 21:47     ` Rusty Russell
  2002-10-28 20:00   ` Dipankar Sarma
  1 sibling, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2002-10-28 14:21 UTC (permalink / raw)
  To: Rusty Russell; +Cc: mingming cao, Andrew Morton, linux-kernel

You may prefer to skip the history and goto Your_patch;

On Mon, 28 Oct 2002, Rusty Russell wrote:
> In message <Pine.LNX.4.44.0210270748560.1704-100000@localhost.localdomain> you 
> write:
> > On Sun, 27 Oct 2002, Rusty Russell wrote:
> > > 
> > > You can't do that.  It's the price you pay.  It's nonsensical to fail
> > > to destroy an shm or sem.
> > 
> > Ironic, but not nonsensical: remember, this would only happen (if we
> > abandoned the mempool method and) the task trying to free is itself
> > being OOM-killed - sometimes, OOM-killing will kill the very task
> > that might have gone on to free even more memory, just a sad fact.

(Since you're bringing our discussion to linux-kernel,
I've restored the full paragraph of my side of the argument above.)

> Yes, nonsensical.  Firstly, it's in violation of the standard to fail
> IPC_RMID under random circumstances.  Secondly, failing to clean up is
> an unhandlable error, since you're quite possibly in the failure path
> of the code already.  This is a well known issue.

The task which would have failed was being OOM-killed: it's not even
going to get back to userspace.  It might have been OOM-killed just
before it tried to IPC_RMID, but it happened during, that's all.
I think OOM-killing lies, shall we say, outside the standard?

But that's all irrelevant: we (Andrew tidied patch by Mingming
following recommendation by me of solution by Andrew) added the
mempool so that it will surely succeed, as you insisted, even if
OOM-kill intervenes at the worst moment.

> > > Using a mempool is putting your head in the sand, because its use is
> > > not bounded.  Might as well just ignore kmalloc failures and leak
> > > memory, which is *really* dumb, because if we get killed by the
> > > oom-killer because we're out of memory, and that results in IPC trying
> > > to free.
> > 
> > Bounded in what sense?  The mempool is dedicated to ipc freeing, it's
> > not being drawn on by other kinds of use.  In the OOM-kill case of
> > actually getting down to using the reserved pool, each reserved item
> > will be returned when RCU (and, in the vfree case, the additional
> > scheduled work) has done with it.  Unbounded in that we cannot say
> > how many milliseconds that will take, but so what?
> 
> Two oom kills.  Three oom kills.  Four oom kills.  Where's the bound
> here?

No bound to the number of possible OOM kills, but what problem is that?
I got excited for a few minutes when I thought you were saying that the
task being OOM-killed would exit holding on to its mempool buffer.  But
that's not so, is it?  We'd have wider, more serious resource problems
with OOM kills if that were so.

> Our allocator behavior for GFP_KERNEL has changed several times.  Are
> you sure that it won't *ever* fail under other circumstances?

Well, I'll be surprised if we change kmalloc(GFP_KERNEL) to fail for
reasons other than memory shortage ("insufficient privilege"? hmm,
we'd change the names before that); though how hard it tries to decide
if there's really a memory shortage certainly changes from one kernel
to another.  But so long as it doesn't fail very often in normal
circumstances, it's okay: the reserved mempool buffers back it up.

> > Okay (I expect, didn't review it) for just the ids arrays, but too much
> > memory waste if we have to allocate for each msq, sema, shm: if there's
> > a better solution available.  mempool looks better to us.
> 
> It's a hacky, fragile and incorrect solution.  It's completely
> tasteless.

I guess it's a step forward that you haven't called it senseless.
Hacky, tasteless?  Maybe, depends on your taste.  I'm inclined to
counter that it's _fashionable_, which holds about as much weight.
But fragile, incorrect?  You've repeatedly failed to argue that.

> In order to save 12 bytes, you've added dozens of lines of subtle,
> fragile, incorrect code.  You honestly think this is a worthwhile
> tradeoff?

I thought mempool was the best way to go, to avoid wasting 80 bytes
per msq, sema, shmseg - I thought it would get ugly (hacky, tasteless)
to avoid the struct work overhead in some cases and not others.

Your_patch shows that it need not be ugly, and reduces the normal
waste to 16 bytes per msq, sema, shmseg.  These are not struct pages,
that amount will often be wasted by cacheline alignment too, so I'm
not going to get into a fight over it.

I think your patch looks quite nice - apart from the subtle hacky
fragile tasteless void *data[0]; but if something like that is
necessary, I'm sure someone can come up with a better one there.

If Andrew or Mingming likes it and wants to replace the mempool
solution by yours, I'm not going to object (but if that is done,
remove struct rcu_ipc_free from ipc/util.h, and move its includes
of rcupdate.h and workqueue.h to ipc/util.c).  If they prefer the
mempool method, that's fine with me too.

But I wish you'd introduced your patch as "Here, I think this is
a nicer way of doing it, and doesn't waste as much as you thought:
you don't have to bother with a mempool this way" instead of getting
so mysteriously upset about it, implying some idiot failure in the
mempool method which you've not yet revealed to us.

Hugh

* Re: [PATCH]updated ipc lock patch
  2002-10-28  1:06 ` [PATCH]updated ipc lock patch Rusty Russell
  2002-10-28 14:21   ` Hugh Dickins
@ 2002-10-28 20:00   ` Dipankar Sarma
  2002-10-28 21:41     ` Rusty Russell
  2002-10-28 22:07     ` mingming cao
  1 sibling, 2 replies; 11+ messages in thread
From: Dipankar Sarma @ 2002-10-28 20:00 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Hugh Dickins, Mingming Cao, Andrew Morton, linux-kernel

Hi Rusty,

I am pathologically late in catching up on lkml, so if I missed some
context here, I apologize in advance. I have just started looking
at mm6 ipc code and I want to point out a few things.

On Mon, Oct 28, 2002 at 02:20:04AM +0100, Rusty Russell wrote:
> Yes, nonsensical.  Firstly, it's in violation of the standard to fail
> IPC_RMID under random circumstances.  Secondly, failing to clean up is
> an unhandlable error, since you're quite possibly in the failure path
> of the code already.  This is a well known issue.

I am not sure how Ming/Hugh's current IPC changes affect IPC_RMID.
It affects only when you are trying to add a new ipc. In fact,
since it is a *add* operation (grow_ary()), it seems ok to fail it if rcu_head
allocation fails. Feel free to correct me if I missed something here.
AFAICS, the rcu stuff doesn't affect any freeing other than the IPC
id array.

> Two oom kills.  Three oom kills.  Four oom kills.  Where's the bound
> here?
> 
> Our allocator behavior for GFP_KERNEL has changed several times.  Are
> you sure that it won't *ever* fail under other circumstances?
> 
> > Okay (I expect, didn't review it) for just the ids arrays, but too much
> > memory waste if we have to allocate for each msq, sema, shm: if there's
> > a better solution available.  mempool looks better to us.
> 
> It's a hacky, fragile and incorrect solution.  It's completely
> tasteless.

Yes, the mempool code is broken, but only because rcu_backup_pool
is created three times, one by each IPC mechanism init :-)

> > Not yet.  You seem to have had a bad experience with something like
> > this (the mempools or the RCU or the combination), and you're warning
> > us away without actually telling us what you found.
> 
> I *wrote* the RCU interface (though the implementation in 2.5 isn't
> mine).  I thought it was pretty clear how it was supposed to be used.
> Obviously, I was wrong.

Yes, we went through this a long time ago and the general model
is to embed the rcu_head, thereby allocating it at the time
of allocation of the RCU protected data. This increases the
probability of recovery from a low-memory situation as compared
to having to allocate during freeing.
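
A minimal user-space sketch of that model, assuming a simple pending
list in place of the real RCU machinery: the callback head lives inside
the object, so queueing the deferred free cannot fail.  The names
cb_head, queue_obj, defer_call and run_pending are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback head, embedded in the protected object. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Example object; "cb" is allocated together with the payload. */
struct queue_obj {
	int id;
	struct cb_head cb;
};

static struct cb_head *pending;	/* callbacks awaiting the grace period */
static int freed_count;

/* Queue a callback.  Cannot fail: the head already exists. */
static void defer_call(struct cb_head *head,
		       void (*func)(struct cb_head *head))
{
	assert(head && func);
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Recover the object from its embedded head and free it. */
static void free_queue_obj(struct cb_head *head)
{
	struct queue_obj *obj = (struct queue_obj *)
		((char *)head - offsetof(struct queue_obj, cb));

	free(obj);
	freed_count++;
}

/* Stand-in for "the grace period has elapsed": run every callback. */
static void run_pending(void)
{
	while (pending) {
		struct cb_head *head = pending;

		pending = head->next;
		head->func(head);
	}
}
```

The only allocation happens when the object is created, which is a point
where the caller can still handle failure.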

That said, it seems that Ming/Hugh's patch does allocate
the rcu_head at the time of *growing* the array. It is just
that they allocate it for the freeing array rather than the
allocated array. I don't see how this is semantically different
from clubbing the two allocations other than the fact that
a smaller number of allocation calls would likely reduce the
likelihood of allocation failures.

> Patch below is against Mingming's mm4 release.  Compiles, untested.
> Rusty.

Yes, this is the typical RCU model, except that in this case (IPC),
I am not quite sure if it is in effect that different from what Ming/Hugh
have done.

Thanks
Dipankar

* Re: [PATCH]updated ipc lock patch
  2002-10-28 20:00   ` Dipankar Sarma
@ 2002-10-28 21:41     ` Rusty Russell
  2002-10-29  6:11       ` Dipankar Sarma
  2002-10-28 22:07     ` mingming cao
  1 sibling, 1 reply; 11+ messages in thread
From: Rusty Russell @ 2002-10-28 21:41 UTC (permalink / raw)
  To: dipankar; +Cc: Hugh Dickins, Mingming Cao, Andrew Morton, linux-kernel

In message <20021029013059.A13287@dikhow> you write:
> Hi Rusty,
> 
> I am pathologically late in catching up on lkml, so if I missed some
> context here, I apologize in advance. I have just started looking
> at mm6 ipc code and I want to point out a few things.

That's OK, I'm still 1500 behind 8(

	If all current uses are embedded, can we remove the "void
*arg" and reduce the size of struct rcu_head by 25%?  Users can always
embed it in their own struct which has a "void *arg", but if that's
the uncommon case, it'd be nice to slim it a little.

	It'd also be nice to change the double linked list to a single
too: as far as I can tell the only issue is the list_add_tail in
call_rcu(): how important is this ordering?  It can be done by keeping
a head as well as a tail pointer if required.
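
A small sketch of the head-plus-tail idea, assuming nothing about the
actual RCU implementation: a singly linked queue keeps FIFO order with
one link per node if the queue itself remembers where the last "next"
field is.  The names snode, squeue, squeue_add_tail and squeue_pop are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct snode {
	struct snode *next;
	int seq;
};

struct squeue {
	struct snode *head;
	struct snode **tail;	/* last "next" field, or &head if empty */
};

static void squeue_init(struct squeue *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

/* O(1) append, preserving insertion order like list_add_tail. */
static void squeue_add_tail(struct squeue *q, struct snode *n)
{
	assert(q && n);
	n->next = NULL;
	*q->tail = n;
	q->tail = &n->next;
}

/* Remove and return the oldest node, or NULL if the queue is empty. */
static struct snode *squeue_pop(struct squeue *q)
{
	struct snode *n = q->head;

	if (!n)
		return NULL;
	q->head = n->next;
	if (!q->head)
		q->tail = &q->head;
	return n;
}
```

The cost is that a node can no longer be unlinked from the middle in
constant time, which matters only if callers need that.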

I'd be happy to prepare a patch, to avoid more complaints of bloat 8)

> That said, it seems that Ming/Hugh's patch does allocate
> the rcu_head at the time of *growing* the array. It is just
> that they allocate it for the freeing array rather than the
> allocated array. I don't see how this is semantically different
> from clubbing the two allocations other than the fact that
> a smaller number of allocation calls would likely reduce the
> likelihood of allocation failures.

We must be looking at different variants of the patch.  This one does:
IPC_RMID -> freeary() -> ipc_rcu_free -> kmalloc.

Cheers,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

* Re: [PATCH]updated ipc lock patch
  2002-10-28 14:21   ` Hugh Dickins
@ 2002-10-28 21:47     ` Rusty Russell
  2002-10-29  0:03       ` [RFC][PATCH]ipc rcu alloc/free patch - mm6 mingming cao
  2002-10-29  0:26       ` [PATCH]updated ipc lock patch Hugh Dickins
  0 siblings, 2 replies; 11+ messages in thread
From: Rusty Russell @ 2002-10-28 21:47 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: mingming cao, Andrew Morton, linux-kernel

In message <Pine.LNX.4.44.0210281311001.10156-100000@localhost.localdomain> you
 write:
> (Since you're bringing our discussion to linux-kernel,
> I've restored the full paragraph of my side of the argument above.)

Hi Hugh,

	Thanks for your complete reply.  I thought it was my fault
that it fell off lkml, and "corrected" it.  Sorry about that.

> > Two oom kills.  Three oom kills.  Four oom kills.  Where's the bound
> > here?
> 
> No bound to the number of possible OOM kills, but what problem is that?

Sorry, I'm obviously not making myself clear, since I've said this
three times now.

1) The memory is required for one whole RCU period (whether from
   kmalloc or the mempool).  This can be an almost arbitrarily long
   time (I've seen it take a good fraction of a second).

2) This is a problem, because other tasks could be OOM killed during
   that period, and could also try to use this mempool.

3) So, the size of the mempool which guarantees there will be enough?
   It's equal to the number of things you might free, which means
   you might as well allocate them together.

This is the correctness problem with the mempool IPC implementation.
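
The counting argument in points 1-3 can be sketched as a toy model.  It
assumes a reserve of four elements (the MAX_RCU_BACKUPS value in the
mempool patch) and that no element comes back until the grace period
ends.  It models only the arithmetic, not how the kernel's mempool
behaves when it runs dry; reserve_pool, pool_take and frees_served are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_RESERVE 4	/* mirrors MAX_RCU_BACKUPS in the mempool patch */

struct reserve_pool {
	int in_flight;	/* elements taken and not yet returned */
};

/* Take one reserved element; fails once the reserve is used up. */
static bool pool_take(struct reserve_pool *p)
{
	assert(p);
	if (p->in_flight == POOL_RESERVE)
		return false;
	p->in_flight++;
	return true;
}

/* Elements are handed back only when the grace period has ended. */
static void pool_grace_period_end(struct reserve_pool *p)
{
	p->in_flight = 0;
}

/* How many of n frees issued inside one grace period get an element? */
static int frees_served(struct reserve_pool *p, int n)
{
	int ok = 0;

	for (int i = 0; i < n; i++)
		if (pool_take(p))
			ok++;
	return ok;
}
```

Whatever value POOL_RESERVE has, POOL_RESERVE + 1 frees inside one grace
period exceed it, so a guaranteed reserve would need one element per
object that could be freed.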

> > Our allocator behavior for GFP_KERNEL has changed several times.  Are
> > you sure that it won't *ever* fail under other circumstances?
> 
> Well, I'll be surprised if we change kmalloc(GFP_KERNEL) to fail for
> reasons other than memory shortage ("insufficient privilege"? hmm,
> we'd change the names before that); though how hard it tries to decide
> if there's really a memory shortage certainly changes from one kernel
> to another.  But so long as it doesn't fail very often in normal
> circumstances, it's okay: the reserved mempool buffers back it up.

Once again, if it ever returns NULL other than for tasks being OOM
killed, the problem gets wider.

It would be reasonable for the page allocator one day to say "hmm, at
the rate kswapd is going, it's going to take > 1 minute to fill this
allocation.  Let's fail it immediately instead".

This is the fragility problem with the mempool IPC implementation: you
are relying on the kmalloc implementation details which you are
explicitly not allowed to do (see previous "should kmalloc fail?"
threads).

> Your_patch shows that it need not be ugly, and reduces the normal
> waste to 16 bytes per msq, sema, shmseg.  These are not struct pages,
> that amount will often be wasted by cacheline alignment too, so I'm
> not going to get into a fight over it.

It could be reduced to 12 bytes by cutting out the "arg", and 8 bytes
by making it a single linked list.  I've posted this separately.
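
For the arithmetic, a sketch of the three layouts, assuming the head is
currently a doubly linked list entry plus a function pointer plus "arg"
(inferred from this thread, not copied from the kernel source).  The
struct names are hypothetical.  Sizes are counted in pointer-sized
words, so on a 32-bit machine 4, 3 and 2 words are 16, 12 and 8 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed current layout: doubly linked list + func + arg. */
struct head_full {
	struct head_full *next, *prev;
	void (*func)(void *arg);
	void *arg;
};

/* Without "arg": the callback receives the head itself. */
struct head_noarg {
	struct head_noarg *next, *prev;
	void (*func)(struct head_noarg *head);
};

/* Singly linked and without "arg". */
struct head_single {
	struct head_single *next;
	void (*func)(struct head_single *head);
};

/* Convert a size in bytes to a count of pointer-sized words. */
static size_t words(size_t bytes)
{
	assert(bytes % sizeof(void *) == 0);
	return bytes / sizeof(void *);
}
```

This assumes function pointers and data pointers have the same size,
which holds on the usual Linux targets.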

> I think your patch looks quite nice - apart from the subtle hacky
> fragile tasteless void *data[0]; but if something like that is
> necessary, I'm sure someone can come up with a better one there.

Yes, it's tasteless, but not fragile.  Skipping it would be fragile
(it's unnecessary since struct rcu_head has a pointer in it).

I'm not sure that the alternative is nicer, either, though 8(

> But I wish you'd introduced your patch as "Here, I think this is
> a nicer way of doing it, and doesn't waste as much as you thought:
> you don't have to bother with a mempool this way" instead of getting
> so mysteriously upset about it, implying some idiot failure in the
> mempool method which you've not yet revealed to us.

You're right.  This took far more time than simply producing the patch
myself would have taken.

But despite that, I prefer to take the un-Viro-like approach of
believing that my fellow programmers are cleverer than I am, and hence
require only a few subtle pointers when I manage to spot errors.

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

* Re: [PATCH]updated ipc lock patch
  2002-10-28 20:00   ` Dipankar Sarma
  2002-10-28 21:41     ` Rusty Russell
@ 2002-10-28 22:07     ` mingming cao
  2002-10-29  1:06       ` Rusty Russell
  1 sibling, 1 reply; 11+ messages in thread
From: mingming cao @ 2002-10-28 22:07 UTC (permalink / raw)
  To: dipankar, Rusty Russell; +Cc: Hugh Dickins, Andrew Morton, linux-kernel

Dipankar Sarma wrote:
> 
> On Mon, Oct 28, 2002 at 02:20:04AM +0100, Rusty Russell wrote:
> > Yes, nonsensical.  Firstly, it's in violation of the standard to fail
> > IPC_RMID under random circumstances.  Secondly, failing to clean up is
> > an unhandlable error, since you're quite possibly in the failure path
> > of the code already.  This is a well known issue.
> 
> I am not sure how Ming/Hugh's current IPC changes affect IPC_RMID.
> It affects only when you are trying to add a new ipc. In fact,
> since it is a *add* operation (grow_ary()), it seems ok to fail it if rcu_head
> allocation fails. Feel free to correct me if I missed something here.
> AFAICS, the rcu stuff doesn't affect any freeing other than the IPC
> id array.
>

We extended the usage of RCU to IPC_RMID, to prevent ipc_lock()
returning an invalid IPC ID which has been removed by ipc_rmid.
 
> > It's a hacky, fragile and incorrect solution.  It's completely
> > tasteless.
> 
> Yes, the mempool code is broken, but only because rcu_backup_pool
> is created three times, one by each IPC mechanism init :-)
>

That's my bad, thanks for pointing this out. It's easy to fix if we
decide to go the mempool way.
 
> > Patch below is against Mingming's mm4 release.  Compiles, untested.
> > Rusty.
> 
> Yes, this is the typical RCU model, except that in this case (IPC),
> I am not quite sure if it is in effect that different from what Ming/Hugh
> have done.

Rusty's patch looks good to me. I would like to replace the mempool in
IPC with this typical RCU model. Rusty, if you like, I will make a patch
against mm6.  It needs some cleanups.  One thing is that ipc_alloc()
is called from other places (besides grow_ary()), and they don't need
the RCU header structure.

Mingming

* [RFC][PATCH]ipc rcu alloc/free patch - mm6
  2002-10-28 21:47     ` Rusty Russell
@ 2002-10-29  0:03       ` mingming cao
  2002-10-29  0:26       ` [PATCH]updated ipc lock patch Hugh Dickins
  1 sibling, 0 replies; 11+ messages in thread
From: mingming cao @ 2002-10-29  0:03 UTC (permalink / raw)
  To: Rusty Russell, Andrew Morton; +Cc: Hugh Dickins, linux-kernel, dipankar

[-- Attachment #1: Type: text/plain, Size: 588 bytes --]

Andrew, Rusty,

Here is the patch which addresses the RCU alloc problem raised by Rusty.
It replaces the mempool with Rusty's "allocate the RCU structure together
with the object-to-be-freed" solution.  Patch is for 2.5.44-mm6,
compiled and tested.

Please review and apply if you like.

Mingming
--------------------------------------------------------------------------------
msg.c  |    2 -
sem.c  |    6 +--
shm.c  |    2 -
util.c |  104 ++++++++++++++++++++++++++++++++++-------------------------------
util.h |   23 ++++++++++----
5 files changed, 77 insertions(+), 60 deletions(-)

[-- Attachment #2: mm6-ipc.patch --]
[-- Type: text/plain, Size: 6990 bytes --]

diff -urN 2544-mm6/ipc/msg.c 2544-mm6-ipc/ipc/msg.c
--- 2544-mm6/ipc/msg.c	Mon Oct 28 09:51:20 2002
+++ 2544-mm6-ipc/ipc/msg.c	Mon Oct 28 09:31:41 2002
@@ -93,7 +93,7 @@
 	int retval;
 	struct msg_queue *msq;
 
-	msq  = (struct msg_queue *) kmalloc (sizeof (*msq), GFP_KERNEL);
+	msq  = (struct msg_queue *) ipc_rcu_alloc (sizeof (*msq));
 	if (!msq) 
 		return -ENOMEM;
 
diff -urN 2544-mm6/ipc/sem.c 2544-mm6-ipc/ipc/sem.c
--- 2544-mm6/ipc/sem.c	Mon Oct 28 09:51:20 2002
+++ 2544-mm6-ipc/ipc/sem.c	Mon Oct 28 09:31:41 2002
@@ -126,7 +126,7 @@
 		return -ENOSPC;
 
 	size = sizeof (*sma) + nsems * sizeof (struct sem);
-	sma = (struct sem_array *) ipc_alloc(size);
+	sma = (struct sem_array *) ipc_rcu_alloc(size);
 	if (!sma) {
 		return -ENOMEM;
 	}
@@ -138,14 +138,14 @@
 	sma->sem_perm.security = NULL;
 	retval = security_ops->sem_alloc_security(sma);
 	if (retval) {
-		ipc_free(sma, size);
+		ipc_rcu_free(sma, size);
 		return retval;
 	}
 
 	id = ipc_addid(&sem_ids, &sma->sem_perm, sc_semmni);
 	if(id == -1) {
 		security_ops->sem_free_security(sma);
-		ipc_free(sma, size);
+		ipc_rcu_free(sma, size);
 		return -ENOSPC;
 	}
 	used_sems += nsems;
diff -urN 2544-mm6/ipc/shm.c 2544-mm6-ipc/ipc/shm.c
--- 2544-mm6/ipc/shm.c	Mon Oct 28 09:51:20 2002
+++ 2544-mm6-ipc/ipc/shm.c	Mon Oct 28 09:31:41 2002
@@ -180,7 +180,7 @@
 	if (shm_tot + numpages >= shm_ctlall)
 		return -ENOSPC;
 
-	shp = (struct shmid_kernel *) kmalloc (sizeof (*shp), GFP_USER);
+	shp = (struct shmid_kernel *) ipc_rcu_alloc (sizeof (*shp));
 	if (!shp)
 		return -ENOMEM;
 
diff -urN 2544-mm6/ipc/util.c 2544-mm6-ipc/ipc/util.c
--- 2544-mm6/ipc/util.c	Mon Oct 28 09:51:20 2002
+++ 2544-mm6-ipc/ipc/util.c	Mon Oct 28 09:38:52 2002
@@ -22,26 +22,11 @@
 #include <linux/slab.h>
 #include <linux/highuid.h>
 #include <linux/security.h>
-#include <linux/mempool.h>
 
 #if defined(CONFIG_SYSVIPC)
 
 #include "util.h"
 
-static mempool_t* rcu_backup_pool;
-
-/* alloc and free function for rcu backup mempool */
-
-static void *alloc_ipc_rcu(int gfp_mask, void *pool_data)
-{
-	return kmalloc(sizeof(struct rcu_ipc_free), gfp_mask);
-}
-
-static void free_ipc_rcu(void* arg, void *pool_data)
-{
-	kfree(arg);
-}
-
 /**
  *	ipc_init	-	initialise IPC subsystem
  *
@@ -86,7 +71,7 @@
 		 	ids->seq_max = seq_limit;
 	}
 
-	ids->entries = ipc_alloc(sizeof(struct ipc_id)*size);
+	ids->entries = ipc_rcu_alloc(sizeof(struct ipc_id)*size);
 
 	if(ids->entries == NULL) {
 		printk(KERN_ERR "ipc_init_ids() failed, ipc service disabled.\n");
@@ -94,13 +79,6 @@
 	}
 	for(i=0;i<ids->size;i++)
 		ids->entries[i].p = NULL;
-
-	/* create a mempool in case normal kmalloc failed */
-	rcu_backup_pool = mempool_create(MAX_RCU_BACKUPS, 
-					alloc_ipc_rcu, free_ipc_rcu, NULL);
-	
-	if (rcu_backup_pool == NULL)
-		panic("ipc_init_ids() failed\n");
 }
 
 /**
@@ -128,6 +106,14 @@
 	return -1;
 }
 
+static inline int use_vmalloc(int size)
+{
+	/* Too big for a single page? */
+	if (sizeof(struct ipc_rcu_kmalloc) + size > PAGE_SIZE)
+		return 1;
+	return 0;
+}
+
 /*
  * Requires ipc_ids.sem locked
  */
@@ -142,7 +128,7 @@
 	if(newsize <= ids->size)
 		return newsize;
 
-	new = ipc_alloc(sizeof(struct ipc_id)*newsize);
+	new = ipc_rcu_alloc(sizeof(struct ipc_id)*newsize);
 	if(new == NULL)
 		return ids->size;
 	memcpy(new, ids->entries, sizeof(struct ipc_id)*ids->size);
@@ -257,16 +243,15 @@
 		out = kmalloc(size, GFP_KERNEL);
 	return out;
 }
-
 /**
- *	ipc_free	-	free ipc space
+ *	ipc_free        -       free ipc space
  *	@ptr: pointer returned by ipc_alloc
  *	@size: size of block
  *
  *	Free a block created with ipc_alloc. The caller must know the size
  *	used in the allocation call.
  */
- 
+
 void ipc_free(void* ptr, int size)
 {
 	if(size > PAGE_SIZE)
@@ -275,39 +260,60 @@
 		kfree(ptr);
 }
 
-/* 
- * Since RCU callback function is called in bh,
- * we need to defer the vfree to schedule_work
+/**
+ *	ipc_rcu_alloc	-	allocate ipc and rcu space 
+ *	@size: size desired
+ *
+ *	Allocate memory for the rcu header structure +  the object.
+ *	Returns the pointer to the object.
+ *	NULL is returned if the allocation fails. 
  */
-static void ipc_free_scheduled(void* arg)
-{
-	struct rcu_ipc_free *a = arg;
-	vfree(a->ptr);
-	mempool_free(a, rcu_backup_pool);
-}
-
-static void ipc_free_callback(void* arg)
+ 
+void* ipc_rcu_alloc(int size)
 {
-	struct rcu_ipc_free *a = arg;
+	void* out;
 	/* 
-	 * if data is vmalloced, then we need to delay the free
+	 * We prepend the allocation with the rcu struct, and
+	 * workqueue if necessary (for vmalloc). 
 	 */
-	if (a->size > PAGE_SIZE) {
-		INIT_WORK(&a->work, ipc_free_scheduled, arg);
-		schedule_work(&a->work);
+	if (use_vmalloc(size)) {
+		out = vmalloc(sizeof(struct ipc_rcu_vmalloc) + size);
+		if (out) out += sizeof(struct ipc_rcu_vmalloc);
 	} else {
-		kfree(a->ptr);
-		mempool_free(a, rcu_backup_pool);
+		out = kmalloc(sizeof(struct ipc_rcu_kmalloc)+size, GFP_KERNEL);
+		if (out) out += sizeof(struct ipc_rcu_kmalloc);
 	}
+
+	return out;
+}
+
+/**
+ *	ipc_schedule_free	- free ipc + rcu space
+ * 
+ * Since RCU callback function is called in bh,
+ * we need to defer the vfree to schedule_work
+ */
+static void ipc_schedule_free(void* arg)
+{
+	struct ipc_rcu_vmalloc *free = arg;
+
+	INIT_WORK(&free->work, vfree, free);
+	schedule_work(&free->work);
 }
 
 void ipc_rcu_free(void* ptr, int size)
 {
-	struct rcu_ipc_free* arg = mempool_alloc(rcu_backup_pool, GFP_KERNEL);
+	if (use_vmalloc(size)) {
+		struct ipc_rcu_vmalloc *free;
+		free = ptr - sizeof(*free);
+		call_rcu(&free->rcu, ipc_schedule_free, free);
+	} else {
+		struct ipc_rcu_kmalloc *free;
+		free = ptr - sizeof(*free);
+		/* kfree takes a "const void *" so gcc warns.  So we cast. */
+		call_rcu(&free->rcu, (void (*)(void *))kfree, free);
+	}
 
-	arg->ptr = ptr;
-	arg->size = size;
-	call_rcu(&arg->rcu_head, ipc_free_callback, arg);
 }
 
 /**
diff -urN 2544-mm6/ipc/util.h 2544-mm6-ipc/ipc/util.h
--- 2544-mm6/ipc/util.h	Mon Oct 28 09:51:20 2002
+++ 2544-mm6-ipc/ipc/util.h	Mon Oct 28 09:37:43 2002
@@ -9,17 +9,24 @@
 
 #define USHRT_MAX 0xffff
 #define SEQ_MULTIPLIER	(IPCMNI)
-#define MAX_RCU_BACKUPS	4	/*max # of elements in rcu_backup_pool*/
 
 void sem_init (void);
 void msg_init (void);
 void shm_init (void);
 
-struct rcu_ipc_free {
-	struct rcu_head		rcu_head;
-	void 			*ptr;
-	int 			size;
-	struct work_struct	work;
+struct ipc_rcu_kmalloc
+{
+	struct rcu_head rcu;
+	/* "void *" makes sure alignment of following data is sane. */
+	void *data[0];
+};
+
+struct ipc_rcu_vmalloc
+{
+	struct rcu_head rcu;
+	struct work_struct work;
+	/* "void *" makes sure alignment of following data is sane. */
+	void *data[0];
 };
 
 struct ipc_ids {
@@ -52,6 +59,10 @@
  */
 void* ipc_alloc(int size);
 void ipc_free(void* ptr, int size);
+/* for allocations that need to be freed by RCU;
+ * both functions can sleep
+ */
+void* ipc_rcu_alloc(int size);
 void ipc_rcu_free(void* arg, int size);
 
 struct kern_ipc_perm* ipc_get(struct ipc_ids* ids, int id);
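
The header-prepending idea in ipc_rcu_alloc()/ipc_rcu_free() above can be tried out in userspace. What follows is only a minimal sketch under stated assumptions: the demo_* names are hypothetical, struct demo_head merely stands in for the kernel's rcu_head, malloc() replaces kmalloc()/vmalloc(), and the free is immediate rather than deferred through call_rcu(). The point it illustrates is that the bookkeeping needed at free time is paid for at allocation time, so the free path itself never has to allocate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's rcu_head; not the real layout. */
struct demo_head {
	void (*func)(void *arg);
	void *arg;
};

/* Header placed in front of every object, like ipc_rcu_kmalloc in the patch. */
struct demo_hdr {
	struct demo_head head;
	/* "void *" keeps the data that follows pointer-aligned. */
	void *data[];
};

/* Allocate header + object as one block; hand back a pointer to the object. */
static void *demo_alloc(size_t size)
{
	struct demo_hdr *hdr = malloc(sizeof(*hdr) + size);

	if (!hdr)
		return NULL;
	memset(&hdr->head, 0, sizeof(hdr->head));
	return hdr->data;
}

/* Step back from the object pointer to the header that precedes it. */
static struct demo_hdr *demo_hdr_of(void *ptr)
{
	return (struct demo_hdr *)((char *)ptr - offsetof(struct demo_hdr, data));
}

/*
 * Free the whole block. No allocation happens here: the header that a
 * deferred free would need was reserved when the object was created.
 */
static void demo_free(void *ptr)
{
	if (ptr)
		free(demo_hdr_of(ptr));
}
```

A caller uses demo_alloc()/demo_free() exactly like malloc()/free(); only code that needs the header (the deferred-free machinery in the real patch) ever calls demo_hdr_of().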


* Re: [PATCH]updated ipc lock patch
  2002-10-28 21:47     ` Rusty Russell
  2002-10-29  0:03       ` [RFC][PATCH]ipc rcu alloc/free patch - mm6 mingming cao
@ 2002-10-29  0:26       ` Hugh Dickins
  2002-10-29  2:51         ` Rusty Russell
  1 sibling, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2002-10-29  0:26 UTC (permalink / raw)
  To: Rusty Russell; +Cc: mingming cao, Andrew Morton, linux-kernel

On Tue, 29 Oct 2002, Rusty Russell wrote:
> > 
> > No bound to the number of possible OOM kills, but what problem is that?
> 
> Sorry, I'm obviously not making myself clear, since I've said this
> three times now.
> 
> 1) The memory is required for one whole RCU period (whether from
>    kmalloc or the mempool).  This can be an almost arbitrarily long
>    time (I've seen it take a good fraction of a second).

That's a very short time compared with an OOMing thrash: no worries there.

> 2) This is a problem, because other tasks could be OOM killed during
>    that period, and could also try to use this mempool.

They'll try to use the mempool, maybe some will be allowed to wait
for their kmalloc(GFP_KERNEL) memory, and others will be PF_MEMDIEd and
proceed to take a reserved mempool buffer, and others will be PF_MEMDIEd
and have to wait for a reserved mempool buffer.  Which will be released
to them in due course.  No worries there.

> 3) So, the size of the mempool which guarantees there will be enough?
>    It's equal to the number of things you might free, which means
>    you might as well allocate them together.

No, they take their turns.  It's sure not as efficient as each getting
a kmalloc'ed buffer immediately, but its failures will be rare.  And
it doesn't matter whether the failures only occur when heading for
OOM-kill or not: we just don't want failure to be the common case.
If kmalloc evolves into something that normally fails half the time,
well, I'd think that'd be called a bug.

> This is the correctness problem with the mempool IPC implementation.

No.  There may be other situations which might need at least
NR_CPUS reserved mempool buffers to avoid deadlock, but that's not
the case here.  Looks like mempool will be superseded as you wish
in the IPC context, fine; but I do think you need to take a look
at mempool_alloc: it's a different beast from what you suppose.

Hugh
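
The "they take their turns" behaviour described above can be modelled in a few lines of userspace C. This is a hedged sketch, not the kernel's mempool code: the demo_* names, the reserve size and the simulate_oom test hook are all hypothetical, and where a sleeping mempool_alloc() caller would wait for an element to be returned, this model simply returns NULL. It assumes the usual mempool ordering: try the normal allocator first, fall back to the reserve, and let a free refill the reserve before anything goes back to the normal allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MIN_NR 2	/* hypothetical reserve size */

struct demo_pool {
	void *reserve[DEMO_MIN_NR];
	int curr_nr;		/* elements currently held in reserve */
	size_t elem_size;
	bool simulate_oom;	/* test hook: make the normal allocator fail */
};

static void demo_pool_destroy(struct demo_pool *pool)
{
	while (pool->curr_nr > 0)
		free(pool->reserve[--pool->curr_nr]);
}

/* Pre-fill the reserve so that DEMO_MIN_NR elements are always available. */
static bool demo_pool_init(struct demo_pool *pool, size_t elem_size)
{
	pool->curr_nr = 0;
	pool->elem_size = elem_size;
	pool->simulate_oom = false;
	while (pool->curr_nr < DEMO_MIN_NR) {
		void *elem = malloc(elem_size);

		if (!elem) {
			demo_pool_destroy(pool);
			return false;
		}
		pool->reserve[pool->curr_nr++] = elem;
	}
	return true;
}

static void *demo_pool_alloc(struct demo_pool *pool)
{
	/* Normal path: the ordinary allocator, reserve untouched. */
	void *elem = pool->simulate_oom ? NULL : malloc(pool->elem_size);

	if (elem)
		return elem;
	/* Fallback path: take a reserved element if one is left. */
	if (pool->curr_nr > 0)
		return pool->reserve[--pool->curr_nr];
	/* A sleeping caller would wait here until someone frees an element. */
	return NULL;
}

static void demo_pool_free(struct demo_pool *pool, void *elem)
{
	/* Refill the reserve first, so the next waiter can take its turn. */
	if (pool->curr_nr < DEMO_MIN_NR)
		pool->reserve[pool->curr_nr++] = elem;
	else
		free(elem);
}
```

With the normal allocator failing, two callers get the two reserved elements, a third gets nothing until one of them is freed, and it then receives that same element back.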


* Re: [PATCH]updated ipc lock patch
  2002-10-28 22:07     ` mingming cao
@ 2002-10-29  1:06       ` Rusty Russell
  0 siblings, 0 replies; 11+ messages in thread
From: Rusty Russell @ 2002-10-29  1:06 UTC (permalink / raw)
  To: cmm; +Cc: dipankar, Hugh Dickins, Andrew Morton, linux-kernel

In message <3DBDB51B.84F97EC1@us.ibm.com> you write:
> > Yes, this is the typical RCU model, except that in this case (IPC),
> > I am not quite sure if it is in effect that different from what Ming/Hugh
> > have done.
> 
> Rusty's patch looks good to me. I would like to replace the mempool in
> IPC with this typical RCU model. Rusty, if you like, I will make a patch
> against mm6.  It needs some cleanups.  One thing is that ipc_alloc()
> is called from other places (besides grow_ary()), and they don't need
> the RCU header structure.

Yes, I noticed that, but I'm not sure it's worth separating
ipc_alloc() and ipc_rcu_alloc() for a couple of temporary allocations.

Anyway, glad you like the patch,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.


* Re: [PATCH]updated ipc lock patch
  2002-10-29  0:26       ` [PATCH]updated ipc lock patch Hugh Dickins
@ 2002-10-29  2:51         ` Rusty Russell
  0 siblings, 0 replies; 11+ messages in thread
From: Rusty Russell @ 2002-10-29  2:51 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: mingming cao, Andrew Morton, linux-kernel

In message <Pine.LNX.4.44.0210282357450.1315-100000@localhost.localdomain> you 
write:
> > 2) This is a problem, because other tasks could be OOM killed during
> >    that period, and could also try to use this mempool.
> 
> They'll try to use the mempool, maybe some will be allowed to wait
> for their kmalloc(GFP_KERNEL) memory, and others will be PF_MEMDIEd and
> proceed to take a reserved mempool buffer, and others will be PF_MEMDIEd
> and have to wait for a reserved mempool buffer.  Which will be released
> to them in due course.  No worries there.

Oh.

You are (of course) correct.  Thank you for your patience.  Your
solution is elegant and correct.

Feeling dimwitted,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.


* Re: [PATCH]updated ipc lock patch
  2002-10-28 21:41     ` Rusty Russell
@ 2002-10-29  6:11       ` Dipankar Sarma
  0 siblings, 0 replies; 11+ messages in thread
From: Dipankar Sarma @ 2002-10-29  6:11 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Hugh Dickins, Mingming Cao, Andrew Morton, linux-kernel

On Tue, Oct 29, 2002 at 08:41:19AM +1100, Rusty Russell wrote:
> 	If all current uses are embedded, can we remove the "void
> *arg" and reduce the size of struct rcu_head by 25%?  Users can always
> embed it in their own struct which has a "void *arg", but if that's
> the uncommon case, it'd be nice to slim it a little.

Not all current cases are embedded: synchronize_kernel() needs
"arg" :-)

> 
> 	It'd also be nice to change the double linked list to a single
> too: as far as I can tell the only issue is the list_add_tail in
> call_rcu(): how important is this ordering?  It can be done by keeping
> a head as well as a tail pointer if required.

I can't see how the ordering of the RCU updates matters, so we can
trivially change things internally without affecting the interface.
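
For illustration only, the two suggestions quoted above (a head without "arg" that is embedded in the user's own struct, and a singly linked callback list that keeps a head and a tail pointer so list_add_tail ordering is preserved) can be sketched in userspace C. Every demo_* name here is hypothetical and the layout is an assumption for the sake of the example, not the rcu_head in the tree under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slimmed-down head: one link, no "arg" member. */
struct demo_rcu_head {
	struct demo_rcu_head *next;
	void (*func)(struct demo_rcu_head *head);
};

/* Singly linked queue; tail points at the last "next" field, or at head. */
struct demo_cb_list {
	struct demo_rcu_head *head;
	struct demo_rcu_head **tail;
};

/* Recover the enclosing object from a pointer to its embedded member. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* A user object that embeds the head instead of passing itself as "arg". */
struct demo_object {
	int id;
	int release_order;
	struct demo_rcu_head rcu;
};

static int demo_release_count;

static void demo_object_release(struct demo_rcu_head *head)
{
	struct demo_object *obj =
		demo_container_of(head, struct demo_object, rcu);

	obj->release_order = ++demo_release_count;
}

static void demo_list_init(struct demo_cb_list *list)
{
	list->head = NULL;
	list->tail = &list->head;
}

/* Append in O(1) through the tail pointer, keeping FIFO order. */
static void demo_call(struct demo_cb_list *list, struct demo_rcu_head *head,
		      void (*func)(struct demo_rcu_head *head))
{
	head->func = func;
	head->next = NULL;
	*list->tail = head;
	list->tail = &head->next;
}

/* Detach the whole queue, then invoke the callbacks in queueing order. */
static void demo_run_all(struct demo_cb_list *list)
{
	struct demo_rcu_head *head = list->head;

	demo_list_init(list);
	while (head) {
		struct demo_rcu_head *next = head->next;

		head->func(head);
		head = next;
	}
}
```

The head shrinks to two pointers, and a user such as synchronize_kernel() that really needs an argument would carry it in its own enclosing struct.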

That said, I disagree about the bloat issue, I don't see a problem there,
at least not yet. In all the uses that I have seen so far, the additional
"prev" pointer is a very small fraction of the total memory allocated for
the objects. And it is certainly not an issue with IPC - just look
at the values for SHMMNI, SEMMNI etc.

> We must be looking at different variants of the patch.  This one does:
> IPC_RMID -> freeary() -> ipc_rcu_free -> kmalloc.
> 

Grr... I thought Mingming's patch modified only IPC common code
and was looking at the mm6 tree directly. My earlier suggestion
is valid only if RCU head allocation is limited to grow_ary(). Otherwise,
rcu_head should be embedded.

Thanks
-- 
Dipankar Sarma  <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.
