netdev.vger.kernel.org archive mirror
* [PATCH 1/2] ipmr: delete redundant variable
@ 2008-07-23  1:45 Wang Chen
  2008-07-23  8:03 ` Ingo Oeser
  0 siblings, 1 reply; 7+ messages in thread
From: Wang Chen @ 2008-07-23  1:45 UTC (permalink / raw)
  To: David S. Miller; +Cc: NETDEV

*v can be removed, as this patch shows.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
---
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index c519b8d..6e715c7 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1121,7 +1121,6 @@ int ipmr_ioctl(struct sock *sk, int cmd, void __user *arg)
 static int ipmr_device_event(struct notifier_block *this, unsigned long event, void *ptr)
 {
 	struct net_device *dev = ptr;
-	struct vif_device *v;
 	int ct;
 
 	if (!net_eq(dev_net(dev), &init_net))
@@ -1129,9 +1128,9 @@ static int ipmr_device_event(struct notifier_block *this, unsigned long event, v
 
 	if (event != NETDEV_UNREGISTER)
 		return NOTIFY_DONE;
-	v=&vif_table[0];
-	for (ct=0;ct<maxvif;ct++,v++) {
-		if (v->dev==dev)
+
+	for (ct = 0; ct < maxvif; ct++) {
+		if (vif_table[ct].dev == dev)
 			vif_delete(ct, 1);
 	}
 	return NOTIFY_DONE;





* Re: [PATCH 1/2] ipmr: delete redundant variable
  2008-07-23  1:45 [PATCH 1/2] ipmr: delete redundant variable Wang Chen
@ 2008-07-23  8:03 ` Ingo Oeser
  2008-07-23  9:35   ` Wang Chen
  0 siblings, 1 reply; 7+ messages in thread
From: Ingo Oeser @ 2008-07-23  8:03 UTC (permalink / raw)
  To: Wang Chen; +Cc: David S. Miller, NETDEV

Hi Wang Chen,

Wang Chen wrote:
> *v can be removed, as this patch shows.

You are right, but did you check the resulting asm?
 
> Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
> ---
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index c519b8d..6e715c7 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -1129,9 +1128,9 @@ static int ipmr_device_event(struct notifier_block *this, unsigned long event, v
>  
>  	if (event != NETDEV_UNREGISTER)
>  		return NOTIFY_DONE;
> -	v=&vif_table[0];
> -	for (ct=0;ct<maxvif;ct++,v++) {
> -		if (v->dev==dev)

This is ptr += sizeof(vif_table[0])

> +
> +	for (ct = 0; ct < maxvif; ct++) {
> +		if (vif_table[ct].dev == dev)

This is ptr + ct * sizeof(vif_table[0])

On architectures where the second addressing variant is
not supported, it spills a register for the multiply/shift.

But the second variant could be easily auto vectorized, 
if we had no if.

So just check the asm on a CISC and a RISC architecture 
with a cross compile, before you transform these patterns.

Maybe GCC even transforms one into the other these days :-)


Best Regards

Ingo Oeser
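
For readers following along, the two addressing patterns discussed above can be sketched side by side in plain C. This is only an illustrative sketch: `struct vif`, `MAXVIF`, `find_by_pointer` and `find_by_index` are hypothetical names rather than the kernel's code, and the struct is cut down to two fields.

```c
#include <stddef.h>

#define MAXVIF 32

/* Hypothetical, cut-down stand-in for the kernel's struct vif_device. */
struct vif {
	const void *dev;
	int link;
};

static struct vif vif_table[MAXVIF];

/* Pointer-walking form: v advances by sizeof(struct vif) per iteration. */
static int find_by_pointer(const void *dev)
{
	const struct vif *v = &vif_table[0];
	int ct;

	for (ct = 0; ct < MAXVIF; ct++, v++) {
		if (v->dev == dev)
			return ct;
	}
	return -1;
}

/* Indexed form: each access is base + ct * sizeof(struct vif). */
static int find_by_index(const void *dev)
{
	int ct;

	for (ct = 0; ct < MAXVIF; ct++) {
		if (vif_table[ct].dev == dev)
			return ct;
	}
	return -1;
}
```

Both functions return the same index for any input; whether they also compile to the same loop depends on the compiler and target, which is the question raised in this message.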


* Re: [PATCH 1/2] ipmr: delete redundant variable
  2008-07-23  8:03 ` Ingo Oeser
@ 2008-07-23  9:35   ` Wang Chen
  2008-07-23 12:05     ` Ingo Oeser
  0 siblings, 1 reply; 7+ messages in thread
From: Wang Chen @ 2008-07-23  9:35 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: David S. Miller, NETDEV

Ingo Oeser said the following on 2008-7-23 16:03:
> Hi Wang Chen,
> 
> Wang Chen wrote:
>> *v can be removed, as this patch shows.
> 
> You are right, but did you check the resulting asm?
>  
>> Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
>> ---
>> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
>> index c519b8d..6e715c7 100644
>> --- a/net/ipv4/ipmr.c
>> +++ b/net/ipv4/ipmr.c
>> @@ -1129,9 +1128,9 @@ static int ipmr_device_event(struct notifier_block *this, unsigned long event, v
>>  
>>  	if (event != NETDEV_UNREGISTER)
>>  		return NOTIFY_DONE;
>> -	v=&vif_table[0];
>> -	for (ct=0;ct<maxvif;ct++,v++) {
>> -		if (v->dev==dev)
> 
> This is ptr += sizeof(vif_table[0])
> 
>> +
>> +	for (ct = 0; ct < maxvif; ct++) {
>> +		if (vif_table[ct].dev == dev)
> 
> This is ptr + ct * sizeof(vif_table[0])
> 
> On architectures where the second addressing variant is
> not supported, it spills a register for the multiply/shift.
> 

But "accessing an entry of a table by index" is always allowed,
right?
If the compiler generated a pointer computation that spills a register for
the multiply/shift, then simple code like the following would be a bug too:
i = table[100].field;
But it shouldn't be, right? :)
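
As a side note on the `table[100].field` example: with a constant index, the byte offset is a compile-time constant, so no runtime multiply is needed at all. The sketch below illustrates this; `struct entry`, `FIELD_100_OFFSET`, `read_indexed` and `read_by_offset` are hypothetical names chosen for the illustration.

```c
#include <stddef.h>

/* Hypothetical table entry; only the layout matters here. */
struct entry {
	long pad[4];
	int field;
};

static struct entry table[128];

/* Byte offset of table[100].field from the start of table (a constant). */
#define FIELD_100_OFFSET \
	(100 * sizeof(struct entry) + offsetof(struct entry, field))

/* Ordinary indexed access with a constant index. */
static int read_indexed(void)
{
	return table[100].field;
}

/* The same access spelled out as base address plus constant offset. */
static int read_by_offset(void)
{
	return *(const int *)((const char *)table + FIELD_100_OFFSET);
}
```

The multiply only becomes a runtime question when the index is a loop variable, which is the case the patch touches.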

> But the second variant could be easily auto vectorized, 
> if we had no if.
> 
> So just check the asm on a CISC and a RISC architecture 
> with a cross compile, before you transform these patterns.
> 
> Maybe GCC even transforms one into the other these days :-)



* Re: [PATCH 1/2] ipmr: delete redundant variable
  2008-07-23  9:35   ` Wang Chen
@ 2008-07-23 12:05     ` Ingo Oeser
  2008-07-23 15:16       ` Wang Chen
  2008-07-24  7:37       ` Wang Chen
  0 siblings, 2 replies; 7+ messages in thread
From: Ingo Oeser @ 2008-07-23 12:05 UTC (permalink / raw)
  To: Wang Chen; +Cc: David S. Miller, NETDEV

Hi Wang Chen,

Wang Chen wrote:
> But "accessing an entry of a table by index" is always allowed,
> right?
> If the compiler generated a pointer computation that spills a register for
> the multiply/shift, then simple code like the following would be a bug too:
> i = table[100].field;
> But it shouldn't be, right? :)

I'm NOT telling you that your transformation is introducing a BUG.
It is semantically perfectly equivalent.

I'm trying to tell you that it might not lead to the same or better
performance and thus might not be worth it.

But please check the generated assembly yourself on a CISC and RISC
machine to get an idea of the effects. It will be a nice learning 
experience I enjoyed myself already.


Best Regards

Ingo Oeser


* Re: [PATCH 1/2] ipmr: delete redundant variable
  2008-07-23 12:05     ` Ingo Oeser
@ 2008-07-23 15:16       ` Wang Chen
  2008-07-24  7:37       ` Wang Chen
  1 sibling, 0 replies; 7+ messages in thread
From: Wang Chen @ 2008-07-23 15:16 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: David S. Miller, NETDEV

Ingo Oeser said the following on 2008-7-23 20:05:
> Hi Wang Chen,
> 
> Wang Chen wrote:
>> But "accessing an entry of a table by index" is always allowed,
>> right?
>> If the compiler generated a pointer computation that spills a register for
>> the multiply/shift, then simple code like the following would be a bug too:
>> i = table[100].field;
>> But it shouldn't be, right? :)
> 
> I'm NOT telling you that your transformation is introducing a BUG.
> It is semantically perfectly equivalent.
> 
> I'm trying to tell you that it might not lead to the same or better
> performance and thus might not be worth it.
> 

Agreed. I also think that access by index might lead to worse performance.
But in this code we don't care about performance, since it is only called
when a device is unregistered. :)

> But please check the generated assembly yourself on a CISC and RISC
> machine to get an idea of the effects. It will be a nice learning 
> experience I enjoyed myself already.
> 

Sure. I am doing it.



* Re: [PATCH 1/2] ipmr: delete redundant variable
  2008-07-23 12:05     ` Ingo Oeser
  2008-07-23 15:16       ` Wang Chen
@ 2008-07-24  7:37       ` Wang Chen
  2008-07-25 17:36         ` Ingo Oeser
  1 sibling, 1 reply; 7+ messages in thread
From: Wang Chen @ 2008-07-24  7:37 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: David S. Miller, NETDEV

Ingo Oeser said the following on 2008-7-23 20:05:
> But please check the generated assembly yourself on a CISC and RISC
> machine to get an idea of the effects. It will be a nice learning 
> experience I enjoyed myself already.
> 

I did the experiment.

I used the following C code to compare the two approaches, and the
result is that they perform the same.

----main.c
#define maxvif 32

struct vif {
	int *dev;
	unsigned long bytes_in, bytyes_out;
	unsigned long pkt_in, pkt_out;
	unsigned long rate_limit;
	unsigned char threshhold;
	unsigned short flags;
	int	local, remote;
	int	link;
};

struct vif vif_table[maxvif];

int main()
{
	struct vif *v;
	int ct;

	v = &vif_table[0];
	for (ct = 0; ct < maxvif; ct++, v++)
		if(v->link==1)
			break;
	return 0;
}
---

---main2.c
#define maxvif 32

struct vif {
	int *dev;
	unsigned long bytes_in, bytyes_out;
	unsigned long pkt_in, pkt_out;
	unsigned long rate_limit;
	unsigned char threshhold;
	unsigned short flags;
	int	local, remote;
	int	link;
};

struct vif vif_table[maxvif];

int main()
{
	int ct;

	for (ct = 0; ct < maxvif; ct++)
		if(vif_table[ct].link==1)
			break;
	return 0;
}
---

Use gcc -S -O2 to compile:
---x86 asm main.s
	.file	"main.c"
	.text
	.p2align 4,,15
.globl main
	.type	main, @function
main:
	leal	4(%esp), %ecx
	andl	$-16, %esp
	pushl	-4(%ecx)
	movl	$vif_table, %eax
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ecx
	jmp	.L2
	.p2align 4,,7
.L8:
	cmpl	$vif_table+1240, %eax
	je	.L3
	addl	$40, %eax
.L2:
	cmpl	$1, 36(%eax)
	jne	.L8
.L3:
	popl	%ecx
	xorl	%eax, %eax
	popl	%ebp
	leal	-4(%ecx), %esp
	ret
	.size	main, .-main
	.comm	vif_table,1280,32
	.ident	"GCC: (GNU) 4.1.2 20070115 (prerelease) (SUSE Linux)"
	.section	.note.GNU-stack,"",@progbits
---

---x86 asm main2.s
	.file	"main2.c"
	.text
	.p2align 4,,15
.globl main
	.type	main, @function
main:
	leal	4(%esp), %ecx
	andl	$-16, %esp
	pushl	-4(%ecx)
	xorl	%eax, %eax
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ecx
	jmp	.L2
	.p2align 4,,7
.L8:
	addl	$40, %eax
	cmpl	$1280, %eax
	je	.L3
.L2:
	cmpl	$1, vif_table+36(%eax)
	jne	.L8
.L3:
	popl	%ecx
	xorl	%eax, %eax
	popl	%ebp
	leal	-4(%ecx), %esp
	ret
	.size	main, .-main
	.comm	vif_table,1280,32
	.ident	"GCC: (GNU) 4.1.2 20070115 (prerelease) (SUSE Linux)"
	.section	.note.GNU-stack,"",@progbits
---

In the loop area, main.s and main2.s differ as follows:
main.s :
	cmpl	$vif_table+1240, %eax
	cmpl	$1, 36(%eax)
main2.s:
	cmpl	$1280, %eax
	cmpl	$1, vif_table+36(%eax)
This difference should not affect performance.
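
To make the two x86 loops easier to compare, here is a rough C rendition of what each listing does. It is only a sketch: `scan_pointer_form` and `scan_offset_form` are hypothetical names, the struct is reduced to the fields needed, and sizes are taken with `sizeof` rather than hard-coding the 40-byte entry and 1280-byte table seen in the 32-bit listings.

```c
#include <stddef.h>

#define MAXVIF 32

/* Hypothetical, reduced version of the test struct. */
struct vif {
	int *dev;
	int link;
};

static struct vif vif_table[MAXVIF];

/* Like main.s: keep an absolute pointer, compare it against the last entry. */
static int scan_pointer_form(void)
{
	const char *p = (const char *)vif_table;
	const char *last = p + (MAXVIF - 1) * sizeof(struct vif);

	for (;;) {
		if (((const struct vif *)p)->link == 1)
			return 1;
		if (p == last)
			return 0;
		p += sizeof(struct vif);	/* addl $40, %eax in the listing */
	}
}

/* Like main2.s: keep a byte offset, add the table base inside the access. */
static int scan_offset_form(void)
{
	size_t off = 0;

	for (;;) {
		if (((const struct vif *)((const char *)vif_table + off))->link == 1)
			return 1;
		off += sizeof(struct vif);	/* addl $40, %eax in the listing */
		if (off == sizeof(vif_table))	/* cmpl $1280, %eax in the listing */
			return 0;
	}
}
```

In both forms the per-iteration work is one add, one compare against a constant, and one load; the index multiply from the source has been reduced to an addition.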

OK. Here is the asm on SPARC (compiled natively, not cross-compiled):
---main.s
                       	.global main                

			main:
/* 000000	  21 */		sethi	%hi(vif_table),%o5
/* 0x0004	  22 */		or	%g0,0,%o4
/* 0x0008	  21 */		add	%o5,%lo(vif_table),%o3
/* 0x000c	  23 */		ld	[%o3+36],%o5

			.L900000106:
/* 0x0010	  23 */		cmp	%o5,1
/* 0x0014	     */		be,pn	%icc,.L77000028
/* 0x0018	  22 */		add	%o4,1,%o4

			.L77000025:
/* 0x001c	  22 */		add	%o3,40,%o3
/* 0x0020	     */		cmp	%o4,32
/* 0x0024	     */		bl,a,pt	%icc,.L900000106
/* 0x0028	  23 */		ld	[%o3+36],%o5

			.L77000028:
/* 0x002c	  22 */		retl	! Result =  %o0
/* 0x0030	     */		or	%g0,0,%o0
/* 0x0034	   0 */		.type	main,2
/* 0x0034	   0 */		.size	main,(.-main)
/* 0x0034	   0 */		.global	__fsr_init_value
/* 0x0034	     */		 __fsr_init_value=0
---

---main2.s
                       	.global main   

			main:
/* 000000	  22 */		sethi	%hi(vif_table+36),%o5
/* 0x0004	     */		or	%g0,0,%o3
/* 0x0008	     */		add	%o5,%lo(vif_table+36),%o4
/* 0x000c	  23 */		ld	[%o5+%lo(vif_table+36)],%o5

			.L900000106:
/* 0x0010	  23 */		cmp	%o5,1
/* 0x0014	     */		be,pn	%icc,.L77000028
/* 0x0018	  22 */		add	%o4,40,%o4

			.L77000025:
/* 0x001c	  22 */		add	%o3,1,%o3
/* 0x0020	     */		cmp	%o3,32
/* 0x0024	     */		bl,a,pt	%icc,.L900000106
/* 0x0028	  23 */		ld	[%o4],%o5

			.L77000028:
/* 0x002c	  22 */		retl	! Result =  %o0
/* 0x0030	     */		or	%g0,0,%o0
/* 0x0034	   0 */		.type	main,2
/* 0x0034	   0 */		.size	main,(.-main)
/* 0x0034	   0 */		.global	__fsr_init_value
/* 0x0034	     */		 __fsr_init_value=0
---

In the loop area, both use ptr += sizeof(struct).

So we can conclude that current compilers do optimize index-based access.
:)

Ingo, if you have a different opinion, I would appreciate it if you shared it. :)



* Re: [PATCH 1/2] ipmr: delete redundant variable
  2008-07-24  7:37       ` Wang Chen
@ 2008-07-25 17:36         ` Ingo Oeser
  0 siblings, 0 replies; 7+ messages in thread
From: Ingo Oeser @ 2008-07-25 17:36 UTC (permalink / raw)
  To: Wang Chen; +Cc: David S. Miller, NETDEV

Hi Wang Chen,

Wang Chen wrote:
> Ingo Oeser said the following on 2008-7-23 20:05:
> > But please check the generated assembly yourself on a CISC and RISC
> > machine to get an idea of the effects. It will be a nice learning 
> > experience I enjoyed myself already.
> > 
> 
> I did the experiment.
[..]
> In the loop area, both use ptr += sizeof(struct).
> 
> So we can conclude that current compilers do optimize index-based access.
> :)
> 
> Ingo, if you have a different opinion, I would appreciate it if you shared it. :)

Great! Compilers improved a lot here :-)

Many thanks for doing this experiment. 

Now you and others can point anyone who is questioning
this fact to your experiment and take it as a reference for similar changes.

That is a great help for the community, I think!


Best Regards

Ingo Oeser
