* X86_64 and X86_32 bit performance difference [Revisited]
@ 2006-01-09 6:29 Nauman Tahir
2006-01-09 7:51 ` Arjan van de Ven
2006-01-09 18:27 ` Andi Kleen
0 siblings, 2 replies; 7+ messages in thread
From: Nauman Tahir @ 2006-01-09 6:29 UTC (permalink / raw)
To: linux-kernel; +Cc: kernelnewbies
Hello All
I have posted this problem before. I am mailing again after testing as
recommended in the previous replies.
My configuration is:
Hardware:
HP ProLiant DL145 (2 x AMD Opteron 144)
14 GB RAM
OS:
FC 4
Kernel
2.6.xx
As a friend suggested, I compiled the same kernel with as many common
configuration options as possible for both 32 and 64 bit. I tested my
driver and got the same result.
Let me explain in detail what is going on.
I have a block device driver which uses my RAMDISK to cache the
data for some Target disk.
I have implemented two simple caching policies in it. I am running
IOTEST to see the IO rate of my driver. My RAMDISK differs between the 32
and 64 bit versions. The 32 bit version uses the kmap family to read/write
data to/from memory, while the 64 bit version uses the __va function call
to get the virtual address directly. This avoids ioremap, which sleeps and
slows down the IO rate considerably. The RAMDISK on its own gives a very
high IO rate with IOTEST, but with my driver on top the performance drops
to about one fourth. This only happens when I run the whole thing on a
kernel compiled for X86_64. Things work well on the 32 bit version. The
driver is the same for both versions.
I also cannot figure out which kernel configuration option, if any, is
making the difference.
My code does not seem to have portability issues. For example, calculations
are based on unsigned long. There are a few threads involved, based on
kernel_thread as used in the MD driver.
Any ideas what the cause of the performance difference is? What areas should
I look at?
Nauman
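[Editorial sketch, not part of the original message.] The remark that
calculations are "based on unsigned long" is worth checking with care,
because that type is 32 bits wide on X86_32 and 64 bits wide on X86_64.
The userspace C below is a minimal illustration under assumptions: the
helper names byte_offset_32 and byte_offset_64, and the 512-byte sector
size, are hypothetical and do not come from the driver. It only shows how
the same sector-to-byte arithmetic behaves at each width; it does not claim
that this is the cause of the slowdown.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9	/* assumed 512-byte sectors */

/* Byte offset computed in a 32-bit type, the width of unsigned long on X86_32. */
static uint32_t byte_offset_32(uint32_t sector)
{
	return sector << SECTOR_SHIFT;	/* bits above bit 31 are dropped */
}

/* The same arithmetic in a fixed 64-bit type, the width of unsigned long on X86_64. */
static uint64_t byte_offset_64(uint64_t sector)
{
	return sector << SECTOR_SHIFT;
}
```

For sector 0x1000000 (the 8 GiB mark, within reach of a host with 14 GB of
RAM), byte_offset_64 returns 0x200000000 while byte_offset_32 wraps to 0.
Code that relies on unsigned long can therefore take different paths on the
two builds even when the source is identical.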
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: X86_64 and X86_32 bit performance difference [Revisited]
2006-01-09 6:29 X86_64 and X86_32 bit performance difference [Revisited] Nauman Tahir
@ 2006-01-09 7:51 ` Arjan van de Ven
2006-01-10 10:49 ` Nauman Tahir
2006-01-09 18:27 ` Andi Kleen
1 sibling, 1 reply; 7+ messages in thread
From: Arjan van de Ven @ 2006-01-09 7:51 UTC (permalink / raw)
To: Nauman Tahir; +Cc: linux-kernel, kernelnewbies
On Sun, 2006-01-08 at 22:29 -0800, Nauman Tahir wrote:
> Hello All
> I have posted this problem before. I am mailing again after testing as
> recommended in the previous replies.
> My configuration is:
>
> Hardware:
> HP ProLiant DL145 (2 x AMD Opteron 144)
> 14 GB RAM
>
> OS:
> FC 4
>
> Kernel
> 2.6.xx
You *STILL* have not posted the URL to your source code.
How is anyone supposed to help you without that?????
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: X86_64 and X86_32 bit performance difference [Revisited]
2006-01-09 6:29 X86_64 and X86_32 bit performance difference [Revisited] Nauman Tahir
2006-01-09 7:51 ` Arjan van de Ven
@ 2006-01-09 18:27 ` Andi Kleen
2006-01-10 10:50 ` Nauman Tahir
1 sibling, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2006-01-09 18:27 UTC (permalink / raw)
To: Nauman Tahir; +Cc: kernelnewbies, linux-kernel
Nauman Tahir <nauman.tahir@gmail.com> writes:
> I have posted this problem before. I am mailing again after testing as
> recommended in the previous replies.
> My configuration is:
Most likely it's related to you misusing the PCI DMA API in some way.
Review Documentation/DMA-mapping.txt closely.
If that doesn't turn on the light try oprofile.
-Andi
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: X86_64 and X86_32 bit performance difference [Revisited]
2006-01-09 7:51 ` Arjan van de Ven
@ 2006-01-10 10:49 ` Nauman Tahir
2006-01-10 10:53 ` Nauman Tahir
2006-01-10 21:14 ` Arjan van de Ven
0 siblings, 2 replies; 7+ messages in thread
From: Nauman Tahir @ 2006-01-10 10:49 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: linux-kernel, kernelnewbies
On 1/9/06, Arjan van de Ven <arjan@infradead.org> wrote:
> On Sun, 2006-01-08 at 22:29 -0800, Nauman Tahir wrote:
> > Hello All
> > I have posted this problem before. I am mailing again after testing as
> > recommended in the previous replies.
> > My configuration is:
> >
> > Hardware:
> > HP ProLiant DL145 (2 x AMD Opteron 144)
> > 14 GB RAM
> >
> > OS:
> > FC 4
> >
> > Kernel
> > 2.6.xx
>
> You *STILL* have not posted the URL to your source code.
> How is anyone supposed to help you without that?????
I have attached a file which I use as the thread API. The complete code is
quite large and also needs a proper description, which I will post
if needed.
I hope I have made my problem clear. I repeat: the same code shows a lot of
performance degradation on the previously mentioned configuration. One
suspect is the thread library.
dts_thread_t *dts_register_thread(void (*run) (void *), const char *name, void *private)
is the function to register my thread handler.
void dts_wakeup_thread(dts_thread_t *thread)
is the function in dts_thread.c which I use to run my thread.
All my thread handlers either
call generic_make_request, sometimes for my RAMDISK and sometimes for
my Target device [SCSI disk or local HDD partition],
OR
use list.h.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: X86_64 and X86_32 bit performance difference [Revisited]
2006-01-09 18:27 ` Andi Kleen
@ 2006-01-10 10:50 ` Nauman Tahir
0 siblings, 0 replies; 7+ messages in thread
From: Nauman Tahir @ 2006-01-10 10:50 UTC (permalink / raw)
To: Andi Kleen; +Cc: kernelnewbies, linux-kernel
On 09 Jan 2006 19:27:20 +0100, Andi Kleen <ak@suse.de> wrote:
> Nauman Tahir <nauman.tahir@gmail.com> writes:
>
> > I have posted this problem before. I am mailing again after testing as
> > recommended in the previous replies.
> > My configuration is:
>
> Most likely it's related to you misusing the PCI DMA API in some way.
> Review Documentation/DMA-mapping.txt closely.
>
> If that doesn't turn on the light try oprofile.
what is oprofile???
>
> -Andi
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: X86_64 and X86_32 bit performance difference [Revisited]
2006-01-10 10:49 ` Nauman Tahir
@ 2006-01-10 10:53 ` Nauman Tahir
2006-01-10 21:14 ` Arjan van de Ven
1 sibling, 0 replies; 7+ messages in thread
From: Nauman Tahir @ 2006-01-10 10:53 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: linux-kernel, kernelnewbies
[-- Attachment #1: Type: text/plain, Size: 1455 bytes --]
On 1/10/06, Nauman Tahir <nauman.tahir@gmail.com> wrote:
> On 1/9/06, Arjan van de Ven <arjan@infradead.org> wrote:
> > On Sun, 2006-01-08 at 22:29 -0800, Nauman Tahir wrote:
> > > Hello All
> > > I have posted this problem before. I am mailing again after testing as
> > > recommended in the previous replies.
> > > My configuration is:
> > >
> > > Hardware:
> > > HP ProLiant DL145 (2 x AMD Opteron 144)
> > > 14 GB RAM
> > >
> > > OS:
> > > FC 4
> > >
> > > Kernel
> > > 2.6.xx
> >
> > You *STILL* have not posted the URL to your source code.
> > How is anyone supposed to help you without that?????
>
> I have attached a file which I use as the thread API. The complete code is
> quite large and also needs a proper description, which I will post
> if needed.
> I hope I have made my problem clear. I repeat: the same code shows a lot of
> performance degradation on the previously mentioned configuration. One
> suspect is the thread library.
>
>
> dts_thread_t *dts_register_thread(void (*run) (void *), const char *name, void *private)
>
> is the function to register my thread handler.
>
> void dts_wakeup_thread(dts_thread_t *thread)
>
> is the function in dts_thread.c which I use to run my thread.
>
> All my thread handlers either
> call generic_make_request, sometimes for my RAMDISK and sometimes for
> my Target device [SCSI disk or local HDD partition],
> OR
> use list.h.
[-- Attachment #2: dts_thread.c --]
[-- Type: text/plain, Size: 2475 bytes --]
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/smp_lock.h>
#include <linux/mempool.h>
#include <linux/slab.h>
#include "../include/dts_thread.h"

/* Used as a bit index (not a mask) by test_bit() and clear_bit() below. */
#define THREAD_WAKEUP 0x01

extern void dts_set_bit(char * , int );

/*
 * Thread body: sleep until woken (THREAD_WAKEUP set or a signal arrives),
 * then call thread->run if it is still set.
 */
int dts_thread(void * arg)
{
	dts_thread_t *thread = arg;

	lock_kernel();
	/*
	 * Detach thread
	 */
	daemonize(thread->name);
	current->exit_signal = SIGCHLD;
	allow_signal(SIGKILL);
	thread->tsk = current;
	unlock_kernel();

	/* Tell dts_register_thread() that the thread is up. */
	complete(thread->event);

	while (thread->run) {
		void (*run)(void *);

		wait_event_interruptible(thread->wqueue,
			test_bit(THREAD_WAKEUP, &thread->flags));
		if (current->flags & PF_FREEZE)
			refrigerator(PF_FREEZE);
		clear_bit(THREAD_WAKEUP, &thread->flags);
		run = thread->run;
		if (run)
			run(thread->private);
		if (signal_pending(current))
			flush_signals(current);
	}

	/* Tell dts_unregister_thread() that the loop has exited. */
	complete(thread->event);
	return 0;
}

void dts_wakeup_thread(dts_thread_t *thread)
{
	if (thread) {
		dts_set_bit((char *)&thread->flags, THREAD_WAKEUP);
		wake_up(&thread->wqueue);
	}
	else
		printk("dts_wakeup_thread:.........thread is NULL\n");
}

dts_thread_t *dts_register_thread(void (*run) (void *), const char *name, void * private)
{
	dts_thread_t *thread=NULL;
	int ret;
	struct completion event;

	thread = (dts_thread_t *) kmalloc
		(sizeof(dts_thread_t), GFP_KERNEL);
	if (!thread)
		return NULL;
	memset(thread, 0, sizeof(dts_thread_t));
	init_waitqueue_head(&thread->wqueue);
	init_completion(&event);
	thread->event = &event;
	thread->run = run;
	thread->name = name;
	thread->private = private;
	ret = kernel_thread(dts_thread, thread, 0);
	if (ret < 0) {
		printk("\ndts_register_thread:.......unable to register kernel thread\n");
		kfree(thread);
		return NULL;
	}
	wait_for_completion(&event);
	// printk("Thread Allocated Successfully\n ");
	return thread;
}

void dts_interrupt_thread(dts_thread_t *thread)
{
	if (!thread->tsk) {
		BUG();
		return;
	}
	// dprintk("interrupting dts-thread pid %d\n", thread->tsk->pid);
	send_sig(SIGKILL, thread->tsk, 1);
}

void dts_unregister_thread(dts_thread_t *thread)
{
	struct completion event;

	init_completion(&event);
	thread->event = &event;
	/* Clearing run makes the loop in dts_thread() exit after the signal. */
	thread->run = NULL;
	thread->name = NULL;
	dts_interrupt_thread(thread);
	wait_for_completion(&event);
	kfree(thread);
}

EXPORT_SYMBOL(dts_wakeup_thread);
EXPORT_SYMBOL(dts_unregister_thread);
EXPORT_SYMBOL(dts_register_thread);
EXPORT_SYMBOL(dts_interrupt_thread);
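
[Editorial sketch, not part of the original attachment.] One property of
the wake-up scheme in dts_thread.c is that a single flag bit records
pending work, so several dts_wakeup_thread calls that arrive before the
thread runs collapse into one call of the handler (assuming dts_set_bit
simply sets the given bit). The single-threaded userspace model below
illustrates only that property. struct worker, worker_wakeup, and
worker_poll are hypothetical names, and real kernel code would use
set_bit/test_bit and a wait queue instead of plain assignments.

```c
#include <stdbool.h>

#define WAKEUP_BIT 1	/* bit index, mirroring THREAD_WAKEUP (hypothetical name) */

struct worker {
	unsigned long flags;	/* pending-work flag word */
	int runs;		/* how many times the handler has run */
};

/* Models dts_wakeup_thread(): mark work as pending. Repeated calls are idempotent. */
static void worker_wakeup(struct worker *w)
{
	w->flags |= 1UL << WAKEUP_BIT;
}

/* Models one pass of the thread loop: run the handler once if the flag is set. */
static bool worker_poll(struct worker *w)
{
	if (!(w->flags & (1UL << WAKEUP_BIT)))
		return false;		/* nothing pending, the thread would sleep */
	w->flags &= ~(1UL << WAKEUP_BIT);	/* clear before running, as dts_thread does */
	w->runs++;			/* stands in for run(thread->private) */
	return true;
}
```

Calling worker_wakeup three times and then worker_poll twice runs the
handler once: the first poll returns true and the second returns false. If
each wake-up is meant to correspond to one queued request, the handler has
to drain all pending work per run rather than process a single item.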
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: X86_64 and X86_32 bit performance difference [Revisited]
2006-01-10 10:49 ` Nauman Tahir
2006-01-10 10:53 ` Nauman Tahir
@ 2006-01-10 21:14 ` Arjan van de Ven
1 sibling, 0 replies; 7+ messages in thread
From: Arjan van de Ven @ 2006-01-10 21:14 UTC (permalink / raw)
To: Nauman Tahir; +Cc: linux-kernel, kernelnewbies
On Tue, 2006-01-10 at 02:49 -0800, Nauman Tahir wrote:
> On 1/9/06, Arjan van de Ven <arjan@infradead.org> wrote:
> > On Sun, 2006-01-08 at 22:29 -0800, Nauman Tahir wrote:
> > > Hello All
> > > I have posted this problem before. I am mailing again after testing as
> > > recommended in the previous replies.
> > > My configuration is:
> > >
> > > Hardware:
> > > HP ProLiant DL145 (2 x AMD Opteron 144)
> > > 14 GB RAM
> > >
> > > OS:
> > > FC 4
> > >
> > > Kernel
> > > 2.6.xx
> >
> > You *STILL* have not posted the URL to your source code.
> > How is anyone supposed to help you without that?????
>
> I have attached a file which I use as the thread API. The complete code is
> quite large and also needs a proper description, which I will post
> if needed.
Well, you don't give any of the block layer code, so I'd say more code is
needed. Just put all of it online somewhere and post the URL...
^ permalink raw reply [flat|nested] 7+ messages in thread