* X86_64 and X86_32 bit performance difference [Revisited]
@ 2006-01-09 6:29 Nauman Tahir
  2006-01-09 7:51 ` Arjan van de Ven
  2006-01-09 18:27 ` Andi Kleen
  0 siblings, 2 replies; 7+ messages in thread
From: Nauman Tahir @ 2006-01-09 6:29 UTC
  To: linux-kernel; +Cc: kernelnewbies

Hello All,

I have posted this problem before. I am mailing again after testing as recommended in the previous replies.

My configuration is:

Hardware:
HP ProLiant DL145 (2 x AMD Opteron 144)
14 GB RAM

OS:
FC 4

Kernel:
2.6.xx

As suggested by a friend, I compiled the same kernel with as many common configuration options as possible for both 32 and 64 bit. I tested my driver and got the same result. Let me explain in detail what is going on.

I have a block device driver which uses my RAMDISK to cache data for a target disk. I have implemented two simple caching policies in it. I am running IOTEST to measure the IO rate of my driver.

My RAMDISK differs between the 32 and 64 bit versions. The 32 bit version uses the kmap family to read/write data to/from memory, while the 64 bit version calls __va to get the virtual address directly. This avoids ioremap, which sleeps and slows down the IO rate considerably. The RAMDISK on its own gives a very high IO rate with IOTEST, but with my driver the performance drops to about one fourth. This only happens when I run the whole thing on a kernel compiled for X86_64. Things work well on the 32 bit version. The driver is the same for both versions.

I also cannot figure out which kernel configuration option, if any, is making the difference. My code does not seem to have portability issues; for example, calculations are based on unsigned long. There are a few threads involved, based on kernel_thread as used in the MD driver.

Any ideas what is causing the performance difference? Which areas should I look at?

Nauman
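On the portability point in the message above: unsigned long is 32 bits wide on x86_32 Linux and 64 bits wide on x86_64 Linux, so arithmetic that is safe on one build can truncate on the other. With 14 GB of RAM, a byte offset into a large RAM disk may not fit in 32 bits. The following is a minimal userspace sketch and is not taken from the driver; the helper names and the 512-byte sector size are assumptions made only for illustration.

```c
#include <limits.h>
#include <stdint.h>

/* Assumed sector size (512 bytes); the driver's real value is not shown in the thread. */
#define SECTOR_SHIFT 9

/* Width of unsigned long in bits: 32 on x86_32 Linux, 64 on x86_64 Linux. */
static unsigned int ulong_bits(void)
{
	return (unsigned int)(sizeof(unsigned long) * CHAR_BIT);
}

/* Byte offset computed in a fixed 64-bit type: same result on both builds. */
static uint64_t sector_to_bytes(uint64_t sector)
{
	return sector << SECTOR_SHIFT;
}

/* Same calculation in unsigned long: high bits are lost when it is 32 bits wide. */
static unsigned long sector_to_bytes_ulong(unsigned long sector)
{
	return sector << SECTOR_SHIFT;
}

/* Returns 1 if the unsigned long version yields the full 64-bit answer. */
static int ulong_offset_is_exact(uint64_t sector)
{
	return (uint64_t)sector_to_bytes_ulong((unsigned long)sector) ==
	       sector_to_bytes(sector);
}
```

Sector 16777216 (2^24) starts at byte 2^33, so ulong_offset_is_exact(16777216) returns 0 on a 32 bit build and 1 on a 64 bit build. A mismatch of this kind would affect correctness rather than speed, but it is cheap to rule out before looking elsewhere.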
* Re: X86_64 and X86_32 bit performance difference [Revisited]
  2006-01-09 6:29 X86_64 and X86_32 bit performance difference [Revisited] Nauman Tahir
@ 2006-01-09 7:51 ` Arjan van de Ven
  2006-01-10 10:49 ` Nauman Tahir
  2006-01-09 18:27 ` Andi Kleen
  1 sibling, 1 reply; 7+ messages in thread
From: Arjan van de Ven @ 2006-01-09 7:51 UTC
  To: Nauman Tahir; +Cc: linux-kernel, kernelnewbies

On Sun, 2006-01-08 at 22:29 -0800, Nauman Tahir wrote:
> Hello All
> I have posted this problem before. Now mailing again after testing as
> recommended in previous replies.
> My configuration is:
>
> Hardware:
> HP ProLiant DL145 (2 x AMD Opteron 144)
> 14 GB RAM
>
> OS:
> FC 4
>
> Kernel
> 2.6.xx

You *STILL* have not posted the URL to your source code.
How is anyone supposed to help you without that?????
* Re: X86_64 and X86_32 bit performance difference [Revisited]
  2006-01-09 7:51 ` Arjan van de Ven
@ 2006-01-10 10:49 ` Nauman Tahir
  2006-01-10 10:53 ` Nauman Tahir
  2006-01-10 21:14 ` Arjan van de Ven
  0 siblings, 2 replies; 7+ messages in thread
From: Nauman Tahir @ 2006-01-10 10:49 UTC
  To: Arjan van de Ven; +Cc: linux-kernel, kernelnewbies

On 1/9/06, Arjan van de Ven <arjan@infradead.org> wrote:
> On Sun, 2006-01-08 at 22:29 -0800, Nauman Tahir wrote:
> > Hello All
> > I have posted this problem before. Now mailing again after testing as
> > recommended in previous replies.
> > My configuration is:
> >
> > Hardware:
> > HP ProLiant DL145 (2 x AMD Opteron 144)
> > 14 GB RAM
> >
> > OS:
> > FC 4
> >
> > Kernel
> > 2.6.xx
>
> You *STILL* have not posted the URL to your source code.
> How is anyone supposed to help you without that?????

I have attached the file I use as my thread API. The complete code is quite large and also needs a proper description, which I will post if needed.

I hope my problem is clear. To repeat: the same code shows a large performance degradation on the previously mentioned configuration. One suspect is the thread library.

dts_thread_t *dts_register_thread(void (*run) (void *), const char *name, void * private)

is the function that registers my thread handler.

void dts_wakeup_thread(dts_thread_t *thread)

is the function in dts_thread.c which I use to run my thread.

All my thread handlers either call generic_make_request (sometimes for my RAMDISK and sometimes for my target device [SCSI disk or local HDD partition]) or use list.h.
* Re: X86_64 and X86_32 bit performance difference [Revisited]
  2006-01-10 10:49 ` Nauman Tahir
@ 2006-01-10 10:53 ` Nauman Tahir
  2006-01-10 21:14 ` Arjan van de Ven
  1 sibling, 0 replies; 7+ messages in thread
From: Nauman Tahir @ 2006-01-10 10:53 UTC
  To: Arjan van de Ven; +Cc: linux-kernel, kernelnewbies

[-- Attachment #1: Type: text/plain, Size: 1455 bytes --]

On 1/10/06, Nauman Tahir <nauman.tahir@gmail.com> wrote:
> On 1/9/06, Arjan van de Ven <arjan@infradead.org> wrote:
> > On Sun, 2006-01-08 at 22:29 -0800, Nauman Tahir wrote:
> > > Hello All
> > > I have posted this problem before. Now mailing again after testing as
> > > recommended in previous replies.
> > > My configuration is:
> > >
> > > Hardware:
> > > HP ProLiant DL145 (2 x AMD Opteron 144)
> > > 14 GB RAM
> > >
> > > OS:
> > > FC 4
> > >
> > > Kernel
> > > 2.6.xx
> >
> > You *STILL* have not posted the URL to your source code.
> > How is anyone supposed to help you without that?????
>
> I have attached a file which I use as thread API. Complete code is
> quite large and also needs proper description, which I would be posting
> if needed.
> I hope I make my problem clear: I repeat: same code is giving a lot of
> performance degradation on previously mentioned configuration. One
> suspect is the thread library.
>
>
> dts_thread_t *dts_register_thread(void (*run) (void *), const char
> *name, void * private)
>
> is the function to register my thread handler
>
> void dts_wakeup_thread(dts_thread_t *thread)
>
> is the function in the dts_thread.c which I use to run my thread.
>
> all my thread handlers either
> call generic_make_request some times for my RAMDISK and sometimes for
> my Target device [SCSI DISK or local HDD partition]
> OR
> uses list.h

[-- Attachment #2: dts_thread.c --]
[-- Type: text/plain, Size: 2475 bytes --]

#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/smp_lock.h>
#include <linux/mempool.h>
#include <linux/slab.h>
#include "../include/dts_thread.h"

#define THREAD_WAKEUP 0x01

extern void dts_set_bit(char * , int );

int dts_thread(void * arg)
{
	dts_thread_t *thread = arg;

	lock_kernel();

	/*
	 * Detach thread
	 */
	daemonize(thread->name);

	current->exit_signal = SIGCHLD;
	allow_signal(SIGKILL);
	thread->tsk = current;

	unlock_kernel();

	complete(thread->event);
	while (thread->run) {
		void (*run)(void *);

		wait_event_interruptible(thread->wqueue,
					 test_bit(THREAD_WAKEUP, &thread->flags));
		if (current->flags & PF_FREEZE)
			refrigerator(PF_FREEZE);

		clear_bit(THREAD_WAKEUP, &thread->flags);

		run = thread->run;
		if (run)
			run(thread->private);

		if (signal_pending(current))
			flush_signals(current);
	}
	complete(thread->event);
	return 0;
}

void dts_wakeup_thread(dts_thread_t *thread)
{
	if (thread) {
		dts_set_bit((char *)&thread->flags, THREAD_WAKEUP);
		wake_up(&thread->wqueue);
	} else
		printk("dts_wakeup_thread:.........thread is NULL\n");
}

dts_thread_t *dts_register_thread(void (*run) (void *), const char *name, void * private)
{
	dts_thread_t *thread=NULL;
	int ret;
	struct completion event;

	thread = (dts_thread_t *) kmalloc (sizeof(dts_thread_t), GFP_KERNEL);
	if (!thread)
		return NULL;

	memset(thread, 0, sizeof(dts_thread_t));
	init_waitqueue_head(&thread->wqueue);

	init_completion(&event);
	thread->event = &event;
	thread->run = run;
	thread->name = name;
	thread->private = private;
	ret = kernel_thread(dts_thread, thread, 0);
	if (ret < 0) {
		printk("\ndts_register_thread:.......unable to register kernel thread\n");
		kfree(thread);
		return NULL;
	}
	wait_for_completion(&event);
//	printk("Thread Allocated Successfully\n ");
	return thread;
}

void dts_interrupt_thread(dts_thread_t *thread)
{
	if (!thread->tsk) {
		BUG();
		return;
	}
//	dprintk("interrupting dts-thread pid %d\n", thread->tsk->pid);
	send_sig(SIGKILL, thread->tsk, 1);
}

void dts_unregister_thread(dts_thread_t *thread)
{
	struct completion event;

	init_completion(&event);

	thread->event = &event;
	thread->run = NULL;
	thread->name = NULL;
	dts_interrupt_thread(thread);
	wait_for_completion(&event);
	kfree(thread);
}

EXPORT_SYMBOL(dts_wakeup_thread);
EXPORT_SYMBOL(dts_unregister_thread);
EXPORT_SYMBOL(dts_register_thread);
EXPORT_SYMBOL(dts_interrupt_thread);
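One detail in the attachment that may be worth checking: dts_wakeup_thread sets the wakeup flag through dts_set_bit on a char pointer, while the thread loop tests and clears the same flag with test_bit and clear_bit, which treat it as an unsigned long bitmap. The source of dts_set_bit is not shown in the thread, so the sketch below assumes a plain byte-addressed implementation; every function name in it is a hypothetical userspace stand-in, not kernel or driver code. It only illustrates that the two ways of numbering bits agree on little-endian machines such as x86.

```c
#include <limits.h>
#include <string.h>

/* Assumed shape of a byte-addressed setter like dts_set_bit(); the real one is not shown. */
static void byte_set_bit(char *addr, int nr)
{
	unsigned char *p = (unsigned char *)addr;

	p[nr / CHAR_BIT] |= (unsigned char)(1u << (nr % CHAR_BIT));
}

/* Non-atomic userspace stand-in for test_bit() on an unsigned long bitmap. */
static int word_test_bit(int nr, const unsigned long *addr)
{
	const int bits_per_long = (int)(sizeof(unsigned long) * CHAR_BIT);

	return (int)((addr[nr / bits_per_long] >> (nr % bits_per_long)) & 1UL);
}

/* Returns 1 when the lowest-addressed byte holds the least significant bits. */
static int host_is_little_endian(void)
{
	const unsigned long one = 1UL;
	unsigned char first;

	memcpy(&first, &one, 1);
	return first == 1;
}

/* Set bit nr through the byte view, then read it back through the word view. */
static int bit_views_agree(int nr)
{
	unsigned long flags = 0;

	byte_set_bit((char *)&flags, nr);
	return word_test_bit(nr, &flags);
}
```

If dts_set_bit really is byte-addressed like this, the mixed access by itself would not explain a difference between the 32 and 64 bit builds, because x86 is little-endian in both. Such a helper would not be atomic, though, unlike the kernel's own set_bit, so using set_bit directly may be the safer choice.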
* Re: X86_64 and X86_32 bit performance difference [Revisited]
  2006-01-10 10:49 ` Nauman Tahir
  2006-01-10 10:53 ` Nauman Tahir
@ 2006-01-10 21:14 ` Arjan van de Ven
  1 sibling, 0 replies; 7+ messages in thread
From: Arjan van de Ven @ 2006-01-10 21:14 UTC
  To: Nauman Tahir; +Cc: linux-kernel, kernelnewbies

On Tue, 2006-01-10 at 02:49 -0800, Nauman Tahir wrote:
> On 1/9/06, Arjan van de Ven <arjan@infradead.org> wrote:
> > On Sun, 2006-01-08 at 22:29 -0800, Nauman Tahir wrote:
> > > Hello All
> > > I have posted this problem before. Now mailing again after testing as
> > > recommended in previous replies.
> > > My configuration is:
> > >
> > > Hardware:
> > > HP ProLiant DL145 (2 x AMD Opteron 144)
> > > 14 GB RAM
> > >
> > > OS:
> > > FC 4
> > >
> > > Kernel
> > > 2.6.xx
> >
> > You *STILL* have not posted the URL to your source code.
> > How is anyone supposed to help you without that?????
>
> I have attached a file which I use as thread API. Complete code is
> quite large and also needs proper description, which I would be posting
> if needed.

Well, you don't give any of the block layer code, so I'd say more code is
needed. Just put all of it online somewhere and post the URL...
* Re: X86_64 and X86_32 bit performance difference [Revisited]
  2006-01-09 6:29 X86_64 and X86_32 bit performance difference [Revisited] Nauman Tahir
  2006-01-09 7:51 ` Arjan van de Ven
@ 2006-01-09 18:27 ` Andi Kleen
  2006-01-10 10:50 ` Nauman Tahir
  1 sibling, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2006-01-09 18:27 UTC
  To: Nauman Tahir; +Cc: kernelnewbies, linux-kernel

Nauman Tahir <nauman.tahir@gmail.com> writes:

> I have posted this problem before. Now mailing again after testing as
> recommeded in previous replys.
> My configuration is:

Most likely it's related to you misusing the PCI DMA API in some way.
Review Documentation/DMA-mapping.txt closely.

If that doesn't turn on the light try oprofile.

-Andi
* Re: X86_64 and X86_32 bit performance difference [Revisited]
  2006-01-09 18:27 ` Andi Kleen
@ 2006-01-10 10:50 ` Nauman Tahir
  0 siblings, 0 replies; 7+ messages in thread
From: Nauman Tahir @ 2006-01-10 10:50 UTC
  To: Andi Kleen; +Cc: kernelnewbies, linux-kernel

On 09 Jan 2006 19:27:20 +0100, Andi Kleen <ak@suse.de> wrote:
> Nauman Tahir <nauman.tahir@gmail.com> writes:
>
> > I have posted this problem before. Now mailing again after testing as
> > recommended in previous replies.
> > My configuration is:
>
> Most likely it's related to you misusing the PCI DMA API in some way.
> Review Documentation/DMA-mapping.txt closely.
>
> If that doesn't turn on the light try oprofile.

What is oprofile???

> -Andi