public inbox for linux-kernel@vger.kernel.org
* bug (trouble?) report on high mem support
@ 2002-03-15 19:25 John Helms
  2002-03-15 20:05 ` Alan Cox
  0 siblings, 1 reply; 11+ messages in thread
From: John Helms @ 2002-03-15 19:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Trice, Jim

Hi,

My name is John Helms and I am trying to
convert our systems over to linux from 
HP-UX.  However, one of our critical
programs is giving problems because it
runs so slowly as to be useless when 
running under the 2.4.7-10 enterprise 
kernel.  This same program runs fine
under the 2.4.7-10 smp kernel.  The main
difference is that in a top output, most
of the cpu time is in system mode and very
little user mode under the enterprise kernel,
and just the opposite under the smp kernel.

Any help would be greatly appreciated.


Thanks,
John Helms 
Admin.
DuPont Photomasks, Inc.
512-310-6185





1.  Program runs slowly in kernel mode in high memory kernel

2.  A program we use runs almost entirely in kernel 
mode in a kernel compiled for large (>4GB) memory support.
Same program runs in user mode in a kernel only compiled
for smp support (4GB memory limit).  Top output shows only
~5% cpu for user, ~95% for system and program runs VERY slow.
SMP kernel has ~60% user, ~40% system and program runs
acceptably.

3.  kernel, memory

4.  Linux version 2.4.7-10enterprise 
(bhcompile@stripples.devel.redhat.com)

5.  No Oops 

6.  No example script available.

7. Environment:

rrux01 28: more /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 10
model name      : Pentium III (Cascades)
stepping        : 4
cpu MHz         : 899.324
cache size      : 2048 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1795.68

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 10
model name      : Pentium III (Cascades)
stepping        : 4
cpu MHz         : 899.324
cache size      : 2048 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1795.68
 
processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 10
model name      : Pentium III (Cascades)
stepping        : 4
cpu MHz         : 899.324
cache size      : 2048 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1795.68
 
processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 10
model name      : Pentium III (Cascades)
stepping        : 4
cpu MHz         : 899.324
cache size      : 2048 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1795.68        


rrux01 30: more /proc/modules
iptable_mangle          2272   0 (autoclean) (unused)
iptable_nat            19280   0 (autoclean) (unused)
ip_conntrack           18544   1 (autoclean) [iptable_nat]
iptable_filter          2272   0 (autoclean) (unused)
ip_tables              11936   5 [iptable_mangle iptable_nat iptable_filter]
sg                     29552   0 (autoclean)
reiserfs              161360   1 (autoclean)
nfs                    83680   3 (autoclean)
lockd                  53744   1 (autoclean) [nfs]
sunrpc                 70000   1 (autoclean) [nfs lockd]
ide-cd                 27136   0 (autoclean)
cdrom                  28800   0 (autoclean) [ide-cd]
soundcore               4848   0 (autoclean)
autofs                 12064   2 (autoclean)
e1000                  62944   1
pcnet32                12368   0 (unused)
st                     27024   0 (unused)
usb-ohci               19360   0 (unused)
usbcore                54560   1 [usb-ohci]
ext3                   67728   8
jbd                    44480   8 [ext3]
ips                    39552  10
aic7xxx               114704   0 (unused)
sd_mod                 11584  10
scsi_mod               98512   5 [sg st ips aic7xxx sd_mod]    



rrux01 31: more /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
01f0-01f7 : ide0
02f8-02ff : serial(auto)
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial(auto)
0700-070f : ServerWorks OSB4 IDE Controller
  0700-0707 : ide0
  0708-070f : ide1
0cf8-0cff : PCI conf1
2200-22ff : Adaptec AHA-294x / AIC-7884U
  2200-22fe : aic7xxx
2300-231f : Advanced Micro Devices [AMD] 79c970 [PCnet LANCE]
  2300-231f : PCnet/FAST III 79C975
4000-40ff : Adaptec 7899P
  4000-40fe : aic7xxx
4100-41ff : Adaptec 7899P (#2)
  4100-41fe : aic7xxx
4200-42ff : Adaptec 7892A
  4200-42fe : aic7xxx           



rrux01 29: more /proc/iomem
00000000-0009cfff : System RAM
0009d000-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000ca000-000ca7ff : Extension ROM
000ca800-000d27ff : Extension ROM
000f0000-000fffff : System ROM
00100000-dfff937f : System RAM
  00100000-0025e40f : Kernel code
  0025e410-00277d3f : Kernel data
dfff9380-dfffffff : ACPI Tables
ec2d0000-ec2dffff : PCI device 8086:1001 (Intel Corporation)
ec2e0000-ec2fffff : PCI device 8086:1001 (Intel Corporation)
  ec2e0000-ec2fffff : e1000
ed7fe000-ed7fffff : PCI device 1014:01bd (IBM)
  ed7fe000-ed7fffff : ips
efbfd000-efbfdfff : Adaptec 7892A
efbfe000-efbfefff : Adaptec 7899P (#2)
efbff000-efbfffff : Adaptec 7899P
f0000000-f7ffffff : S3 Inc. Savage 4
feb00000-feb7ffff : S3 Inc. Savage 4
febfd000-febfdfff : ServerWorks OSB4/CSB5 OHCI USB Controller
  febfd000-febfdfff : usb-ohci
febfec00-febfec1f : Advanced Micro Devices [AMD] 79c970 [PCnet LANCE]
febff000-febfffff : Adaptec AHA-294x / AIC-7884U
fec00000-fec00fff : reserved
fee00000-fee00fff : reserved
fff80000-ffffffff : reserved           





* Re: bug (trouble?) report on high mem support
  2002-03-15 19:25 bug (trouble?) report on high mem support John Helms
@ 2002-03-15 20:05 ` Alan Cox
  2002-03-15 20:07   ` John Helms
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Cox @ 2002-03-15 20:05 UTC (permalink / raw)
  To: John Helms; +Cc: linux-kernel, Trice Jim

> running under the 2.4.7-10 enterprise 
> kernel.  This same program runs fine
> under the 2.4.7-10 smp kernel.  The main

Firstly, cue the standard comment about 2.4.9 errata kernels and upgrading.

> difference is that in a top output, most
> of the cpu time is in system mode and very
> little user mode under the enterprise kernel,
> and just the opposite under the smp kernel.

When you are using large amounts of RAM things in the PC world get a bit
messy. Is this a box with a lot of memory ?


* Re: bug (trouble?) report on high mem support
  2002-03-15 20:05 ` Alan Cox
@ 2002-03-15 20:07   ` John Helms
  2002-03-15 20:30     ` Alan Cox
  0 siblings, 1 reply; 11+ messages in thread
From: John Helms @ 2002-03-15 20:07 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel, Trice Jim, Martin.Bligh

Alan,


Here is a top output.  We have 16Gb of ram.
I have also tried a 2.4.9-31 enterprise 
kernel rpm from RedHat with the same 
results.



  1:57pm  up 20 min,  2 users,  load average: 1.01, 0.88, 0.47
71 processes: 69 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
CPU1 states:  0.1% user,  1.1% system,  0.0% nice, 98.2% idle
CPU2 states:  0.4% user, 99.1% system,  0.0% nice,  0.0% idle
CPU3 states:  0.0% user,  0.1% system,  0.0% nice, 99.4% idle
Mem:  15904836K av,  788196K used, 15116640K free,     400K shrd,   14848K buff
Swap: 16096164K av,       0K used, 16096164K free                  574956K cached
 
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 1410 helmsjw   15   0  105M 105M  1072 R    99.9  0.6   7:37 WRITEFILE
 1411 root      11   0   992  992   780 R     1.7  0.0   0:09 top
    1 root       8   0   520  520   452 S     0.0  0.0   0:04 init
    2 root       8   0     0    0     0 SW    0.0  0.0   0:00 keventd
    3 root      19  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU0
    4 root      19  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU1
    5 root      19  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU2




Below is the readprofile output per a recommendation
of Martin Bligh, who I have cc'd:


[root@rrux02 sbin]# ./readprofile
     4 _stext                                     0.0625
464584 cpu_idle                                 4148.0714
     1 machine_real_restart                       0.0052
     2 __switch_to                                0.0078
    13 restore_all                                0.5652
     1 v86_signal_return                          0.0625
     1 error_code                                 0.0156
     1 device_not_available_emulate               0.0625
     1 show_trace                                 0.0057
     6 disable_irq                                0.0469
     1 sys_modify_ldt                             0.0104
     6 do_cyrix_devid                             0.0417
     2 do_read                                    0.0042
     3 do_poll                                    0.0312
    39 do_ioctl                                   0.0938
     2 do_release                                 0.0083
     1 __wake_up_sync                             0.0039
    67 sys_sched_rr_get_interval                  0.2792
     8 show_task                                  0.0200
     1 add_wait_queue                             0.0208
    21 copy_files                                 0.0298
    24 do_fork                                    0.0133
     4 printk                                     0.0119
     4 register_console                           0.0100
     1 inter_module_unregister                    0.0057
     2 sys_init_module                            0.0013
     2 sys_delete_module                          0.0033
     1 sys_query_module                           0.0026
     2 sys_setitimer                              0.0089
     1 sys_sysinfo                                0.0033
     1 find_resource                              0.0052
     1 do_sysctl                                  0.0063
     3 do_proc_dointvec                           0.0036
    15 sysctl_string                              0.0493
     2 sysctl_jiffies                             0.0083
    16 check_free_space                           0.0357
     2 sys_acct                                   0.0043
     1 do_sigpending                              0.0069
     1 sys_rt_sigtimedwait                        0.0013
     1 sys_setgid                                 0.0057
     1 in_group_p                                 0.0208
     1 sys_newuname                               0.0078
     2 __pmd_alloc                                0.0625
     2 pte_alloc                                  0.0063
     6 unlock_vma_mappings                        0.1250
     8 sys_brk                                    0.0357
 54729 do_mmap_pgoff                             51.8267
     4 get_unmapped_area                          0.0132
     1 exit_mmap                                  0.0035
     1 __insert_vm_struct                         0.0026
     3 insert_vm_struct                           0.0375
    41 merge_anon_vmas                            0.1971
     4 attempt_merge_next                         0.0312
     1 add_page_to_hash_queue                     0.0208
    43 truncate_inode_pages                       0.2240
     2 __find_page_simple                         0.0250
    11 writeout_one_page                          0.1375
    11 waitfor_one_page                           0.1375
    12 do_buffer_fdatasync                        0.0577
    23 generic_buffer_fdatasync                   0.1307
     1 filemap_fdatasync                          0.0035
     2 filemap_fdatawait                          0.0125
     2 add_to_page_cache_locked                   0.0104
     7 add_to_page_cache                          0.0337
     2 add_to_page_cache_unique                   0.0083
     1 read_cluster_nonblocking                   0.0030
     3 ___wait_on_page                            0.0156
     9 __lock_page                                0.0469
     1 lock_page                                  0.0208
     1 __find_get_page                            0.0057
     1 __find_get_swapcache_page                  0.0045
     9 __find_lock_page                           0.0268
     1 drop_behind                                0.0045
    33 generic_file_readahead                     0.0469
    32 do_generic_file_read                       0.0215
     1 file_read_actor                            0.0045
     7 file_send_actor                            0.0273
    11 sys_sendfile                               0.0215
     2 nopage_sequential_readahead                0.0066
     7 madvise_willneed                           0.0109
     1 madvise_vma                                0.0104
     1 sys_madvise                                0.0039
    34 mincore_page                               0.1635
    12 change_protection                          0.0259
     1 mprotect_fixup                             0.0012
     4 mlock_fixup                                0.0049
     2 do_mlock                                   0.0089
     5 move_page_tables                           0.0312
     1 refill_inactive_scan                       0.0033
     1 do_try_to_free_pages                       0.0104
    14 kswapd                                     0.0515
     5 wakeup_kswapd                              0.1042
     3 try_to_free_pages                          0.0469
     6 kreclaimd                                  0.0341
    18 rw_swap_page_base                          0.0450
     1 nr_free_highpages                          0.0208
     3 zone_inactive_shortage                     0.0469
     2 show_free_areas_core                       0.0069
    90 badness                                    0.4688
     2 select_bad_process                         0.0208
    30 shmem_recalc_inode                         0.3125
    15 shmem_swp_entry                            0.0938
    64 shmem_truncate                             0.1176
     8 shmem_unuse_inode                          0.0312
     2 shmem_writepage                            0.0074
    28 shmem_getpage_locked                       0.0292
     1 shmem_getpage                              0.0045
     2 shmem_file_write                           0.0022
     1 shmem_file_setup                           0.0033
     2 alloc_bounce_page                          0.0096
  1240 vfs_statfs                                 9.6875
     1 file_move                                  0.0156
     5 file_moveto                                0.0781
     1 fs_may_remount_ro                          0.0089
     1 end_buffer_io_sync                         0.0125
     4 write_unlocked_buffers                     0.0139
     1 wait_for_locked_buffers                    0.0057
    21 sync_buffers                               0.3281
     1 fsync_super                                0.0057
     1 fsync_dev                                  0.0089
     3 __block_write_full_page                    0.0054
     6 brw_kiovec                                 0.0083
    11 brw_page                                   0.0573
     1 detach_mnt                                 0.0125
     2 attach_mnt                                 0.0139
     3 move_vfsmnt                                0.0187
     2 __mntput                                   0.0208
     2 mangle                                     0.0125
     2 get_filesystem_info                        0.0019
     1 sys_ustat                                  0.0045
     1 get_unnamed_dev                            0.0208
     1 do_umount                                  0.0027
     1 mount_is_safe                              0.0208
     2 block_llseek                               0.0125
     1 get_write_access                           0.0156
     1 deny_write_access                          0.0125
     2 lookup_hash                                0.0104
     3 open_namei                                 0.0019
     1 sys_unlink                                 0.0039
     1 vfs_follow_link                            0.0026
     3 page_follow_link                           0.0065
     3 sys_dup2                                   0.0134
     5 do_fcntl                                   0.0073
     1 send_sigio_to_task                         0.0052
     1 sys_ioctl                                  0.0020
     2 vfs_readdir                                0.0096
     1 fillonedir                                 0.0057
     1 filldir64                                  0.0031
     1 do_select                                  0.0017
     1 fcntl_setlease                             0.0016
     1 dput                                       0.0028
     1 d_invalidate                               0.0069
     1 dget_locked                                0.0208
     1 d_prune_aliases                            0.0063
     1 prune_dcache                               0.0026
     1 have_submounts                             0.0078
     2 d_delete                                   0.0114
     1 __inode_dir_notify                         0.0057
     1 read_blk                                   0.0104
     4 find_free_dqentry                          0.0078
     4 do_insert_tree                             0.0086
     3 read_dquot                                 0.0085
     1 set_dqblk                                  0.0018
     1 set_info                                   0.0037
     2 dquot_initialize                           0.0043
     1 load_elf_interp                            0.0014
     1 remove_proc_entry                          0.0037
     1 minix_partition                            0.0037
   182 ext2_free_blocks                           0.1835
     2 ext2_new_block                             0.0007
     1 ext2_dotdot                                0.0208
     2 random_read                                0.0060
     4 vt_ioctl                                   0.0005
     2 vcs_write                                  0.0017
     1 receive_chars                              0.0023
     3 set_fdc                                    0.0170
     1 _lock_fdc                                  0.0030
     1 show_floppy                                0.0017
     2 start_motor                                0.0078
     1 wait_til_done                              0.0031
     2 format_interrupt                           0.0312
     1 setup_format_params                        0.0022
     5 rw_interrupt                               0.0063
     1 ide_spin_wait_hwgroup                      0.0078
     1 set_pio_mode                               0.0104
     3 ide_add_generic_settings                   0.0059
     1 ahc_linux_alloc_device                     0.0089
     2 ahc_linux_filter_command                   0.0027
     2 ahc_linux_queue_recovery_cmd               0.0007
     1 ahc_aha394XX_setup                         0.0125
     1 revalidate_scsidisk                        0.0024
     6 pci_find_capability                        0.0312
     1 pci_find_parent_resource                   0.0078
    12 pci_set_power_state                        0.0417
     3 pci_save_state                             0.0375
     1 pci_restore_state                          0.0078
     3 pci_compare_state                          0.0375
     1 pci_generic_resume_restore                 0.0312
     1 pci_generic_resume_compare                 0.0312
     2 pci_enable_device                          0.0417
     1 pci_disable_device                         0.0156
     4 pci_scan_slot                              0.0192
     1 pci_pm_suspend_device                      0.0208
     1 pci_pm_resume_device                       0.0312
     1 pci_pm_save_state_bus                      0.0104
     1 pci_pm_suspend_bus                         0.0104
    15 isapnp_alternative_switch                  0.0154
     1 isapnp_valid_port                          0.0031
     1 isapnp_check_interrupt                     0.0039
     2 isapnp_valid_irq                           0.0040
     1 isapnp_print_configuration                 0.0015
     1 isapnp_set_port                            0.0035
    12 fbcon_clear                                0.0300
     4 fbcon_putc                                 0.0192
     5 fbcon_cfb24_putcs                          0.0054
     3 write_disk_sb                              0.0067
     3 sync_sbs                                   0.0208
     1 set_disk_info                              0.0312
     9 get_geo                                    0.0703
     1 md_ioctl                                   0.0004
     3 md_thread                                  0.0069
     1 md_wakeup_thread                           0.0312
     5 md_register_thread                         0.0312
     2 md_error                                   0.0125
     1 status_unused                              0.0063
     4 status_resync                              0.0068
    10 md_status_read_proc                        0.0145
     2 sys_sendmsg                                0.0042
     1 dst_alloc                                  0.0069
     1 ip_queue_xmit                              0.0008
     1 ip_build_xmit_slow                         0.0008
     2 ip_build_xmit                              0.0022
     2 ip_fragment                                0.0023
     2 ip_icmp_error                              0.0060
     1 tcp_fastretrans_alert                      0.0008
     1 __tcp_select_window                        0.0042
   230 rtmsg_ifa                                  1.4375
    31 inet_forward_change                        0.2422
   669 devinet_sysctl_forward                     6.9688
     3 devinet_sysctl_register                    0.0104
     4 devinet_sysctl_unregister                  0.0833
     4 inet_sock_destruct                         0.0104
     3 inet_sock_release                          0.0170
     4 inet_create                                0.0066
    10 inet_bind                                  0.0152
     1 inet_wait_for_connect                      0.0022
     2 inet_stream_connect                        0.0032
   374 fib_netdev_event                           2.9219
523311 total                                      0.4193       
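
One way to read the profile above is to set the idle ticks aside and ask how the remaining samples are distributed. The sketch below does that arithmetic for three numbers taken from this output (total, cpu_idle, do_mmap_pgoff). The function names are illustrative and not part of readprofile, and the result is only as good as the symbol attribution in the profile itself.

```c
#include <assert.h>
#include <stdio.h>

/* Fraction of non-idle profile ticks attributed to one symbol.
 * Returns 0.0 when there are no non-idle ticks at all. */
static double non_idle_share(unsigned long symbol_ticks,
                             unsigned long idle_ticks,
                             unsigned long total_ticks)
{
    unsigned long busy;

    assert(idle_ticks <= total_ticks);
    busy = total_ticks - idle_ticks;
    if (busy == 0)
        return 0.0;
    return (double)symbol_ticks / (double)busy;
}

/* Print the share for the numbers quoted in the readprofile output:
 * 523311 total ticks, 464584 in cpu_idle, 54729 in do_mmap_pgoff. */
static void print_mmap_share(void)
{
    printf("do_mmap_pgoff: %.1f%% of non-idle ticks\n",
           100.0 * non_idle_share(54729UL, 464584UL, 523311UL));
}
```

With these inputs, do_mmap_pgoff accounts for roughly 93% of the ticks not spent in cpu_idle, so nearly all of the busy time in this profile is attributed to that one symbol.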
              

>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<

On 3/15/02, 2:05:39 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote regarding 
Re: bug (trouble?) report on high mem support:


> > running under the 2.4.7-10 enterprise
> > kernel.  This same program runs fine
> > under the 2.4.7-10 smp kernel.  The main

> Firstly, cue the standard comment about 2.4.9 errata kernels and upgrading.

> > difference is that in a top output, most
> > of the cpu time is in system mode and very
> > little user mode under the enterprise kernel,
> > and just the opposite under the smp kernel.

> When you are using large amounts of RAM things in the PC world get a bit
> messy. Is this a box with a lot of memory ?


* Re: bug (trouble?) report on high mem support
  2002-03-15 20:07   ` John Helms
@ 2002-03-15 20:30     ` Alan Cox
  2002-03-15 20:32       ` John Helms
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Cox @ 2002-03-15 20:30 UTC (permalink / raw)
  To: John Helms; +Cc: Alan Cox, linux-kernel, Trice Jim, Martin.Bligh

> Here is a top output.  We have 16Gb of ram.
> I have also tried a 2.4.9-31 enterprise
> kernel rpm from RedHat with the same
> results.

Ok that would make sense. Next question is do you have an I/O controller
that can use all the 64bit address space on the PCI bus ?

What is happening is that you are using a lot of CPU copying buffers down
into lower memory to transfer to/from disk - as well probably as that
causing a lot of competition for low memory. If your I/O controller can hit
the full 64bit space there are some rather nice test patches that should
completely obliterate the problem.

Alan
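
The copying described above can be pictured with a small userspace sketch. It assumes a simplified model in which the controller can reach physical addresses up to some limit and any buffer extending beyond that limit has to be copied ("bounced") through low memory first. `needs_bounce` and its `dma_limit` parameter are hypothetical names for illustration, not kernel interfaces, and the real cutoff depends on the kernel version and the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if any byte of the buffer
 * [phys, phys + len) lies above dma_limit, the highest physical
 * address the controller is assumed to be able to reach.
 * Written so that phys + len cannot overflow. */
static bool needs_bounce(uint64_t phys, size_t len, uint64_t dma_limit)
{
    assert(len > 0);
    if (phys > dma_limit)
        return true;
    return (uint64_t)(len - 1) > dma_limit - phys;
}
```

For example, with a 32-bit limit of 0xFFFFFFFF, a 4096-byte page at physical address 0x100000000 would need a bounce copy, while the same page at 0x1000 would not. Every bounced page costs a CPU copy plus a low-memory allocation, which is consistent with the high system time reported in this thread.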


* Re: bug (trouble?) report on high mem support
  2002-03-15 20:30     ` Alan Cox
@ 2002-03-15 20:32       ` John Helms
  2002-03-15 23:37         ` Mike Anderson
  2002-03-15 23:38         ` Randy.Dunlap
  0 siblings, 2 replies; 11+ messages in thread
From: John Helms @ 2002-03-15 20:32 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel, Trice Jim, Martin.Bligh

Alan,

Ok, how do I go about determining that?  The machine
I have is a brand-spankin' new IBM x-series 350 with
4 900MHz Xeon processors.  The system bios can 
recognize all of the 16320MB of memory at startup.
If those patches work, it will save our butts as
we have a major conversion project that hinges on
this.  

Thanks,
jwh

>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<

On 3/15/02, 2:30:22 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote regarding 
Re: bug (trouble?) report on high mem support:


> > Here is a top output.  We have 16Gb of ram.
> > I have also tried a 2.4.9-31 enterprise
> > kernel rpm from RedHat with the same
> > results.

> Ok that would make sense. Next question is do you have an I/O controller
> that can use all the 64bit address space on the PCI bus ?

> What is happening is that you are using a lot of CPU copying buffers down
> into lower memory to transfer to/from disk - as well probably as that
> causing a lot of competition for low memory. If your I/O controller can hit
> the full 64bit space there are some rather nice test patches that should
> completely obliterate the problem.

> Alan


* Re: bug (trouble?) report on high mem support
  2002-03-15 20:32       ` John Helms
@ 2002-03-15 23:37         ` Mike Anderson
  2002-03-15 23:38         ` Randy.Dunlap
  1 sibling, 0 replies; 11+ messages in thread
From: Mike Anderson @ 2002-03-15 23:37 UTC (permalink / raw)
  To: John Helms; +Cc: Alan Cox, linux-kernel, Trice Jim, Martin.Bligh

John,
	What kind of io controllers are on the system? 
	
	To use CONFIG_HIGHIO you need an I/O controller that is physically
	capable of addressing higher memory and an adapter driver that has
	been converted to support the CONFIG_HIGHIO interface.

-Mike
John Helms [john.helms@photomask.com] wrote:
> Alan,
> 
> Ok, how do I go about determining that?  The machine
> I have is a brand-spankin' new IBM x-series 350 with
> 4 900MHz Xeon processors.  The system bios can 
> recognize all of the 16320MB of memory at startup.
> If those patches work, it will save our butts as
> we have a major conversion project that hinges on
> this.  
> 
> Thanks,
> jwh
> 
> >>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<
> 
> On 3/15/02, 2:30:22 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote regarding 
> Re: bug (trouble?) report on high mem support:
> 
> 
> > > Here is a top output.  We have 16Gb of ram.
> > > I have also tried a 2.4.9-31 enterprise
> > > kernel rpm from RedHat with the same
> > > results.
> 
> > Ok that would make sense. Next question is do you have an I/O controller
> > that can use all the 64bit address space on the PCI bus ?
> 
> > What is happening is that you are using a lot of CPU copying buffers down
> > into lower memory to transfer to/from disk - as well probably as that
> > causing a lot of competition for low memory. If your I/O controller can hit
> > the full 64bit space there are some rather nice test patches that should
> > completely obliterate the problem.
> 
> > Alan

-- 
Michael Anderson
andmike@us.ibm.com



* Re: bug (trouble?) report on high mem support
  2002-03-15 20:32       ` John Helms
  2002-03-15 23:37         ` Mike Anderson
@ 2002-03-15 23:38         ` Randy.Dunlap
  2002-03-16  0:02           ` Martin J. Bligh
  1 sibling, 1 reply; 11+ messages in thread
From: Randy.Dunlap @ 2002-03-15 23:38 UTC (permalink / raw)
  To: John Helms; +Cc: Alan Cox, linux-kernel, Trice Jim, Martin.Bligh

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2327 bytes --]

Hi-

If someone (Martin or Alan ?) hasn't already told you,
there is a block-highmem patch for 2.4.teens, so if you
can upgrade your kernel to 2.4.19-pre3, for example,
the block-highmem patch is at
  http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.19pre3aa2/
file: 00_block-highmem-all-18b-7.gz

Also, as suggested a day or two ago, you could profile the
kernel to see where it is spending time, although I'm not
sure how useful that would be.

A third alternative for you is to apply the attached patch.
I applied it to 2.4.9 (it applies with a little "fuzz"),
but I haven't tested it on 2.4.9, just 2.4.teens.

It counts bounce IOs, both normal IOs and swap IOs.
They can be displayed by printing /proc/stat.
This patch doesn't work with the block-highmem
patch applied -- I'm working on a different patch for that.

This patch also prints (by major:minor) which device(s) are
causing bounce IO.  This printing could become excessive
for you, so don't hesitate to disable it (comment it out, or
let me know if you need help with it).

Regards,
~Randy


On Fri, 15 Mar 2002, John Helms wrote:

| Alan,
|
| Ok, how do I go about determining that?  The machine
| I have is a brand-spankin' new IBM x-series 350 with
| 4 900MHz Xeon processors.  The system bios can
| recognize all of the 16320MB of memory at startup.
| If those patches work, it will save our butts as
| we have a major conversion project that hinges on
| this.
|
| Thanks,
| jwh
|
| >>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<
|
| On 3/15/02, 2:30:22 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote regarding
| Re: bug (trouble?) report on high mem support:
|
|
| > > Here is a top output.  We have 16Gb of ram.
| > > I have also tried a 2.4.9-31 enterprise
| > > kernel rpm from RedHat with the same
| > > results.
|
| > Ok that would make sense. Next question is do you have an I/O controller
| > that can use all the 64bit address space on the PCI bus ?
|
| > What is happening is that you are using a lot of CPU copying buffers down
| > into lower memory to transfer to/from disk - as well probably as that
| > causing a lot of competition for low memory. If your I/O controller can hit
| > the full 64bit space there are some rather nice test patches that should
| > completely obliterate the problem.
|
| > Alan

[-- Attachment #2: Type: TEXT/PLAIN, Size: 2629 bytes --]

--- linux/include/linux/kernel_stat.h.org	Mon Nov 26 10:19:29 2001
+++ linux/include/linux/kernel_stat.h	Thu Dec 20 13:26:50 2001
@@ -26,12 +26,14 @@
 	unsigned int dk_drive_wblk[DK_MAX_MAJOR][DK_MAX_DISK];
 	unsigned int pgpgin, pgpgout;
 	unsigned int pswpin, pswpout;
+	unsigned int bouncein, bounceout;
+	unsigned int bounceswapin, bounceswapout;
 #if !defined(CONFIG_ARCH_S390)
 	unsigned int irqs[NR_CPUS][NR_IRQS];
 #endif
-	unsigned int ipackets, opackets;
-	unsigned int ierrors, oerrors;
-	unsigned int collisions;
+///	unsigned int ipackets, opackets;
+///	unsigned int ierrors, oerrors;
+///	unsigned int collisions;
 	unsigned int context_swtch;
 };
 
--- linux/fs/proc/proc_misc.c.org	Tue Nov 20 21:29:09 2001
+++ linux/fs/proc/proc_misc.c	Thu Dec 20 13:34:44 2001
@@ -310,6 +310,12 @@
 		xtime.tv_sec - jif / HZ,
 		total_forks);
 
+	len += sprintf(page + len,
+		"bounce io %u %u\n"
+		"bounce swap io %u %u\n",
+		kstat.bouncein, kstat.bounceout,
+		kstat.bounceswapin, kstat.bounceswapout);
+
 	return proc_calc_metrics(page, start, off, count, eof, len);
 }
 
--- linux/mm/page_io.c.org	Mon Nov 19 15:19:42 2001
+++ linux/mm/page_io.c	Thu Dec 20 15:59:41 2001
@@ -10,6 +10,7 @@
  *  Always use brw_page, life becomes simpler. 12 May 1998 Eric Biederman
  */
 
+#include <linux/config.h>
 #include <linux/mm.h>
 #include <linux/kernel_stat.h>
 #include <linux/swap.h>
@@ -68,6 +69,13 @@
 		dev = swapf->i_dev;
 	} else {
 		return 0;
+	}
+
+	if (PageHighMem(page)) {
+		if (rw == WRITE)
+			kstat.bounceswapout++;
+		else
+			kstat.bounceswapin++;
 	}
 
  	/* block_size == PAGE_SIZE/zones_used */
--- linux/drivers/block/ll_rw_blk.c.org	Mon Oct 29 12:11:17 2001
+++ linux/drivers/block/ll_rw_blk.c	Thu Dec 20 17:45:19 2001
@@ -936,6 +936,7 @@
 	} while (q->make_request_fn(q, rw, bh));
 }
 
+static int bmsg_count = 0;
 
 /**
  * submit_bh: submit a buffer_head to the block device later for I/O
@@ -953,6 +954,7 @@
 void submit_bh(int rw, struct buffer_head * bh)
 {
 	int count = bh->b_size >> 9;
+	int bounce = PageHighMem(bh->b_page);
 
 	if (!test_bit(BH_Lock, &bh->b_state))
 		BUG();
@@ -971,10 +973,19 @@
 	switch (rw) {
 		case WRITE:
 			kstat.pgpgout += count;
+			if (bounce) kstat.bounceout += count;
 			break;
 		default:
 			kstat.pgpgin += count;
+			if (bounce) kstat.bouncein += count;
 			break;
+	}
+	if (bounce) {
+		bmsg_count++;
+		if ((bmsg_count % 1000) == 1)
+			printk ("bounce io (%c) for %d:%d\n",
+				(rw == WRITE) ? 'W' : 'R',
+				MAJOR(bh->b_rdev), MINOR(bh->b_rdev));
 	}
 }
 


* Re: bug (trouble?) report on high mem support
  2002-03-15 23:38         ` Randy.Dunlap
@ 2002-03-16  0:02           ` Martin J. Bligh
  2002-03-16  4:34             ` John Helms
  0 siblings, 1 reply; 11+ messages in thread
From: Martin J. Bligh @ 2002-03-16  0:02 UTC (permalink / raw)
  To: Randy.Dunlap, John Helms; +Cc: Alan Cox, linux-kernel, Trice Jim

From how I read his original description:

> 2.  A program we use runs almost entirely in kernel 
> mode in a kernel compiled for large (>4GB) memory support.
> Same program runs in user mode in a kernel only compiled
> for smp support (4GB memory limit).  Top output shows only
> ~5% cpu for user, ~95% for system and program runs VERY slow.
> SMP kernel has ~60% user, ~40% system and program runs
> acceptably.

I assumed the problem occurred when he switched from 4Gb support
to 64Gb support ... am I just misreading this? So he should already
be bouncing everything with 4Gb (which seems to work) around 
unless he has the high io stuff.

The only thing that looked weird in his profile was this:

54729 do_mmap_pgoff                             51.8267

John, can you try "echo 2 > /proc/profile" just before you run your
test, and then run readprofile immediately after your test stops? That'll zero
the profile just before you start, and should make the output a little
more "focused", and confirm that this function is what's eating the
sys time.
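
To pull the hottest function out of a long readprofile listing, something like the following C sketch could help. It assumes each line has the three columns seen in the line quoted above (ticks, symbol, normalized load) and that a summary line named "total" should be skipped; the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find the symbol with the most ticks in readprofile-style text.
 * Each line is assumed to be "<ticks> <symbol> <load>".
 * Copies the symbol into name and returns its tick count,
 * or returns 0 with an empty name if no line parses.
 */
unsigned long top_symbol(const char *text, char name[64])
{
	unsigned long best = 0;
	const char *line = text;

	name[0] = '\0';
	while (line && *line) {
		unsigned long ticks;
		char sym[64];
		double load;

		/* Skip the assumed "total" summary line. */
		if (sscanf(line, "%lu %63s %lf", &ticks, sym, &load) == 3 &&
		    strcmp(sym, "total") != 0 && ticks > best) {
			best = ticks;
			strcpy(name, sym);
		}
		line = strchr(line, '\n');
		if (line)
			line++;	/* advance to the start of the next line */
	}
	return best;
}
```

If do_mmap_pgoff still comes out on top after the profile has been zeroed, that would support the idea that it is where the system time goes.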

M.




* Re: bug (trouble?) report on high mem support
  2002-03-16  0:02           ` Martin J. Bligh
@ 2002-03-16  4:34             ` John Helms
  2002-03-16  5:44               ` Martin J. Bligh
  2002-03-18 21:45               ` Randy.Dunlap
  0 siblings, 2 replies; 11+ messages in thread
From: John Helms @ 2002-03-16  4:34 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Randy.Dunlap, Alan Cox, linux-kernel, Trice Jim, Andmike

Martin/Randy/Alan/Mike,

The readprofile output I sent earlier is pretty
accurate.  I performed the test right after a reboot
to the enterprise (64GB mem) kernel with a profile=2
boot option.  I then ran our program, which reads in
a 3.1GB file from an NFS mount, and outputs a 2.4GB file
in another format to the same NFS mount.  Networking
is achieved through an IBM Gigabit fiber card with an
Intel e1000 chipset, for which we had to download the
latest driver source just to get it to work.  But network
throughput looks great.  Other programs using the 
NFS mounts work fine, so I'm pretty sure it's not
a network issue.

The smp kernel (no 64GB mem support) completed the
file conversion in 3.5 hours.  Previous attempts 
with the enterprise kernel (64GB mem support) had
to be aborted after 3 days, by which point they had only
just started to write the converted file to disk.  This application
does not run multi-threaded, but we will have 
multiple users running the program on separate
file conversions simultaneously.  Hence the need
for lots of memory.

I guess the main question at this point is whether
our hardware supports high memory, and then which 
patches or kernel upgrades can correct our problem.
If we upgrade the entire kernel, which release 
would you recommend for a stable production machine
with >4GB memory?  If there are swap improvements,
we also need whatever we can get in that area.


I don't know if this helps, but here is some info
from the /proc filesystem:


rrux01 23: more ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
01f0-01f7 : ide0
02f8-02ff : serial(auto)
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial(auto)
0700-070f : ServerWorks OSB4 IDE Controller
  0700-0707 : ide0
  0708-070f : ide1
0cf8-0cff : PCI conf1
2200-22ff : Adaptec AHA-294x / AIC-7884U
  2200-22fe : aic7xxx
2300-231f : Advanced Micro Devices [AMD] 79c970 [PCnet LANCE]
  2300-231f : PCnet/FAST III 79C975
4000-40ff : Adaptec 7899P
  4000-40fe : aic7xxx
4100-41ff : Adaptec 7899P (#2)
  4100-41fe : aic7xxx
4200-42ff : Adaptec 7892A
  4200-42fe : aic7xxx
rrux01 24: more iomem
00000000-0009cfff : System RAM
0009d000-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000ca000-000ca7ff : Extension ROM
000ca800-000d27ff : Extension ROM
000f0000-000fffff : System ROM
00100000-dfff937f : System RAM
  00100000-0025e40f : Kernel code
  0025e410-00277d3f : Kernel data
dfff9380-dfffffff : ACPI Tables
ec2d0000-ec2dffff : PCI device 8086:1001 (Intel Corporation)
ec2e0000-ec2fffff : PCI device 8086:1001 (Intel Corporation)
  ec2e0000-ec2fffff : e1000
ed7fe000-ed7fffff : PCI device 1014:01bd (IBM)
  ed7fe000-ed7fffff : ips
efbfd000-efbfdfff : Adaptec 7892A
efbfe000-efbfefff : Adaptec 7899P (#2)
efbff000-efbfffff : Adaptec 7899P
f0000000-f7ffffff : S3 Inc. Savage 4
feb00000-feb7ffff : S3 Inc. Savage 4
febfd000-febfdfff : ServerWorks OSB4/CSB5 OHCI USB Controller
  febfd000-febfdfff : usb-ohci
febfec00-febfec1f : Advanced Micro Devices [AMD] 79c970 [PCnet LANCE]
febff000-febfffff : Adaptec AHA-294x / AIC-7884U
fec00000-fec00fff : reserved
fee00000-fee00fff : reserved
fff80000-ffffffff : reserved
rrux01 25: ls -ld modules
-r--r--r--    1 root     root            0 Mar 15 20:52 modules
rrux01 26: more modules
iptable_mangle          2272   0 (autoclean) (unused)
iptable_nat            19280   0 (autoclean) (unused)
ip_conntrack           18544   1 (autoclean) [iptable_nat]
iptable_filter          2272   0 (autoclean) (unused)
ip_tables              11936   5 [iptable_mangle iptable_nat iptable_filter]
sg                     29552   0 (autoclean)
reiserfs              161360   1 (autoclean)
nfs                    83680   3 (autoclean)
lockd                  53744   1 (autoclean) [nfs]
sunrpc                 70000   1 (autoclean) [nfs lockd]
ide-cd                 27136   0 (autoclean)
cdrom                  28800   0 (autoclean) [ide-cd]
soundcore               4848   0 (autoclean)
autofs                 12064   2 (autoclean)
e1000                  62944   1
pcnet32                12368   0 (unused)
st                     27024   0 (unused)
usb-ohci               19360   0 (unused)
usbcore                54560   1 [usb-ohci]
ext3                   67728   8
jbd                    44480   8 [ext3]
ips                    39552  10
aic7xxx               114704   0 (unused)
sd_mod                 11584  10
scsi_mod               98512   5 [sg st ips aic7xxx sd_mod]   




* Re: bug (trouble?) report on high mem support
  2002-03-16  4:34             ` John Helms
@ 2002-03-16  5:44               ` Martin J. Bligh
  2002-03-18 21:45               ` Randy.Dunlap
  1 sibling, 0 replies; 11+ messages in thread
From: Martin J. Bligh @ 2002-03-16  5:44 UTC (permalink / raw)
  To: John Helms; +Cc: Randy.Dunlap, Alan Cox, linux-kernel, Trice Jim, Andmike

> The readprofile output I sent earlier is pretty
> accurate.  I performed the test right after a reboot
> to the enterprise (64GB mem) kernel with a profile=2
> boot option.  I then ran our program, which reads in
> a 3.1GB file from an NFS mount, and outputs a 2.4GB file
> in another format to the same NFS mount.  Networking
> is achieved through an IBM Gigabit fiber card with an
> Intel e1000 chipset, for which we had to download the
> latest driver source just to get it to work.  But network
> throughput looks great.  Other programs using the 
> NFS mounts work fine, so I'm pretty sure it's not
> a network issue.
> 
> The smp kernel (no 64GB mem support) completed the
> file conversion in 3.5 hours.  Previous attempts 
> with the enterprise kernel (64GB mem support) had
> to be aborted after 3 days, by which point they had only
> just started to write the converted file to disk.  This application
> does not run multi-threaded, but we will have 
> multiple users running the program on separate
> file conversions simultaneously.  Hence the need
> for lots of memory.
> 
> I guess the main question at this point is whether
> our hardware supports high memory, and then which 
> patches or kernel upgrades can correct our problem.
> If we upgrade the entire kernel, which release 
> would you recommend for a stable production machine
> with >4GB memory?  If there are swap improvements,
> we also need whatever we can get in that area.

You mention "64Gb support" or "no 64Gb support" throughout 
this - have you tried a kernel with 4Gb support? That'd
give you the HIGHMEM bounce buffering still. One step at a
time ;-)
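
To make the 4Gb versus 64Gb point concrete, here is a simplified model in C of when a block I/O page gets a bounce copy. It assumes the default i386 3G/1G split (roughly 896 MB of directly mapped low memory), and the block_highmem flag merely stands in for "the block-highmem patch is applied"; all names here are hypothetical and the real kernel logic is more involved.

```c
#include <stdint.h>

/* Assumption: default i386 3G/1G split, about 896 MB directly mapped. */
#define LOWMEM_LIMIT (896ULL << 20)

/* Nonzero if the physical address lies in highmem under that split. */
int is_highmem(uint64_t phys)
{
	return phys >= LOWMEM_LIMIT;
}

/*
 * Simplified model of when block I/O needs a bounce copy.
 * block_highmem == 0: every highmem page is copied to low memory first.
 * block_highmem != 0: only pages the device cannot address
 * (beyond dma_mask) are copied.
 */
int needs_bounce(uint64_t phys, uint64_t dma_mask, int block_highmem)
{
	if (!block_highmem)
		return is_highmem(phys);
	return phys > dma_mask;
}
```

Under this model a 4Gb kernel still bounces any page between 896 MB and 4 GB, so comparing it with the 64Gb kernel would separate the cost of bounce buffering from the cost of the larger memory configuration.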

M.


* Re: bug (trouble?) report on high mem support
  2002-03-16  4:34             ` John Helms
  2002-03-16  5:44               ` Martin J. Bligh
@ 2002-03-18 21:45               ` Randy.Dunlap
  1 sibling, 0 replies; 11+ messages in thread
From: Randy.Dunlap @ 2002-03-18 21:45 UTC (permalink / raw)
  To: John Helms; +Cc: Martin J. Bligh, Alan Cox, linux-kernel, Trice Jim, Andmike

Hi John-

Have you made any progress on this?

There are lots of patches out there that you could try,
given the time.

And is the application (source code) available to look at?
I'm not interested in how it massages the data, but I am
interested in how it reads, writes, calls mmap(), i.e.,
most of its system calls, so if the program without the
data manipulation part of it were available, that should
be sufficient.

Thanks,
~Randy



-- 
~Randy



end of thread, other threads:[~2002-03-18 21:48 UTC | newest]

Thread overview: 11+ messages
2002-03-15 19:25 bug (trouble?) report on high mem support John Helms
2002-03-15 20:05 ` Alan Cox
2002-03-15 20:07   ` John Helms
2002-03-15 20:30     ` Alan Cox
2002-03-15 20:32       ` John Helms
2002-03-15 23:37         ` Mike Anderson
2002-03-15 23:38         ` Randy.Dunlap
2002-03-16  0:02           ` Martin J. Bligh
2002-03-16  4:34             ` John Helms
2002-03-16  5:44               ` Martin J. Bligh
2002-03-18 21:45               ` Randy.Dunlap
