2.6.23-rc8 network problem. Mem leak? ip1000a?

netdev.vger.kernel.org archive mirror

* 2.6.23-rc8 network problem.  Mem leak?  ip1000a?
@ 2007-09-28  2:06 linux
  2007-09-28  9:20 ` Andrew Morton
  0 siblings, 1 reply; 20+ messages in thread
From: linux @ 2007-09-28  2:06 UTC
  To: linux-kernel, netdev; +Cc: linux

Uniprocessor Athlon 64, 64-bit kernel, 2G ECC RAM,
2.6.23-rc8 + linuxpps (5.0.0) + ip1000a driver.
(patch from http://marc.info/?l=linux-netdev&m=118980588419882)

After a few hours of operation, ntp loses the ability to send packets.
sendto() returns -EAGAIN to everything, including the 24-byte UDP packet
that is a response to ntpq.
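
For context, a minimal sketch (in Python rather than ntpd's C, purely as an illustration) of how an EAGAIN failure from a non-blocking sendto() surfaces to a program.  The helper names try_send and make_udp_socket are hypothetical.  On Linux this error on a datagram socket generally means the kernel would have had to block, commonly because the socket's send buffer is accounted as full.

```python
import errno
import socket


def make_udp_socket():
    """Create a non-blocking UDP socket, as a daemon like ntpd might use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)
    return sock


def try_send(sock, payload, addr):
    """Attempt one send; report EAGAIN as a string instead of raising."""
    try:
        sock.sendto(payload, addr)
        return "sent"
    except BlockingIOError as exc:
        # On Linux EAGAIN and EWOULDBLOCK are the same value; checking both
        # keeps the sketch portable.
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return "EAGAIN"
        raise
```

A caller would pass the socket from make_udp_socket() together with the reply payload and the peer address; a persistent "EAGAIN" result for even a tiny packet matches the symptom described here.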

-EAGAIN on a sendto() makes me think of memory problems, so here's
meminfo at the time:

### FAILED state ###
# cat /proc/meminfo 
MemTotal:      2059384 kB
MemFree:         15332 kB
Buffers:        665608 kB
Cached:          18212 kB
SwapCached:          0 kB
Active:         380384 kB
Inactive:       355020 kB
SwapTotal:     5855208 kB
SwapFree:      5854552 kB
Dirty:           28504 kB
Writeback:           0 kB
AnonPages:       51608 kB
Mapped:          11852 kB
Slab:          1285348 kB
SReclaimable:   152968 kB
SUnreclaim:    1132380 kB
PageTables:       3888 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   6884900 kB
Committed_AS:   590528 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    265628 kB
VmallocChunk: 34359472059 kB
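
A minimal sketch, assuming the dump is available as text, of how the unreclaimable-slab share can be read off a meminfo listing like the one above.  The function names are hypothetical; the sample values are copied from this FAILED-state dump.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of kB values."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip shell prompts, markers, and blank lines
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def unreclaimable_slab_share(fields):
    """Fraction of MemTotal held by unreclaimable slab."""
    return fields["SUnreclaim"] / fields["MemTotal"]


# Values copied from the FAILED-state dump above.
SAMPLE = """\
MemTotal:      2059384 kB
MemFree:         15332 kB
Slab:          1285348 kB
SReclaimable:   152968 kB
SUnreclaim:    1132380 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(f"SUnreclaim share: {unreclaimable_slab_share(info):.1%}")
```

For this dump the share comes to roughly 55% of MemTotal, which is why the slab caches are the first place to look.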


Killing and restarting ntpd gets it running again for a few hours.
Here's after about two hours of successful operation.  (I'll try to
remember to run slabinfo before killing ntpd next time.)

### WORKING state ###
# cat /proc/meminfo
MemTotal:      2059384 kB
MemFree:         20252 kB
Buffers:        242688 kB
Cached:          41556 kB
SwapCached:        200 kB
Active:         285012 kB
Inactive:       147348 kB
SwapTotal:     5855208 kB
SwapFree:      5854212 kB
Dirty:              36 kB
Writeback:           0 kB
AnonPages:      148052 kB
Mapped:          12756 kB
Slab:          1582512 kB
SReclaimable:   134348 kB
SUnreclaim:    1448164 kB
PageTables:       4500 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   6884900 kB
Committed_AS:   689956 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    265628 kB
VmallocChunk: 34359472059 kB
# /usr/src/linux/Documentation/vm/slabinfo
Name                   Objects Objsize    Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
:0000016                  1478      16    24.5K          6/3/1  256 0  50  96 *
:0000024                   170      24     4.0K          1/0/1  170 0   0  99 *
:0000032                  1339      32    45.0K         11/2/1  128 0  18  95 *
:0000040                   102      40     4.0K          1/0/1  102 0   0  99 *
:0000064                  5937      64   413.6K       101/15/1   64 0  14  91 *
:0000072                    56      72     4.0K          1/0/1   56 0   0  98 *
:0000088                  6946      88   618.4K        151/0/1   46 0   0  98 *
:0000096                 23851      96     2.5M      616/144/1   42 0  23  90 *
:0000128                   730     128   114.6K         28/6/1   32 0  21  81 *
:0000136                   232     136    36.8K          9/6/1   30 0  66  85 *
:0000192                   474     192    98.3K         24/4/1   21 0  16  92 *
:0000256               1385376     256   354.6M      86587/0/1   16 0   0  99 *
:0000320                    12     304     4.0K          1/0/1   12 0   0  89 *A
:0000384                   359     384   180.2K        44/23/1   10 0  52  76 *A
:0000512               1384316     512   708.7M     173040/1/1    8 0   0  99 *
:0000640                    72     616    53.2K         13/5/1    6 0  38  83 *A
:0000704                  1870     696     1.3M        170/0/1   11 1   0  93 *A
:0001024                   427    1024   454.6K        111/9/1    4 0   8  96 *
:0001472                   150    1472   245.7K         30/0/1    5 1   0  89 *
:0002048                158991    2048   325.7M     39759/25/1    4 1   0  99 *
:0004096                    51    4096   245.7K         30/9/1    2 1  30  85 *
Acpi-State                  51      80     4.0K          1/0/1   51 0   0  99 
anon_vma                  1032      16    28.6K          7/5/1  170 0  71  57 
bdev_cache                  43     720    36.8K          9/1/1    5 0  11  83 Aa
blkdev_requests             42     288    12.2K          3/0/1   14 0   0  98 
buffer_head              59173     104    11.1M    2734/1690/1   39 0  61  54 a
cfq_io_context             223     152    40.9K         10/6/1   26 0  60  82 
dentry                   98641     192    19.7M     4813/274/1   21 0   5  96 a
ext3_inode_cache        115690     688    86.3M     10545/77/1   11 1   0  92 a
file_lock_cache             23     168     4.0K          1/0/1   23 0   0  94 
idr_layer_cache            118     528    69.6K         17/1/1    7 0   5  89 
inode_cache               1365     528   798.7K        195/0/1    7 0   0  90 a
kmalloc-131072               1  131072   131.0K          1/0/1    1 5   0 100 
kmalloc-16384                8   16384   131.0K          8/0/1    1 2   0 100 
kmalloc-32768                1   32768    32.7K          1/0/1    1 3   0 100 
kmalloc-8                 1535       8    12.2K          3/1/1  512 0  33  99 
kmalloc-8192                10    8192    81.9K         10/0/1    1 1   0 100 
mm_struct                   54     800    57.3K          7/5/1    9 1  71  75 A
proc_inode_cache            12     560    16.3K          4/3/1    7 0  75  41 a
radix_tree_node          17076     552    13.5M    3319/1675/1    7 0  50  69 
raid5-md5                  258    1176   352.2K         43/0/1    6 1   0  86 
shmem_inode_cache           22     712    20.4K          5/1/1    5 0  20  76 
sighand_cache               88    2072   253.9K         31/3/1    3 1   9  71 A
signal_cache                88     720    77.8K         19/6/1    5 0  31  81 A
sigqueue                    25     160     4.0K          1/0/1   25 0   0  97 
skbuff_fclone_cache     158787     404    72.2M      17644/2/1    9 0   0  88 A
sock_inode_cache            65     600    53.2K         13/5/1    6 0  38  73 Aa
task_struct                145    1808   311.2K         38/6/1    4 1  15  84 
uhci_urb_priv               73      56     4.0K          1/0/1   73 0   0  99 
vm_area_struct            2947     168   540.6K       132/25/1   24 0  18  91 
### WORKING state ###


After quite a few hours (about 15h40m), it failed again.  I'm leaving it
in the failed state for now so I can answer questions.  ntpd is running and
receiving packets, it just can't send any.

### FAILED state ###
# cat /proc/meminfo
MemTotal:      2059384 kB
MemFree:         98556 kB
Buffers:          3224 kB
Cached:          18888 kB
SwapCached:       3876 kB
Active:          16688 kB
Inactive:        13068 kB
SwapTotal:     5855208 kB
SwapFree:      5822068 kB
Dirty:              32 kB
Writeback:           0 kB
AnonPages:        6960 kB
Mapped:           4752 kB
Slab:          1907828 kB
SReclaimable:     1916 kB
SUnreclaim:    1905912 kB
PageTables:       3888 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   6884900 kB
Committed_AS:   575652 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    265628 kB
VmallocChunk: 34359472059 kB
# /usr/src/linux/Documentation/vm/slabinfo
Name                   Objects Objsize    Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
:0000016                  1533      16    32.7K          8/5/1  256 0  62  74 *
:0000024                   170      24     4.0K          1/0/1  170 0   0  99 *
:0000032                  1339      32    45.0K         11/2/1  128 0  18  95 *
:0000040                   102      40     4.0K          1/0/1  102 0   0  99 *
:0000064                  3295      64   278.5K        68/24/1   64 0  35  75 *
:0000072                    56      72     4.0K          1/0/1   56 0   0  98 *
:0000088                  6946      88   618.4K        151/0/1   46 0   0  98 *
:0000096                  2110      96   233.4K         57/9/1   42 0  15  86 *
:0000128                   726     128    98.3K         24/2/1   32 0   8  94 *
:0000136                   255     136    40.9K         10/6/1   30 0  60  84 *
:0000192                   457     192    98.3K         24/4/1   21 0  16  89 *
:0000256               1893104     256   484.6M     118319/0/1   16 0   0 100 *
:0000320                    12     304     4.0K          1/0/1   12 0   0  89 *A
:0000384                   423     384   188.4K        46/14/1   10 0  30  86 *A
:0000512               1892180     512   968.8M     236524/4/1    8 0   0  99 *
:0000640                    68     616    49.1K         12/4/1    6 0  33  85 *A
:0000704                  2043     696     1.5M        186/1/1   11 1   0  93 *A
:0001024                   435    1024   462.8K       113/11/1    4 0   9  96 *
:0001472                   240    1472   393.2K         48/0/1    5 1   0  89 *
:0002048                196191    2048   401.8M     49052/14/1    4 1   0  99 *
:0004096                    51    4096   237.5K         29/7/1    2 1  24  87 *
Acpi-State                  51      80     4.0K          1/0/1   51 0   0  99 
anon_vma                   796      16    24.5K          6/4/1  170 0  66  51 
bdev_cache                  43     720    36.8K          9/1/1    5 0  11  83 Aa
blkdev_requests             46     288    16.3K          4/1/1   14 0  25  80 
buffer_head                888     104    94.2K         23/2/1   39 0   8  98 a
cfq_io_context             249     152    45.0K         11/6/1   26 0  54  84 
dentry                    4242     192   831.4K        203/0/1   21 0   0  97 a
ext3_inode_cache          1341     688     1.0M       129/11/1   11 1   8  87 a
file_lock_cache             23     168     4.0K          1/0/1   23 0   0  94 
idr_layer_cache            118     528    69.6K         17/1/1    7 0   5  89 
inode_cache                959     528   565.2K        138/0/1    7 0   0  89 a
kmalloc-131072               1  131072   131.0K          1/0/1    1 5   0 100 
kmalloc-16384                8   16384   131.0K          8/0/1    1 2   0 100 
kmalloc-32768                1   32768    32.7K          1/0/1    1 3   0 100 
kmalloc-8                 1535       8    12.2K          3/1/1  512 0  33  99 
kmalloc-8192                10    8192    81.9K         10/0/1    1 1   0 100 
mm_struct                   54     800    65.5K          8/6/1    9 1  75  65 A
proc_inode_cache            42     560    24.5K          6/0/1    7 0   0  95 a
radix_tree_node            985     552   811.0K       198/73/1    7 0  36  67 
raid5-md5                  258    1176   352.2K         43/0/1    6 1   0  86 
shmem_inode_cache           24     712    20.4K          5/1/1    5 0  20  83 
sighand_cache               86    2072   237.5K         29/1/1    3 1   3  75 A
signal_cache                85     720    77.8K         19/8/1    5 0  42  78 A
sigqueue                    25     160     4.0K          1/0/1   25 0   0  97 
skbuff_fclone_cache     196031     404    89.2M      21782/5/1    9 0   0  88 A
sock_inode_cache            62     600    53.2K         13/6/1    6 0  46  69 Aa
task_struct                140    1808   311.2K         38/9/1    4 1  23  81 
uhci_urb_priv               73      56     4.0K          1/0/1   73 0   0  99 
vm_area_struct            2666     168   466.9K        114/7/1   24 0   6  95 
### FAILED state ###
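
To see which caches grew between the WORKING and FAILED listings, the two slabinfo outputs can be diffed.  The following is a minimal sketch under the assumption that the first three columns are name, object count, and object size, as in the listings above; the function names are hypothetical, and the sample rows are copied from the two dumps.

```python
def parse_slabinfo(text):
    """Map cache name -> (objects, objsize) from slabinfo-style output."""
    counts = {}
    for line in text.splitlines():
        cols = line.split()
        # The header row and markers fail the isdigit() checks and are skipped.
        if len(cols) >= 3 and cols[1].isdigit() and cols[2].isdigit():
            counts[cols[0]] = (int(cols[1]), int(cols[2]))
    return counts


def growth(before, after):
    """Return (estimated_bytes, name, extra_objects) for caches that grew."""
    rows = []
    for name, (objs_after, size) in after.items():
        objs_before = before.get(name, (0, size))[0]
        delta = objs_after - objs_before
        if delta > 0:
            rows.append((delta * size, name, delta))
    return sorted(rows, reverse=True)


WORKING = """\
:0000256               1385376     256   354.6M      86587/0/1   16 0   0  99 *
:0000512               1384316     512   708.7M     173040/1/1    8 0   0  99 *
:0002048                158991    2048   325.7M     39759/25/1    4 1   0  99 *
dentry                   98641     192    19.7M     4813/274/1   21 0   5  96 a
skbuff_fclone_cache     158787     404    72.2M      17644/2/1    9 0   0  88 A
"""

FAILED = """\
:0000256               1893104     256   484.6M     118319/0/1   16 0   0 100 *
:0000512               1892180     512   968.8M     236524/4/1    8 0   0  99 *
:0002048                196191    2048   401.8M     49052/14/1    4 1   0  99 *
dentry                    4242     192   831.4K        203/0/1   21 0   0  97 a
skbuff_fclone_cache     196031     404    89.2M      21782/5/1    9 0   0  88 A
"""

if __name__ == "__main__":
    for est, name, delta in growth(parse_slabinfo(WORKING), parse_slabinfo(FAILED)):
        print(f"{name:22} +{delta:>8} objects  ~{est / 2**20:.1f} MiB")
```

On these two snapshots the 256- and 512-byte caches grow by almost the same object count (about 508,000 each), as do the 2048-byte cache and skbuff_fclone_cache (about 37,000 each).  That would be consistent with paired allocations, though the listings alone do not prove it.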


I'm not sure quite where to point the blame.  This hardware used to
run ntpd just fine with a 2.6.22 kernel, but I upgraded the base kernel,
the ip1000a driver, and the linuxpps patches at the same time.
With this time to failure, bisection is a challenge.

I'm not quite sure how it could be the ip1000a driver's fault, in a way
that breaks ntp but leaves ssh and other network services running.

And the linuxpps patches are very localized and only allocate
at initialization time.  It's hard to see how they could cause
this effect.

There is a newly recompiled (new linuxpps API) ntpd, but it's hard to
see how it could cause the given symptoms, and the exact same source
code is running on a 32-bit machine (2.6.23-rc6 + linuxpps 5.0) just fine.

But it's also hard to imagine that I've found a new generic networking bug
that nobody else has noticed.

Can anyone offer some diagnosis advice?


This is actually the second (and third) time it's happened.  The first
time, I ran strace and saw the same -EAGAIN, but assumed I'd misconfigured
ntpd and bounced it.  It started working, so I left it, then noticed
that it had stopped again and looked more closely.

FWIW, a "stuck" ntpd still responds to UDP queries from a localhost ntpd.

Here's the .config:
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=15
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_KMOD=y
CONFIG_BLOCK=y
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_CFQ=y
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_X86_PC=y
CONFIG_MK8=y
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_INTERNODE_CACHE_BYTES=64
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_MTRR=y
CONFIG_PREEMPT_NONE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_RESOURCES_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_HPET_TIMER=y
CONFIG_IOMMU=y
CONFIG_SWIOTLB=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_AMD=y
CONFIG_PHYSICAL_START=0x200000
CONFIG_SECCOMP=y
CONFIG_HZ_250=y
CONFIG_HZ=250
CONFIG_K8_NB=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_ISA_DMA_API=y
CONFIG_PM=y
CONFIG_SUSPEND_UP_POSSIBLE=y
CONFIG_HIBERNATION_UP_POSSIBLE=y
CONFIG_ACPI=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_FAN=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_BLACKLIST_YEAR=0
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_PM_TIMER=y
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_X86_POWERNOW_K8=y
CONFIG_X86_POWERNOW_K8_ACPI=y
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
CONFIG_HT_IRQ=y
CONFIG_BINFMT_ELF=y
CONFIG_IA32_EMULATION=y
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_XFRM=y
CONFIG_XFRM_USER=y
CONFIG_NET_KEY=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_ASK_IP_FIB_HASH=y
CONFIG_IP_FIB_HASH=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_MROUTE=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_SYN_COOKIES=y
CONFIG_INET_AH=y
CONFIG_INET_ESP=y
CONFIG_INET_IPCOMP=y
CONFIG_INET_XFRM_TUNNEL=y
CONFIG_INET_TUNNEL=y
CONFIG_INET_XFRM_MODE_TRANSPORT=y
CONFIG_INET_XFRM_MODE_TUNNEL=y
CONFIG_INET_XFRM_MODE_BEET=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_NETFILTER=y
CONFIG_NETFILTER_NETLINK=y
CONFIG_NETFILTER_NETLINK_LOG=y
CONFIG_NETFILTER_XTABLES=y
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y
CONFIG_NETFILTER_XT_TARGET_MARK=y
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y
CONFIG_NETFILTER_XT_TARGET_NFLOG=y
CONFIG_NETFILTER_XT_MATCH_LENGTH=y
CONFIG_NETFILTER_XT_MATCH_LIMIT=y
CONFIG_NETFILTER_XT_MATCH_MARK=y
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y
CONFIG_NETFILTER_XT_MATCH_QUOTA=y
CONFIG_NETFILTER_XT_MATCH_STATISTIC=y
CONFIG_NETFILTER_XT_MATCH_TCPMSS=y
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y
CONFIG_IP_NF_QUEUE=m
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_MATCH_TOS=y
CONFIG_IP_NF_MATCH_ECN=y
CONFIG_IP_NF_MATCH_OWNER=y
CONFIG_IP_NF_MATCH_ADDRTYPE=y
CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_TARGET_REJECT=y
CONFIG_IP_NF_TARGET_ULOG=y
CONFIG_IP_NF_MANGLE=y
CONFIG_IP_NF_TARGET_TOS=y
CONFIG_IP_NF_TARGET_ECN=y
CONFIG_IP_NF_TARGET_TTL=y
CONFIG_VLAN_8021Q=y
CONFIG_NET_SCHED=y
CONFIG_NET_SCH_FIFO=y
CONFIG_NET_SCH_CBQ=y
CONFIG_NET_SCH_HTB=y
CONFIG_NET_SCH_HFSC=y
CONFIG_NET_SCH_PRIO=y
CONFIG_NET_SCH_RR=y
CONFIG_NET_SCH_RED=y
CONFIG_NET_SCH_SFQ=y
CONFIG_NET_SCH_TEQL=y
CONFIG_NET_SCH_TBF=y
CONFIG_NET_SCH_GRED=y
CONFIG_NET_SCH_DSMARK=y
CONFIG_NET_SCH_NETEM=y
CONFIG_NET_SCH_INGRESS=y
CONFIG_NET_CLS=y
CONFIG_NET_CLS_TCINDEX=y
CONFIG_NET_CLS_ROUTE4=y
CONFIG_NET_CLS_ROUTE=y
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=y
CONFIG_NET_ACT_MIRRED=y
CONFIG_NET_ACT_IPT=y
CONFIG_NET_ACT_PEDIT=y
CONFIG_NET_ACT_SIMP=y
CONFIG_NET_CLS_POLICE=y
CONFIG_FIB_RULES=y
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_PARPORT=y
CONFIG_PARPORT_PC=y
CONFIG_PARPORT_PC_FIFO=y
CONFIG_PARPORT_PC_SUPERIO=y
CONFIG_PARPORT_1284=y
CONFIG_PNP=y
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_FD=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_CRYPTOLOOP=y
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=65536
CONFIG_BLK_DEV_RAM_BLOCKSIZE=4096
CONFIG_CDROM_PKTCDVD=y
CONFIG_CDROM_PKTCDVD_BUFFERS=8
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDECD=y
CONFIG_BLK_DEV_IDEACPI=y
CONFIG_IDE_GENERIC=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_IDEPCI_PCIBUS_ORDER=y
CONFIG_BLK_DEV_GENERIC=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_BLK_DEV_AMD74XX=m
CONFIG_BLK_DEV_VIA82CXXX=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_BLK_DEV_SD=y
CONFIG_SCSI_WAIT_SCAN=m
CONFIG_ATA=y
CONFIG_ATA_ACPI=y
CONFIG_SATA_NV=y
CONFIG_SATA_SIL24=y
CONFIG_SATA_VIA=m
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_RAID0=y
CONFIG_MD_RAID1=y
CONFIG_MD_RAID10=y
CONFIG_MD_RAID456=y
CONFIG_FIREWIRE=m
CONFIG_FIREWIRE_OHCI=m
CONFIG_FIREWIRE_SBP2=m
CONFIG_NETDEVICES=y
CONFIG_DUMMY=y
CONFIG_TUN=y
CONFIG_IP1000=y
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
CONFIG_NET_TULIP=y
CONFIG_DE2104X=m
CONFIG_TULIP=m
CONFIG_DE4X5=m
CONFIG_WINBOND_840=m
CONFIG_DM9102=m
CONFIG_NET_PCI=y
CONFIG_FORCEDETH=y
CONFIG_NETDEV_1000=y
CONFIG_SKGE=y
CONFIG_INPUT=y
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_EVDEV=y
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
CONFIG_INPUT_MISC=y
CONFIG_INPUT_PCSPKR=y
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_LIBPS2=y
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_8250_DETECT_IRQ=y
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_PRINTER=y
CONFIG_RTC=y
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
CONFIG_DRM=y
CONFIG_DRM_RADEON=y
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
CONFIG_HANGCHECK_TIMER=y
CONFIG_DEVPORT=y
CONFIG_I2C=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_CHARDEV=y
CONFIG_I2C_ALGOBIT=m
CONFIG_I2C_NFORCE2=m
CONFIG_I2C_VIAPRO=y
CONFIG_SENSORS_EEPROM=y
CONFIG_PPS=y
CONFIG_PPS_CLIENT_UART=y
CONFIG_HWMON=y
CONFIG_HWMON_VID=m
CONFIG_SENSORS_ABITUGURU=y
CONFIG_SENSORS_K8TEMP=y
CONFIG_SENSORS_IT87=m
CONFIG_SENSORS_W83627HF=m
CONFIG_VGA_CONSOLE=y
CONFIG_VIDEO_SELECT=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_SOUND=m
CONFIG_SND=m
CONFIG_SND_TIMER=m
CONFIG_SND_PCM=m
CONFIG_SND_RAWMIDI=m
CONFIG_SND_VERBOSE_PROCFS=y
CONFIG_SND_MPU401_UART=m
CONFIG_SND_AC97_CODEC=m
CONFIG_SND_VIA82XX=m
CONFIG_AC97_BUS=m
CONFIG_HID_SUPPORT=y
CONFIG_HID=y
CONFIG_USB_HID=y
CONFIG_USB_HIDDEV=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=y
CONFIG_USB_DEVICEFS=y
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_SPLIT_ISO=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_OHCI_HCD=m
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=y
CONFIG_USB_PRINTER=m
CONFIG_USB_STORAGE=m
CONFIG_USB_SERIAL=m
CONFIG_USB_SERIAL_GENERIC=y
CONFIG_USB_SERIAL_BELKIN=m
CONFIG_USB_SERIAL_WHITEHEAT=m
CONFIG_USB_SERIAL_DIGI_ACCELEPORT=m
CONFIG_USB_SERIAL_CP2101=m
CONFIG_USB_SERIAL_CYPRESS_M8=m
CONFIG_USB_SERIAL_FTDI_SIO=m
CONFIG_USB_SERIAL_MCT_U232=m
CONFIG_USB_SERIAL_MOS7720=m
CONFIG_USB_SERIAL_MOS7840=m
CONFIG_USB_SERIAL_PL2303=m
CONFIG_USB_SERIAL_TI=m
CONFIG_USB_SERIAL_XIRCOM=m
CONFIG_USB_EZUSB=y
CONFIG_EDAC=y
CONFIG_EDAC_MM_EDAC=y
CONFIG_RTC_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
CONFIG_RTC_DRV_CMOS=y
CONFIG_DMIID=y
CONFIG_EXT2_FS=m
CONFIG_EXT3_FS=y
CONFIG_JBD=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_DNOTIFY=y
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=y
CONFIG_UDF_NLS=y
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_RAMFS=y
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_TCP=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="cp437"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_15=y
CONFIG_NLS_UTF8=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_SOFTLOCKUP=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_XOR_BLOCKS=y
CONFIG_ASYNC_CORE=y
CONFIG_ASYNC_MEMCPY=y
CONFIG_ASYNC_XOR=y
CONFIG_CRYPTO=y
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_BLKCIPHER=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_NULL=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=m
CONFIG_CRYPTO_ECB=m
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_DES=y
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m
CONFIG_CRYPTO_TWOFISH_X86_64=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_X86_64=y
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_DEFLATE=y
CONFIG_BITREVERSE=y
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_PLIST=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y


* Re: 2.6.23-rc8 network problem.  Mem leak?  ip1000a?
  2007-09-28  2:06 2.6.23-rc8 network problem. Mem leak? ip1000a? linux
@ 2007-09-28  9:20 ` Andrew Morton
  2007-09-30  7:59   ` linux
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2007-09-28  9:20 UTC
  To: linux; +Cc: linux-kernel, netdev

On 27 Sep 2007 22:06:17 -0400 linux@horizon.com wrote:

> Uniprocessor Athlon 64, 64-bit kernel, 2G ECC RAM,
> 2.6.23-rc8 + linuxpps (5.0.0) + ip1000a driver.
> (patch from http://marc.info/?l=linux-netdev&m=118980588419882)
> 
> After a few hours of operation, ntp loses the ability to send packets.
> sendto() returns -EAGAIN to everything, including the 24-byte UDP packet
> that is a response to ntpq.
> 
> ...
>
> Killing and restarting ntpd gets it running again for a few hours.
> Here's after about two hours of successful operation.  (I'll try to
> remember to run slabinfo before killing ntpd next time.)

ntpd.  Sounds like pps leaking to me.

> 
> Can anyone offer some diagnosis advice?
> 

CONFIG_DEBUG_SLAB_LEAK?


* Re: 2.6.23-rc8 network problem.  Mem leak?  ip1000a?
  2007-09-28  9:20 ` Andrew Morton
@ 2007-09-30  7:59   ` linux
  2007-09-30  9:23     ` Andrew Morton
  0 siblings, 1 reply; 20+ messages in thread
From: linux @ 2007-09-30  7:59 UTC
  To: akpm, linux; +Cc: linux-kernel, netdev

> ntpd.  Sounds like pps leaking to me.

That's what I'd think, except that pps does no allocation in the normal
running state, so there's nothing to leak.  The interrupt path just
records the time in some preallocated, static buffers and wakes up
blocked readers.  The read path copies the latest data out of those
static buffers.  There's allocation when the PPS device is created,
and more when it's opened.

>> Can anyone offer some diagnosis advice?

> CONFIG_DEBUG_SLAB_LEAK?

Ah, thank you; I've been using SLUB, which doesn't support this option.
Here's what I've extracted.  I've only presented the top few
slab_allocators and a small subset of the oom-killer messages, but I
have full copies if desired.  Unfortunately, I've discovered that the
machine doesn't live in this unhappy state forever.  Indeed, I'm not
sure if killing ntpd "fixes" anything; my previous observations
may have been optimistic ignorance.

(For my own personal reference looking for more oom-kill, I nuked ntpd
at 06:46:56.  And the oom-kills are continuing, with the latest at
07:43:52.)

Anyway, I have a bunch of information from the slab_allocators file, but
I'm not quite sure how to make sense of it.


With a machine in the unhappy state and firing the OOM killer, the top
20 slab_allocators are:
$ sort -rnk2 /proc/slab_allocators | head -20
skbuff_head_cache: 1712746 __alloc_skb+0x31/0x121
size-512: 1706572 tcp_send_ack+0x23/0x102
skbuff_fclone_cache: 149113 __alloc_skb+0x31/0x121
size-2048: 148500 tcp_sendmsg+0x1b5/0xae1
sysfs_dir_cache: 5289 sysfs_new_dirent+0x4b/0xec
size-512: 2613 sock_alloc_send_skb+0x93/0x1dd
Acpi-Operand: 2014 acpi_ut_allocate_object_desc_dbg+0x34/0x6e
size-32: 1995 sysfs_new_dirent+0x29/0xec
vm_area_struct: 1679 mmap_region+0x18f/0x421
size-512: 1618 tcp_xmit_probe_skb+0x1f/0xcd
size-512: 1571 arp_create+0x4e/0x1cd
vm_area_struct: 1544 copy_process+0x9f1/0x1108
anon_vma: 1448 anon_vma_prepare+0x29/0x74
filp: 1201 get_empty_filp+0x44/0xcd
UDP: 1173 sk_alloc+0x25/0xaf
size-128: 1048 r1bio_pool_alloc+0x23/0x3b
size-128: 1024 nfsd_cache_init+0x2d/0xcf
Acpi-Namespace: 973 acpi_ns_create_node+0x2c/0x45
vm_area_struct: 717 split_vma+0x33/0xe5
dentry: 594 d_alloc+0x24/0x177

I'm not sure quite what "normal" numbers are, but I do wonder why there
are 1.7 million TCP acks buffered in the system.  Shouldn't they be
transmitted and deallocated pretty quickly?
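
One way to make sense of the slab_allocators listing is to total the live objects per calling function rather than per cache.  This is a minimal sketch, assuming each line has the form "cache: count symbol+offset/size" as in the output above; the function names are hypothetical, and the sample lines are the first six entries of that listing.

```python
from collections import Counter


def parse_slab_allocators(text):
    """Return (cache, count, caller) tuples from slab_allocators-style text."""
    entries = []
    for line in text.splitlines():
        cache, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or len(parts) != 2 or not parts[0].isdigit():
            continue  # not a "cache: count caller" line
        caller = parts[1].split("+", 1)[0]  # drop the +offset/size suffix
        entries.append((cache.strip(), int(parts[0]), caller))
    return entries


def objects_per_caller(entries):
    """Sum live object counts by allocating function."""
    totals = Counter()
    for _cache, count, caller in entries:
        totals[caller] += count
    return totals


SAMPLE = """\
skbuff_head_cache: 1712746 __alloc_skb+0x31/0x121
size-512: 1706572 tcp_send_ack+0x23/0x102
skbuff_fclone_cache: 149113 __alloc_skb+0x31/0x121
size-2048: 148500 tcp_sendmsg+0x1b5/0xae1
sysfs_dir_cache: 5289 sysfs_new_dirent+0x4b/0xec
size-512: 2613 sock_alloc_send_skb+0x93/0x1dd
"""

if __name__ == "__main__":
    totals = objects_per_caller(parse_slab_allocators(SAMPLE))
    for caller, count in totals.most_common():
        print(f"{count:>9}  {caller}")
```

Grouped this way, the skb heads from __alloc_skb and the 512-byte buffers from tcp_send_ack stand out at about 1.7 to 1.9 million objects each, far above every other caller in the sample.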

This machine receives more data than it sends, so I'd expect acks to
outnumber "real" packets.  Could the ip1000a driver's transmit path be
leaking skbs somehow?  That would also explain the "flailing" of the
oom-killer; it can't associate the allocations with a process.

Here's /proc/meminfo:
MemTotal:      1035756 kB
MemFree:         43508 kB
Buffers:         72920 kB
Cached:         224056 kB
SwapCached:     344916 kB
Active:         664976 kB
Inactive:       267656 kB
SwapTotal:     4950368 kB
SwapFree:      3729384 kB
Dirty:            6460 kB
Writeback:           0 kB
AnonPages:      491708 kB
Mapped:          79232 kB
Slab:            41324 kB
SReclaimable:    25008 kB
SUnreclaim:      16316 kB
PageTables:       8132 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   5468244 kB
Committed_AS:  1946008 kB
VmallocTotal:   253900 kB
VmallocUsed:      2672 kB
VmallocChunk:   251228 kB

I have saved a lot of oom-killer messages that I am not posting for
size reasons, but here are some example backtraces.
They're not very helpful to me; do they enlighten anyone else?
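
The header line of each oom-killer report in this thread has the form "<task> invoked oom-killer: gfp_mask=..., order=N".  An order-N request asks for 2^N physically contiguous pages.  The sketch below pulls out those fields and converts the order to bytes.  It assumes 4 KiB pages (the usual x86-64 page size), the function names are hypothetical, and the gfp_mask bits are left undecoded because their meaning depends on the kernel version.

```python
import re

PAGE_SIZE = 4096  # bytes; assumed x86-64 base page size

_OOM_RE = re.compile(
    r"(\S+) invoked oom-killer: gfp_mask=(0x[0-9a-fA-F]+), order=(\d+)"
)


def parse_oom_line(line):
    """Return (task, gfp_mask, order) from an oom-killer header, else None."""
    match = _OOM_RE.search(line)
    if match is None:
        return None
    task, mask, order = match.groups()
    return task, int(mask, 16), int(order)


def order_to_bytes(order, page_size=PAGE_SIZE):
    """Size of an order-N allocation: 2**N contiguous pages."""
    return (1 << order) * page_size


if __name__ == "__main__":
    line = ("02:55:45: ssh invoked oom-killer: "
            "gfp_mask=0x4d0, order=2, oomkilladj=0")
    task, mask, order = parse_oom_line(line)
    print(task, hex(mask), order_to_bytes(order), "bytes")
```

With 4 KiB pages, order=1 is an 8 KiB contiguous request and order=2 is a 16 KiB one.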

02:50:20: apcupsd invoked oom-killer: gfp_mask=0xd0, order=1, oomkilladj=0
02:50:22: 
02:50:22: Call Trace:
02:50:22:  [<ffffffff80246053>] out_of_memory+0x71/0x1ba
02:50:22:  [<ffffffff8024755d>] __alloc_pages+0x255/0x2d7
02:50:22:  [<ffffffff8025cbd6>] cache_alloc_refill+0x2f4/0x60a
02:50:22:  [<ffffffff8040602c>] hiddev_ioctl+0x579/0x919
02:50:22:  [<ffffffff8025d0fc>] kmem_cache_alloc+0x57/0x95
02:50:22:  [<ffffffff8040602c>] hiddev_ioctl+0x579/0x919
02:50:22:  [<ffffffff80262511>] cp_new_stat+0xe5/0xfd
02:50:22:  [<ffffffff804058ff>] hiddev_read+0x199/0x1f6
02:50:22:  [<ffffffff80222fa0>] default_wake_function+0x0/0xe
02:50:22:  [<ffffffff80269bb5>] do_ioctl+0x45/0x50
02:50:22:  [<ffffffff80269db9>] vfs_ioctl+0x1f9/0x20b
02:50:22:  [<ffffffff80269e07>] sys_ioctl+0x3c/0x5d
02:50:22:  [<ffffffff8020b43e>] system_call+0x7e/0x83

02:52:18: postgres invoked oom-killer: gfp_mask=0xd0, order=1, oomkilladj=0
02:52:18: 
02:52:18: Call Trace:
02:52:18:  [<ffffffff80246053>] out_of_memory+0x71/0x1ba
02:52:18:  [<ffffffff8024755d>] __alloc_pages+0x255/0x2d7
02:52:18:  [<ffffffff8025be8a>] poison_obj+0x26/0x2f
02:52:18:  [<ffffffff8024761f>] __get_free_pages+0x40/0x79
02:52:18:  [<ffffffff80224d66>] copy_process+0xb0/0x1108
02:52:18:  [<ffffffff80233388>] alloc_pid+0x1f/0x27d
02:52:18:  [<ffffffff80225ed6>] do_fork+0xb1/0x1a7
02:52:18:  [<ffffffff802f0627>] copy_user_generic_string+0x17/0x40
02:52:18:  [<ffffffff8020b43e>] system_call+0x7e/0x83
02:52:18:  [<ffffffff8020b757>] ptregscall_common+0x67/0xb0

02:52:18: kthreadd invoked oom-killer: gfp_mask=0xd0, order=1, oomkilladj=0
02:52:18: 
02:52:18: Call Trace:
02:52:18:  [<ffffffff80246053>] out_of_memory+0x71/0x1ba
02:52:18:  [<ffffffff8024755d>] __alloc_pages+0x255/0x2d7
02:52:18:  [<ffffffff8024761f>] __get_free_pages+0x40/0x79
02:52:18:  [<ffffffff80224d66>] copy_process+0xb0/0x1108
02:52:18:  [<ffffffff80233388>] alloc_pid+0x1f/0x27d
02:52:18:  [<ffffffff80225ed6>] do_fork+0xb1/0x1a7
02:52:18:  [<ffffffff80222bb8>] update_curr+0xe6/0x10b
02:52:18:  [<ffffffff8022334d>] dequeue_entity+0x73/0x97
02:52:18:  [<ffffffff8020bd21>] kernel_thread+0x81/0xde
02:52:18:  [<ffffffff802e9c81>] cfq_may_queue+0x0/0xd2
02:52:18:  [<ffffffff802355db>] kthread+0x0/0x75
02:52:18:  [<ffffffff8020bd7e>] child_rip+0x0/0x12
02:52:18:  [<ffffffff802354ad>] kthreadd+0xb4/0xf5
02:52:18:  [<ffffffff8020bd88>] child_rip+0xa/0x12
02:52:18:  [<ffffffff802353f9>] kthreadd+0x0/0xf5
02:52:18:  [<ffffffff8020bd7e>] child_rip+0x0/0x12

02:54:53: apache2 invoked oom-killer: gfp_mask=0xd0, order=1, oomkilladj=0
02:54:53: 
02:54:53: Call Trace:
02:54:53:  [<ffffffff80246053>] out_of_memory+0x71/0x1ba
02:54:53:  [<ffffffff8024755d>] __alloc_pages+0x255/0x2d7
02:54:53:  [<ffffffff8025be8a>] poison_obj+0x26/0x2f
02:54:53:  [<ffffffff8024761f>] __get_free_pages+0x40/0x79
02:54:53:  [<ffffffff80224d66>] copy_process+0xb0/0x1108
02:54:53:  [<ffffffff80233388>] alloc_pid+0x1f/0x27d
02:54:53:  [<ffffffff80225ed6>] do_fork+0xb1/0x1a7
02:54:53:  [<ffffffff8020b43e>] system_call+0x7e/0x83
02:54:53:  [<ffffffff8020b757>] ptregscall_common+0x67/0xb0

02:55:45: ssh invoked oom-killer: gfp_mask=0x4d0, order=2, oomkilladj=0
02:55:45: 
02:55:45: Call Trace:
02:55:45:  [<ffffffff80246053>] out_of_memory+0x71/0x1ba
02:55:45:  [<ffffffff8024755d>] __alloc_pages+0x255/0x2d7
02:55:45:  [<ffffffff8025cbd6>] cache_alloc_refill+0x2f4/0x60a
02:55:45:  [<ffffffff8025be8a>] poison_obj+0x26/0x2f
02:55:45:  [<ffffffff8040e0df>] __alloc_skb+0x31/0x121
02:55:45:  [<ffffffff8040e0df>] __alloc_skb+0x31/0x121
02:55:45:  [<ffffffff8040ab8b>] sock_alloc_send_skb+0x93/0x1dd
02:55:45:  [<ffffffff8025d067>] __kmalloc_track_caller+0x9d/0xdb
02:55:45:  [<ffffffff8040e109>] __alloc_skb+0x5b/0x121
02:55:45:  [<ffffffff8040ab8b>] sock_alloc_send_skb+0x93/0x1dd
02:55:45:  [<ffffffff802f0627>] copy_user_generic_string+0x17/0x40
02:55:45:  [<ffffffff8048155c>] unix_stream_sendmsg+0x151/0x2ea
02:55:45:  [<ffffffff80408349>] sock_aio_write+0xe5/0xf0
02:55:45:  [<ffffffff802437d3>] find_get_page+0xe/0x36
02:55:45:  [<ffffffff8025f9b4>] do_sync_write+0xd1/0x118
02:55:45:  [<ffffffff802357f5>] autoremove_wake_function+0x0/0x2e
02:55:45:  [<ffffffff80222bb8>] update_curr+0xe6/0x10b
02:55:45:  [<ffffffff80260118>] vfs_write+0xc0/0x136
02:55:45:  [<ffffffff802605c2>] sys_write+0x45/0x6e
02:55:45:  [<ffffffff8020b43e>] system_call+0x7e/0x83

03:01:34: smbclient invoked oom-killer: gfp_mask=0x4d0, order=2, oomkilladj=0
03:01:34: 
03:01:34: Call Trace:
03:01:34:  [<ffffffff80246053>] out_of_memory+0x71/0x1ba
03:01:34:  [<ffffffff8024755d>] __alloc_pages+0x255/0x2d7
03:01:34:  [<ffffffff8025cbd6>] cache_alloc_refill+0x2f4/0x60a
03:01:34:  [<ffffffff8025be8a>] poison_obj+0x26/0x2f
03:01:34:  [<ffffffff8040e0df>] __alloc_skb+0x31/0x121
03:01:34:  [<ffffffff8040e0df>] __alloc_skb+0x31/0x121
03:01:34:  [<ffffffff8040ab8b>] sock_alloc_send_skb+0x93/0x1dd
03:01:34:  [<ffffffff8025d067>] __kmalloc_track_caller+0x9d/0xdb
03:01:34:  [<ffffffff8040e109>] __alloc_skb+0x5b/0x121
03:01:34:  [<ffffffff8040ab8b>] sock_alloc_send_skb+0x93/0x1dd
03:01:34:  [<ffffffff8040a8ab>] release_sock+0xe/0x7f
03:01:34:  [<ffffffff8048155c>] unix_stream_sendmsg+0x151/0x2ea
03:01:34:  [<ffffffff80408349>] sock_aio_write+0xe5/0xf0
03:01:34:  [<ffffffff8025f9b4>] do_sync_write+0xd1/0x118
03:01:34:  [<ffffffff802357f5>] autoremove_wake_function+0x0/0x2e
03:01:34:  [<ffffffff80222bb8>] update_curr+0xe6/0x10b
03:01:34:  [<ffffffff80260118>] vfs_write+0xc0/0x136
03:01:34:  [<ffffffff802605c2>] sys_write+0x45/0x6e
03:01:34:  [<ffffffff8020b43e>] system_call+0x7e/0x83

05:48:04: scp invoked oom-killer: gfp_mask=0x4d0, order=1, oomkilladj=0
05:48:04: 
05:48:04: Call Trace:
05:48:04:  [<ffffffff80246053>] out_of_memory+0x71/0x1ba
05:48:04:  [<ffffffff8024755d>] __alloc_pages+0x255/0x2d7
05:48:04:  [<ffffffff8025cbd6>] cache_alloc_refill+0x2f4/0x60a
05:48:04:  [<ffffffff8025be8a>] poison_obj+0x26/0x2f
05:48:04:  [<ffffffff8040e0df>] __alloc_skb+0x31/0x121
05:48:04:  [<ffffffff8040e0df>] __alloc_skb+0x31/0x121
05:48:04:  [<ffffffff8040ab8b>] sock_alloc_send_skb+0x93/0x1dd
05:48:04:  [<ffffffff8025d067>] __kmalloc_track_caller+0x9d/0xdb
05:48:04:  [<ffffffff8040e109>] __alloc_skb+0x5b/0x121
05:48:04:  [<ffffffff8040ab8b>] sock_alloc_send_skb+0x93/0x1dd
05:48:04:  [<ffffffff8048155c>] unix_stream_sendmsg+0x151/0x2ea
05:48:04:  [<ffffffff80243645>] file_read_actor+0x0/0x118
05:48:04:  [<ffffffff80408349>] sock_aio_write+0xe5/0xf0
05:48:04:  [<ffffffff8025f9b4>] do_sync_write+0xd1/0x118
05:48:04:  [<ffffffff802357f5>] autoremove_wake_function+0x0/0x2e
05:48:04:  [<ffffffff80260118>] vfs_write+0xc0/0x136
05:48:04:  [<ffffffff802605c2>] sys_write+0x45/0x6e
05:48:04:  [<ffffffff8020b43e>] system_call+0x7e/0x83

And here's the latest one, in full:

05:48:11: smbclient invoked oom-killer: gfp_mask=0x4d0, order=2, oomkilladj=0
05:48:12: 
05:48:12: Call Trace:
05:48:12:  [<ffffffff80246053>] out_of_memory+0x71/0x1ba
05:48:12:  [<ffffffff8024755d>] __alloc_pages+0x255/0x2d7
05:48:12:  [<ffffffff8025cbd6>] cache_alloc_refill+0x2f4/0x60a
05:48:12:  [<ffffffff8025be8a>] poison_obj+0x26/0x2f
05:48:12:  [<ffffffff8040e0df>] __alloc_skb+0x31/0x121
05:48:12:  [<ffffffff8040e0df>] __alloc_skb+0x31/0x121
05:48:12:  [<ffffffff8040ab8b>] sock_alloc_send_skb+0x93/0x1dd
05:48:12:  [<ffffffff8025d067>] __kmalloc_track_caller+0x9d/0xdb
05:48:12:  [<ffffffff8040e109>] __alloc_skb+0x5b/0x121
05:48:12:  [<ffffffff8040ab8b>] sock_alloc_send_skb+0x93/0x1dd
05:48:12:  [<ffffffff802f0627>] copy_user_generic_string+0x17/0x40
05:48:12:  [<ffffffff8048155c>] unix_stream_sendmsg+0x151/0x2ea
05:48:12:  [<ffffffff80408349>] sock_aio_write+0xe5/0xf0
05:48:12:  [<ffffffff8025f9b4>] do_sync_write+0xd1/0x118
05:48:12:  [<ffffffff802357f5>] autoremove_wake_function+0x0/0x2e
05:48:12:  [<ffffffff80222bb8>] update_curr+0xe6/0x10b
05:48:12:  [<ffffffff80260118>] vfs_write+0xc0/0x136
05:48:12:  [<ffffffff802605c2>] sys_write+0x45/0x6e
05:48:12:  [<ffffffff8020b43e>] system_call+0x7e/0x83
05:48:12: 
05:48:12: Mem-info:
05:48:12: DMA per-cpu:
05:48:12: CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
05:48:12: DMA32 per-cpu:
05:48:12: CPU    0: Hot: hi:  186, btch:  31 usd:  10   Cold: hi:   62, btch:  15 usd:  58
05:48:12: Active:67 inactive:1197 dirty:0 writeback:779 unstable:0
05:48:12:  free:39163 slab:464538 mapped:518 pagetables:1800 bounce:0
05:48:12: DMA free:8040kB min:28kB low:32kB high:40kB active:0kB inactive:0kB present:11132kB pages_scanned:0 all_unreclaimable? yes
05:48:12: lowmem_reserve[]: 0 2003 2003 2003
05:48:12: DMA32 free:148612kB min:5712kB low:7140kB high:8568kB active:268kB inactive:4788kB present:2051184kB pages_scanned:7841 all_unreclaimable? yes
05:48:12: lowmem_reserve[]: 0 0 0 0
05:48:12: DMA: 56*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8040kB
05:48:12: DMA32: 36459*4kB 55*8kB 2*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 148612kB
05:48:12: Swap cache: add 6853288, delete 6852498, find 1146430/2641691, race 0+0
05:48:12: Free swap  = 5213316kB
05:48:12: Total swap = 5855208kB
05:48:12: Free swap:       5213316kB
05:48:12: 524000 pages of RAM
05:48:12: 11264 reserved pages
05:48:12: 241 pages shared
05:48:12: 790 pages swap cached
05:48:12: Out of memory: kill process 9156 (apache2) score 83536 or a child
05:48:12: Killed process 9156 (apache2)

Oh, FWIW, a later snapshot of /proc/slab_allocators:

skbuff_head_cache: 1746940 __alloc_skb+0x31/0x121
size-512: 1740532 tcp_send_ack+0x23/0x102
skbuff_fclone_cache: 152230 __alloc_skb+0x31/0x121
size-2048: 151603 tcp_sendmsg+0x1b5/0xae1
sysfs_dir_cache: 5279 sysfs_new_dirent+0x4b/0xec
size-512: 2837 sock_alloc_send_skb+0x93/0x1dd
Acpi-Operand: 2014 acpi_ut_allocate_object_desc_dbg+0x34/0x6e
size-32: 1989 sysfs_new_dirent+0x29/0xec
size-512: 1678 arp_create+0x4e/0x1cd
size-512: 1619 tcp_xmit_probe_skb+0x1f/0xcd
UDP: 1217 sk_alloc+0x25/0xaf
size-128: 1024 r1bio_pool_alloc+0x23/0x3b
size-128: 1024 nfsd_cache_init+0x2d/0xcf
Acpi-Namespace: 973 acpi_ns_create_node+0x2c/0x45
vm_area_struct: 804 copy_process+0x9f1/0x1108
dentry: 488 d_alloc+0x24/0x177
size-2048: 480 tcp_fragment+0xdf/0x4aa
anon_vma: 463 anon_vma_prepare+0x29/0x74
filp: 442 get_empty_filp+0x44/0xcd
ip_dst_cache: 421 dst_alloc+0x29/0x76


I'm backing out the ip1000a driver and seeing what happens.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: 2.6.23-rc8 network problem.  Mem leak?  ip1000a?
  2007-09-30  7:59   ` linux
@ 2007-09-30  9:23     ` Andrew Morton
  2007-09-30 11:40       ` linux
  2008-01-08  6:52       ` linux
  0 siblings, 2 replies; 20+ messages in thread
From: Andrew Morton @ 2007-09-30  9:23 UTC (permalink / raw)
  To: linux; +Cc: linux-kernel, netdev, Francois Romieu

On 30 Sep 2007 03:59:56 -0400 linux@horizon.com wrote:

> > ntpd.  Sounds like pps leaking to me.
> 
> That's what I'd think, except that pps does no allocation in the normal
> running state, so there's nothing to leak.  The interrupt path just
> records the time in some preallocated, static buffers and wakes up
> blocked readers.  The read path copies the latest data out of those
> static buffers.  There's allocation when the PPS device is created,
> and more when it's opened.

OK.  Did you try to reproduce it without the pps patch applied?

> >> Can anyone offer some diagnosis advice?
> 
> > CONFIG_DEBUG_SLAB_LEAK?
> 
> Ah, thank you; I've been using SLUB, which doesn't support this option.
> Here's what I've extracted.  I've only presented the top few
> slab_allocators and a small subset of the oom-killer messages, but I
> have full copies if desired.  Unfortunately, I've discovered that the
> machine doesn't live in this unhappy state forever.  Indeed, I'm not
> sure if killing ntpd "fixes" anything; my previous observations
> may have been optimistic ignorance.
> 
> (For my own personal reference looking for more oom-kill, I nuked ntpd
> at 06:46:56.  And the oom-kills are continuing, with the latest at
> 07:43:52.)
> 
> Anyway, I have a bunch of information from the slab_allocators file, but
> I'm not quite sure how to make sense of it.
> 
> 
> With a machine in the unhappy state and firing the OOM killer, the top
> 20 slab_allocators are:
> $ sort -rnk2 /proc/slab_allocators | head -20
> skbuff_head_cache: 1712746 __alloc_skb+0x31/0x121
> size-512: 1706572 tcp_send_ack+0x23/0x102
> skbuff_fclone_cache: 149113 __alloc_skb+0x31/0x121
> size-2048: 148500 tcp_sendmsg+0x1b5/0xae1
> sysfs_dir_cache: 5289 sysfs_new_dirent+0x4b/0xec
> size-512: 2613 sock_alloc_send_skb+0x93/0x1dd
> Acpi-Operand: 2014 acpi_ut_allocate_object_desc_dbg+0x34/0x6e
> size-32: 1995 sysfs_new_dirent+0x29/0xec
> vm_area_struct: 1679 mmap_region+0x18f/0x421
> size-512: 1618 tcp_xmit_probe_skb+0x1f/0xcd
> size-512: 1571 arp_create+0x4e/0x1cd
> vm_area_struct: 1544 copy_process+0x9f1/0x1108
> anon_vma: 1448 anon_vma_prepare+0x29/0x74
> filp: 1201 get_empty_filp+0x44/0xcd
> UDP: 1173 sk_alloc+0x25/0xaf
> size-128: 1048 r1bio_pool_alloc+0x23/0x3b
> size-128: 1024 nfsd_cache_init+0x2d/0xcf
> Acpi-Namespace: 973 acpi_ns_create_node+0x2c/0x45
> vm_area_struct: 717 split_vma+0x33/0xe5
> dentry: 594 d_alloc+0x24/0x177
> 
> I'm not sure quite what "normal" numbers are, but I do wonder why there
> are 1.7 million TCP acks buffered in the system.  Shouldn't they be
> transmitted and deallocated pretty quickly?

Yeah, that's an skbuff leak.
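
As a rough cross-check of the numbers quoted above, the per-caller counts
can be turned into bytes.  This is only a sketch: it assumes a "size-N"
cache holds N-byte objects and ignores slab overhead and debug padding,
so it gives a lower bound.  parse_slab_line() and slab_entry_bytes() are
hypothetical helper names, not anything in the kernel.

```c
#include <assert.h>
#include <stdio.h>

/* One line of /proc/slab_allocators: "<cache>: <count> <caller>". */
struct slab_entry {
	char cache[64];
	unsigned long count;
	char caller[128];
};

/* Parse one line; returns 1 on success, 0 on malformed input. */
static int parse_slab_line(const char *line, struct slab_entry *e)
{
	assert(line != NULL && e != NULL);
	return sscanf(line, "%63[^:]: %lu %127s",
		      e->cache, &e->count, e->caller) == 3;
}

/*
 * Lower bound on the bytes pinned by one entry of a "size-N" cache.
 * Returns 0 for caches whose object size is not part of the name.
 */
static unsigned long long slab_entry_bytes(const struct slab_entry *e)
{
	unsigned long objsize;

	if (sscanf(e->cache, "size-%lu", &objsize) != 1)
		return 0;
	return (unsigned long long)e->count * objsize;
}
```

Fed the tcp_send_ack line above, this gives 1706572 * 512 = 873764864
bytes, roughly 833 MiB of data buffers, before counting the matching
skbuff_head_cache objects.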

> This machine receives more data than it sends, so I'd expect acks to
> outnumber "real" packets.  Could the ip1000a driver's transmit path be
> leaking skbs somehow?

Absolutely.  Normally a driver's transmit completion interrupt handler will
run dev_kfree_skb_irq() against the skbs which have been fully sent.

However it'd be darned odd if the driver was leaking only tcp acks.

I can find no occurrence of "dev_kfree_skb" in drivers/net/ipg.c, which is
suspicious.

Where did you get your ipg.c from, btw?  davem's tree?  rc8-mm1? rc8-mm2??

>  that would also explain the "flailing" of the
> oom-killer; it can't associate the allocations with a process.
> 
> Here's /proc/meminfo:
> MemTotal:      1035756 kB
> MemFree:         43508 kB
> Buffers:         72920 kB
> Cached:         224056 kB
> SwapCached:     344916 kB
> Active:         664976 kB
> Inactive:       267656 kB
> SwapTotal:     4950368 kB
> SwapFree:      3729384 kB
> Dirty:            6460 kB
> Writeback:           0 kB
> AnonPages:      491708 kB
> Mapped:          79232 kB
> Slab:            41324 kB
> SReclaimable:    25008 kB
> SUnreclaim:      16316 kB
> PageTables:       8132 kB
> NFS_Unstable:        0 kB
> Bounce:              0 kB
> CommitLimit:   5468244 kB
> Committed_AS:  1946008 kB
> VmallocTotal:   253900 kB
> VmallocUsed:      2672 kB
> VmallocChunk:   251228 kB

I assume that meminfo was not captured when the system was ooming?  There
isn't much slab there.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: 2.6.23-rc8 network problem.  Mem leak?  ip1000a?
  2007-09-30  9:23     ` Andrew Morton
@ 2007-09-30 11:40       ` linux
  2008-01-08  6:52       ` linux
  1 sibling, 0 replies; 20+ messages in thread
From: linux @ 2007-09-30 11:40 UTC (permalink / raw)
  To: akpm, linux; +Cc: jesse, linux-kernel, netdev, romieu, s.l-h

> OK.  Did you try to reproduce it without the pps patch applied?

No.  But I've yanked the ip1000a driver (using the old crufty vendor-supplied
out-of-kernel module) and the problems are GONE.

>> This machine receives more data than it sends, so I'd expect acks to
>> outnumber "real" packets.  Could the ip1000a driver's transmit path be
>> leaking skbs somehow?

> Absolutely.  Normally a driver's transmit completion interrupt handler will
> run dev_kfree_skb_irq() against the skbs which have been fully sent.
>
> However it'd be darned odd if the driver was leaking only tcp acks.

It's leaking lots of things... you can see ARP packets in there and
all sorts of stuff.  But the big traffic hog is BackupPC doing inbound
rsyncs all night long, which generates a lot of acks.  Those are the
packets it sends, so those are the packets that get leaked.

> I can find no occurrence of "dev_kfree_skb" in drivers/net/ipg.c, which is
> suspicious.

Look for "IPG_DEV_KFREE_SKB", which is a wrapper macro.  (Or just add
"-i" to your grep.)  It should probably be deleted (it just expands to
dev_kfree_skb), but was presumably useful to someone during development.

> Where did you get your ipg.c from, btw?  davem's tree?  rc8-mm1? rc8-mm2??

As I wrote originally, I got it from
http://marc.info/?l=linux-netdev&m=118980588419882
which was a request for mainline submission.

If there are other patches floating around, I'm happy to try them.
Now that I know what to look for, it's easy to spot the leak before OOM.

> I assume that meminfo was not captured when the system was ooming?  There
> isn't much slab there.

Oops, sorry.  I captured slabinfo but not meminfo.

Thank you very much!  Sorry to jump the gun and post a lot before I had
all the data, but if it WAS a problem in -rc8, I wanted to mention it
before -final.

Now, the rush is to get the ip1000a driver fixed before the merge
window opens.  I've added all the ip1000a developers to the Cc: list in
an attempt to speed that up.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: 2.6.23-rc8 network problem.  Mem leak?  ip1000a?
  2007-09-30  9:23     ` Andrew Morton
  2007-09-30 11:40       ` linux
@ 2008-01-08  6:52       ` linux
  2008-01-08  7:07         ` David Miller
  1 sibling, 1 reply; 20+ messages in thread
From: linux @ 2008-01-08  6:52 UTC (permalink / raw)
  To: akpm, netdev, romieu; +Cc: linux

Just to keep the issue open, drivers/net/ipg.c currently in 2.6.24-rc6
still leaks skbuffs like a sieve.  Run it for a few hours with network
traffic and the machine swaps like crazy while the oom killer goes nuts.

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index d9107e5..4fa392c 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -172,6 +172,10 @@ config IP1000
 	select MII
 	---help---
 	  This driver supports IP1000 gigabit Ethernet cards.
+	  It works, but suffers from a memory leak.  Signifcant
+	  use will consume unswappable kernel memory until the
+	  machine runs out of memory and crashes.  Thus, this
+	  driver cannot be considered usable at the the present time.

 	  To compile this driver as a module, choose M here: the module
 	  will be called ipg.  This is recommended.

Or should it be demoted to BROKEN?  It compiles, and sends and receives
packets, which is better than a lot of BROKEN drivers.

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: 2.6.23-rc8 network problem. Mem leak? ip1000a?
  2008-01-08  6:52       ` linux
@ 2008-01-08  7:07         ` David Miller
  2008-01-08  7:14           ` David Miller
  0 siblings, 1 reply; 20+ messages in thread
From: David Miller @ 2008-01-08  7:07 UTC (permalink / raw)
  To: linux; +Cc: akpm, netdev, romieu

From: linux@horizon.com
Date: 8 Jan 2008 01:52:11 -0500

> @@ -172,6 +172,10 @@ config IP1000
>  	select MII
>  	---help---
>  	  This driver supports IP1000 gigabit Ethernet cards.
> +	  It works, but suffers from a memory leak.  Signifcant
> +	  use will consume unswappable kernel memory until the
> +	  machine runs out of memory and crashes.  Thus, this
> +	  driver cannot be considered usable at the the present time.

This is not how we handle and track bugs.

Such a patch is inappropriate, and I'd like to ask that you just be
patient until someone has a chance to try and figure out what the
problem is.  Or even better, you can try to track down the problem
yourself since you seem to have a specific interest in this problem.

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: 2.6.23-rc8 network problem. Mem leak? ip1000a?
  2008-01-08  7:07         ` David Miller
@ 2008-01-08  7:14           ` David Miller
  2008-01-08  7:51             ` Francois Romieu
  2008-01-08 12:28             ` [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak linux
  0 siblings, 2 replies; 20+ messages in thread
From: David Miller @ 2008-01-08  7:14 UTC (permalink / raw)
  To: linux; +Cc: akpm, netdev, romieu

From: David Miller <davem@davemloft.net>
Date: Mon, 07 Jan 2008 23:07:09 -0800 (PST)

> From: linux@horizon.com
> Date: 8 Jan 2008 01:52:11 -0500
> 
> > @@ -172,6 +172,10 @@ config IP1000
> >  	select MII
> >  	---help---
> >  	  This driver supports IP1000 gigabit Ethernet cards.
> > +	  It works, but suffers from a memory leak.  Signifcant
> > +	  use will consume unswappable kernel memory until the
> > +	  machine runs out of memory and crashes.  Thus, this
> > +	  driver cannot be considered usable at the the present time.
> 
> This is not how we handle and track bugs.
> 
> Such a patch is inappropriate, and I'd like to ask that you just be
> patient until someone has a chance to try and figure out what the
> problem is.  Or even better, you can try to track down the problem
> yourself since you seem to have a specific interest in this problem.

Actually, the bug is amazingly obvious after a quick scan of this
driver.

ipg_nic_rx_free_skb() is called from various places and is given zero
context to work with.  It assumes that the caller wants
"sp->rx_current % IPG_RFDLIST_LENGTH" to be freed.

But that's not right in most cases.  For example, consider the call in
ipg_nic_rx_with_end().  This function is invoked from ipg_nic_rx()
like so:

	unsigned int curr = sp->rx_current;
 ...
	for (i = 0; i < IPG_MAXRFDPROCESS_COUNT; i++, curr++) {
		unsigned int entry = curr % IPG_RFDLIST_LENGTH;
		struct ipg_rx *rxfd = sp->rxd + entry;

		if (!(rxfd->rfs & le64_to_cpu(IPG_RFS_RFDDONE)))
			break;

		switch (ipg_nic_rx_check_frame_type(dev)) {
 ...
		case Frame_WithEnd:
			ipg_nic_rx_with_end(dev, tp, rxfd, entry);
			break;
 ...
		}
	}

	sp->rx_current = curr;

So sp->rx_current does not correspond to the packet being processed
currently, and ipg_nic_rx_free_skb() will look at and try to free
only the first packet the above loop tries to process.

WOW!!!!  Amazing!!!

I invested 30 seconds of code reading to figure out the leak.  A much
better investment of time than adding bogus comments to the Kconfig
help text don't you think? :-)
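
The indexing problem described above can be reproduced in a user-space
toy model.  This is a sketch only, not the driver: RING_LEN stands in
for IPG_RFDLIST_LENGTH, the ring holds flags instead of skbs, and every
name below is hypothetical.

```c
#include <assert.h>

#define RING_LEN 8	/* stand-in for IPG_RFDLIST_LENGTH */

/* Toy ring: buf[i] != 0 means slot i still holds an unfreed buffer. */
struct toy_ring {
	unsigned int rx_current;	/* only advanced after the loop */
	int buf[RING_LEN];
};

/* Old behaviour: derive the slot from rx_current, ignoring the caller. */
static void free_slot_from_rx_current(struct toy_ring *r)
{
	r->buf[r->rx_current % RING_LEN] = 0;
}

/* Alternative: the caller says which slot it has just processed. */
static void free_slot(struct toy_ring *r, unsigned int entry)
{
	assert(entry < RING_LEN);
	r->buf[entry] = 0;
}

/*
 * Fill n slots, walk them the way the loop above does, and return how
 * many of them are still occupied afterwards (i.e. leaked).
 */
static unsigned int rx_pass_leaks(struct toy_ring *r, unsigned int n,
				  int pass_entry)
{
	unsigned int curr = r->rx_current;
	unsigned int i, leaked = 0;

	assert(n <= RING_LEN);

	/* Pretend the hardware completed the next n descriptors. */
	for (i = 0; i < n; i++)
		r->buf[(curr + i) % RING_LEN] = 1;

	for (i = 0; i < n; i++, curr++) {
		unsigned int entry = curr % RING_LEN;

		if (pass_entry)
			free_slot(r, entry);
		else
			free_slot_from_rx_current(r);
	}
	r->rx_current = curr;

	for (i = 0; i < RING_LEN; i++)
		leaked += (r->buf[i] != 0);
	return leaked;
}
```

With four completed descriptors, the rx_current-based variant clears
slot 0 four times and leaves three slots occupied; passing the entry
clears all four.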


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: 2.6.23-rc8 network problem. Mem leak? ip1000a?
  2008-01-08  7:14           ` David Miller
@ 2008-01-08  7:51             ` Francois Romieu
  2008-01-08 12:28             ` [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak linux
  1 sibling, 0 replies; 20+ messages in thread
From: Francois Romieu @ 2008-01-08  7:51 UTC (permalink / raw)
  To: David Miller; +Cc: linux, akpm, netdev

David Miller <davem@davemloft.net> :
[...]
> I invested 30 seconds of code reading to figure out the leak.  A much
> better investment of time than adding bogus comments to the Kconfig
> help text don't you think? :-)

Thanks for the hint David.

I'll roll up a patch for it after the day work.

-- 
Ueimor

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
  2008-01-08  7:14           ` David Miller
  2008-01-08  7:51             ` Francois Romieu
@ 2008-01-08 12:28             ` linux
  2008-01-08 13:19               ` linux
  1 sibling, 1 reply; 20+ messages in thread
From: linux @ 2008-01-08 12:28 UTC (permalink / raw)
  To: netdev, romieu; +Cc: akpm, davem, linux

Prompted by davem, this attempt at fixing the memory leak
actually appears to work.  At least, leaving ping -f -s1472 -l64
running doesn't drop packets and doesn't show up in /proc/slabinfo.
---
diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index dbd23bb..a0dfba5 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -1110,10 +1110,9 @@ enum {
 	Frame_WithStart_WithEnd = 11
 };
 
-inline void ipg_nic_rx_free_skb(struct net_device *dev)
+inline void ipg_nic_rx_free_skb(struct net_device *dev, unsigned entry)
 {
 	struct ipg_nic_private *sp = netdev_priv(dev);
-	unsigned int entry = sp->rx_current % IPG_RFDLIST_LENGTH;
 
 	if (sp->RxBuff[entry]) {
 		struct ipg_rx *rxfd = sp->rxd + entry;
@@ -1308,7 +1307,7 @@ static void ipg_nic_rx_with_end(struct net_device *dev,
 		jumbo->CurrentSize = 0;
 		jumbo->skb = NULL;
 
-		ipg_nic_rx_free_skb(dev);
+		ipg_nic_rx_free_skb(dev, entry);
 	} else {
 		IPG_DEV_KFREE_SKB(jumbo->skb);
 		jumbo->FoundStart = 0;
@@ -1337,7 +1336,7 @@ static void ipg_nic_rx_no_start_no_end(struct net_device *dev,
 				}
 			}
 			dev->last_rx = jiffies;
-			ipg_nic_rx_free_skb(dev);
+			ipg_nic_rx_free_skb(dev, entry);
 		}
 	} else {
 		IPG_DEV_KFREE_SKB(jumbo->skb);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
  2008-01-08 12:28             ` [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak linux
@ 2008-01-08 13:19               ` linux
  2008-01-08 21:36                 ` Francois Romieu
  0 siblings, 1 reply; 20+ messages in thread
From: linux @ 2008-01-08 13:19 UTC (permalink / raw)
  To: netdev, romieu; +Cc: akpm, davem, linux

I take that back.  This patch does NOT fix the leak, at least if
ping: sendmsg: No buffer space available
is any indication...

I think I was reading slabinfo wrong.
kmalloc-2048       42111  42112   2048    4    2 : tunables    0    0    0 : slabdata  10528  10528      0

Sorry for the false hope.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
  2008-01-08 13:19               ` linux
@ 2008-01-08 21:36                 ` Francois Romieu
  2008-01-08 23:00                   ` David Miller
  2008-01-09  0:38                   ` linux
  0 siblings, 2 replies; 20+ messages in thread
From: Francois Romieu @ 2008-01-08 21:36 UTC (permalink / raw)
  To: linux; +Cc: netdev, akpm, davem

linux@horizon.com <linux@horizon.com> :
> I take that back.  This patch does NOT fix the leak, at least if
> ping: sendmsg: No buffer space available
> is any indication...

Can you try the patch below ?

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index dbd23bb..c304e5c 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -860,7 +860,7 @@ static void ipg_nic_txfree(struct net_device *dev)
 	void __iomem *ioaddr = sp->ioaddr;
 	unsigned int curr;
 	u64 txd_map;
-	unsigned int released, pending;
+	unsigned int released, pending, dirty;
 
 	txd_map = (u64)sp->txd_map;
 	curr = ipg_r32(TFD_LIST_PTR_0) -
@@ -869,9 +869,9 @@ static void ipg_nic_txfree(struct net_device *dev)
 	IPG_DEBUG_MSG("_nic_txfree\n");
 
 	pending = sp->tx_current - sp->tx_dirty;
+	dirty = sp->tx_dirty % IPG_TFDLIST_LENGTH;
 
 	for (released = 0; released < pending; released++) {
-		unsigned int dirty = sp->tx_dirty % IPG_TFDLIST_LENGTH;
 		struct sk_buff *skb = sp->TxBuff[dirty];
 		struct ipg_tx *txfd = sp->txd + dirty;
 
@@ -898,6 +898,7 @@ static void ipg_nic_txfree(struct net_device *dev)
 
 			sp->TxBuff[dirty] = NULL;
 		}
+		dirty = (dirty + 1) % IPG_TFDLIST_LENGTH;
 	}
 
 	sp->tx_dirty += released;
-- 
1.5.3.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
  2008-01-08 21:36                 ` Francois Romieu
@ 2008-01-08 23:00                   ` David Miller
  2008-01-08 23:28                     ` Francois Romieu
  2008-01-09  0:38                   ` linux
  1 sibling, 1 reply; 20+ messages in thread
From: David Miller @ 2008-01-08 23:00 UTC (permalink / raw)
  To: romieu; +Cc: linux, netdev, akpm

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Tue, 8 Jan 2008 22:36:40 +0100

> linux@horizon.com <linux@horizon.com> :
> > I take that back.  This patch does NOT fix the leak, at least if
> > ping: sendmsg: No buffer space available
> > is any indication...
> 
> Can you try the patch below ?

Same kind of bug as the RX side :-)  I bet this fixes his
problem...
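
For the TX side, the effect of computing "dirty" from sp->tx_dirty on
every pass, while tx_dirty itself is only updated after the loop, can be
shown with a toy model.  A sketch under assumptions: TX_LEN stands in for
IPG_TFDLIST_LENGTH, buffers are flags rather than skbs, the check against
the hardware's progress is left out, and all names are hypothetical.

```c
#include <assert.h>

#define TX_LEN 8	/* stand-in for IPG_TFDLIST_LENGTH */

struct toy_tx {
	unsigned int tx_current;	/* next slot to fill (free-running) */
	unsigned int tx_dirty;		/* oldest unreclaimed slot (free-running) */
	int buf[TX_LEN];		/* nonzero: slot holds a buffer */
};

/* Queue one buffer; returns 0 if the ring is full. */
static int toy_xmit(struct toy_tx *t)
{
	if (t->tx_current - t->tx_dirty >= TX_LEN)
		return 0;
	t->buf[t->tx_current++ % TX_LEN] = 1;
	return 1;
}

/*
 * Reclaim every pending slot.  With advance == 0 the slot index is
 * recomputed from tx_dirty on each pass, as in the pre-patch loop;
 * with advance != 0 it steps forward, as in the patch.
 * Returns the number of buffers actually released.
 */
static unsigned int toy_txfree(struct toy_tx *t, int advance)
{
	unsigned int pending = t->tx_current - t->tx_dirty;
	unsigned int dirty = t->tx_dirty % TX_LEN;
	unsigned int released, freed = 0;

	assert(pending <= TX_LEN);

	for (released = 0; released < pending; released++) {
		if (!advance)
			dirty = t->tx_dirty % TX_LEN;
		if (t->buf[dirty]) {
			t->buf[dirty] = 0;
			freed++;
		}
		if (advance)
			dirty = (dirty + 1) % TX_LEN;
	}
	t->tx_dirty += released;
	return freed;
}
```

After five toy_xmit() calls, toy_txfree(&t, 0) releases one buffer yet
still advances tx_dirty by five, so four buffers are lost track of;
toy_txfree(&t, 1) releases all five.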

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
  2008-01-08 23:00                   ` David Miller
@ 2008-01-08 23:28                     ` Francois Romieu
  0 siblings, 0 replies; 20+ messages in thread
From: Francois Romieu @ 2008-01-08 23:28 UTC (permalink / raw)
  To: David Miller; +Cc: linux, netdev, akpm

David Miller <davem@davemloft.net> :
[...]
> Same kind of bug as the RX side :-)  I bet this fixes his
> problem...

I am not sure but the Rx side is probably just here to distract
from the real problem. Please don't ask... :o)

Anyway I'll poke an adapter in the test computer and give it a
try tomorrow. Nobody will complain if I crash it.

-- 
Ueimor

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
  2008-01-08 21:36                 ` Francois Romieu
  2008-01-08 23:00                   ` David Miller
@ 2008-01-09  0:38                   ` linux
  2008-01-09  8:39                     ` David Miller
  2008-01-09 23:30                     ` Francois Romieu
  1 sibling, 2 replies; 20+ messages in thread
From: linux @ 2008-01-09  0:38 UTC (permalink / raw)
  To: linux, romieu; +Cc: akpm, davem, netdev

> Can you try the patch below ?

Testing now... (I presume you noticed the one-character typo in my
earlier patch.  That should be "mc = mc->next", not "mv = mc->next".)

That doesn't seem to do it.  Not entirely, at least.  After downloading
and partially re-uploading an 800M file, slabtop reports:

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
341576 341574  99%    0.50K  42697        8    170788K kmalloc-512
342006 341953  99%    0.19K  16286       21     65144K kmalloc-192
 30592  30575  99%    2.00K   7648        4     61184K kmalloc-2048
 30213  30193  99%    0.44K   3357        9     13428K skbuff_fclone_cache
  7650   7643  99%    0.08K    150       51       600K sysfs_dir_cache
  4000   3938  98%    0.12K    125       32       500K kmalloc-128
   258    258 100%    1.15K     43        6       344K raid5-md5
   232    221  95%    1.00K     58        4       232K kmalloc-1024
  3136   3110  99%    0.06K     49       64       196K kmalloc-64
   264     80  30%    0.68K     24       11       192K ext3_inode_cache

The "kmalloc-2048" was down in the noise before the upload started.
This is in single-user mode, after sync and echo 3 > /proc/sys/vm/drop_caches.

I'll have to try this after this evening's social plans, but I'm thinking
of implementing more rapid bug detection: explicitly zero the sp->TxBuff
slot when the skb is freed, and check that it is zero before putting
anything else in there.  (And likewise for RxBuff.)

That way, I don't have to use up a noticeable amount of memory to see
the bug and reboot to clear up the damage each test cycle.
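
A user-space sketch of that check, to make the idea concrete.  It is not
driver code: the struct, the counters and the function names are all
hypothetical, and TX_LEN merely stands in for IPG_TFDLIST_LENGTH.

```c
#include <assert.h>
#include <stddef.h>

#define TX_LEN 8	/* stand-in for IPG_TFDLIST_LENGTH */

struct tracked_ring {
	void *slot[TX_LEN];
	unsigned long overwrites;	/* occupied slot reused: likely leak */
	unsigned long double_frees;	/* empty slot released again */
};

/* Store buf in slot idx, counting the case where something was still there. */
static void ring_put(struct tracked_ring *r, unsigned int idx, void *buf)
{
	assert(idx < TX_LEN && buf != NULL);
	if (r->slot[idx] != NULL)
		r->overwrites++;
	r->slot[idx] = buf;
}

/* Take the buffer out of slot idx and clear the slot; the caller frees it. */
static void *ring_take(struct tracked_ring *r, unsigned int idx)
{
	void *buf;

	assert(idx < TX_LEN);
	buf = r->slot[idx];
	if (buf == NULL)
		r->double_frees++;
	r->slot[idx] = NULL;
	return buf;
}
```

A nonzero "overwrites" count flags a leaked buffer on the first reuse of
the slot, long before the slab caches grow enough to notice.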

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
  2008-01-09  0:38                   ` linux
@ 2008-01-09  8:39                     ` David Miller
  2008-01-09 23:34                       ` Francois Romieu
  2008-01-09 23:30                     ` Francois Romieu
  1 sibling, 1 reply; 20+ messages in thread
From: David Miller @ 2008-01-09  8:39 UTC (permalink / raw)
  To: linux; +Cc: romieu, akpm, netdev

From: linux@horizon.com
Date: 8 Jan 2008 19:38:40 -0500

> That doesn't seem to do it.  Not entirely, at least.  After downloading
> and partially re-uploading an 800M file, slabtop reports:

Ok, I'll let you and Francois work out how to fix this for
good.

Please submit just the outright leak bug fixes once this is
all resolved.  All of that code cleanup stuff needs to wait
until later, let's fix bugs before adding new ones. :-)

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
  2008-01-09  0:38                   ` linux
  2008-01-09  8:39                     ` David Miller
@ 2008-01-09 23:30                     ` Francois Romieu
  2008-01-10  7:28                       ` ipg.c bugs linux
  1 sibling, 1 reply; 20+ messages in thread
From: Francois Romieu @ 2008-01-09 23:30 UTC (permalink / raw)
  To: linux; +Cc: akpm, davem, netdev

linux@horizon.com <linux@horizon.com> :
[...]
> That doesn't seem to do it.  Not entirely, at least.  After downloading
> and partially re-uploading an 800M file, slabtop reports:

Ok, enjoy this one. It is definitely better wrt the current problem.

More work tomorrow.

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index dbd23bb..42f300d 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -860,18 +860,18 @@ static void ipg_nic_txfree(struct net_device *dev)
 	void __iomem *ioaddr = sp->ioaddr;
 	unsigned int curr;
 	u64 txd_map;
-	unsigned int released, pending;
+	unsigned int released, pending, dirty;
 
 	txd_map = (u64)sp->txd_map;
 	curr = ipg_r32(TFD_LIST_PTR_0) -
 		do_div(txd_map, sizeof(struct ipg_tx)) - 1;
 
 	IPG_DEBUG_MSG("_nic_txfree\n");
 
 	pending = sp->tx_current - sp->tx_dirty;
+	dirty = sp->tx_dirty % IPG_TFDLIST_LENGTH;
 
 	for (released = 0; released < pending; released++) {
-		unsigned int dirty = sp->tx_dirty % IPG_TFDLIST_LENGTH;
 		struct sk_buff *skb = sp->TxBuff[dirty];
 		struct ipg_tx *txfd = sp->txd + dirty;
 
@@ -882,8 +884,11 @@ static void ipg_nic_txfree(struct net_device *dev)
 		 * If the TFDDone bit is set, free the associated
 		 * buffer.
 		 */
-		if (dirty == curr)
+		if (!(txfd->tfc & cpu_to_le64(IPG_TFC_TFDDONE))) {
+			printk(KERN_INFO "%s: released = %d pending = %d\n",
+				dev->name, released, pending);
 			break;
+		}
 
 		/* Setup TFDDONE for compatible issue. */
 		txfd->tfc |= cpu_to_le64(IPG_TFC_TFDDONE);
@@ -898,6 +903,7 @@ static void ipg_nic_txfree(struct net_device *dev)
 
 			sp->TxBuff[dirty] = NULL;
 		}
+		dirty = (dirty + 1) % IPG_TFDLIST_LENGTH;
 	}
 
 	sp->tx_dirty += released;
@@ -1943,10 +1948,7 @@ static int ipg_nic_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	 */
 	if (sp->tenmbpsmode)
 		txfd->tfc |= cpu_to_le64(IPG_TFC_TXINDICATE);
-	else if (!((sp->tx_current - sp->tx_dirty + 1) >
-	    IPG_FRAMESBETWEENTXDMACOMPLETES)) {
-		txfd->tfc |= cpu_to_le64(IPG_TFC_TXDMAINDICATE);
-	}
+	txfd->tfc |= cpu_to_le64(IPG_TFC_TXDMAINDICATE);
 	/* Based on compilation option, determine if FCS is to be
 	 * appended to transmit frame by IPG.
 	 */
@@ -2003,7 +2005,7 @@ static int ipg_nic_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	ipg_w32(IPG_DC_TX_DMA_POLL_NOW, DMA_CTRL);
 
 	if (sp->tx_current == (sp->tx_dirty + IPG_TFDLIST_LENGTH))
-		netif_wake_queue(dev);
+		netif_stop_queue(dev);
 
 	spin_unlock_irqrestore(&sp->lock, flags);
 
-- 
Ueimor

Anybody got a battery for my Ultra 10 ?
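
The ring-full test in the last hunk relies on tx_current and tx_dirty
being free-running unsigned counters.  A minimal sketch of that
arithmetic, assuming a ring size of 256 (TX_LEN is a made-up stand-in
for IPG_TFDLIST_LENGTH, and both helpers are hypothetical):

```c
#include <assert.h>

#define TX_LEN 256u	/* assumed ring size, for illustration only */

/* Slots in flight; unsigned subtraction stays correct across wraparound. */
static unsigned int tx_in_flight(unsigned int tx_current,
				 unsigned int tx_dirty)
{
	return tx_current - tx_dirty;
}

/* Nonzero when every slot is in use and the queue should be stopped. */
static int tx_ring_full(unsigned int tx_current, unsigned int tx_dirty)
{
	unsigned int n = tx_in_flight(tx_current, tx_dirty);

	assert(n <= TX_LEN);	/* more than TX_LEN in flight means lost state */
	return n == TX_LEN;
}
```

When the ring is full the transmit path has nowhere to put the next
frame, which is why stopping the queue rather than waking it is the
sensible action at that point.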

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
  2008-01-09  8:39                     ` David Miller
@ 2008-01-09 23:34                       ` Francois Romieu
  2008-01-09 23:56                         ` David Miller
  0 siblings, 1 reply; 20+ messages in thread
From: Francois Romieu @ 2008-01-09 23:34 UTC (permalink / raw)
  To: David Miller; +Cc: linux, akpm, netdev

David Miller <davem@davemloft.net> :
[...]
> all resolved.  All of that code cleanup stuff needs to wait
> until later, let's fix bugs before adding new ones. :-)

Yes.

I should be able to test your r8169 NAPI changes tomorrow.

-- 
Ueimor


* Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
  2008-01-09 23:34                       ` Francois Romieu
@ 2008-01-09 23:56                         ` David Miller
  0 siblings, 0 replies; 20+ messages in thread
From: David Miller @ 2008-01-09 23:56 UTC (permalink / raw)
  To: romieu; +Cc: linux, akpm, netdev

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Thu, 10 Jan 2008 00:34:58 +0100

> David Miller <davem@davemloft.net> :
> [...]
> > all resolved.  All of that code cleanup stuff needs to wait
> > until later, let's fix bugs before adding new ones. :-)
> 
> Yes.
> 
> I should be able to test your r8169 NAPI changes tomorrow.

Thank you.


* ipg.c bugs
  2008-01-09 23:30                     ` Francois Romieu
@ 2008-01-10  7:28                       ` linux
  0 siblings, 0 replies; 20+ messages in thread
From: linux @ 2008-01-10  7:28 UTC (permalink / raw)
  To: linux, romieu; +Cc: akpm, davem, netdev

I'm just about to test that second memory leak patch, but I gave the
original code a careful reading, and found a few problems:

* Huge monstrous glaring bug

  In ipg_interrupt_handler the code to handle a shared interrupt
  not caused by this device:
	if (!(status & IPG_IS_RSVD_MASK))
		goto out_enable;
  is *before* spin_lock(&sp->lock), but the code following
  out_enable does spin_unlock(&sp->lock).

  Thus, the sp->lock is all f*ed up.  The lack of any sort of
  locking between the interrupt handler and hard_start_xmit
  could cause all sorts of issues.

  I'm not actually sure the lock is even necessary; I'd think some
  suitable atomic access to sp->tx_current would suffice.
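
  To make the imbalance concrete, here is a minimal userspace sketch.
  The lock is a plain counter standing in for a spinlock, and the mask
  value and function names are made up for illustration; this is not
  the driver's code, only the shape of the control flow described above.

```c
/* Hypothetical stand-in for IPG_IS_RSVD_MASK; the value is invented. */
#define DEMO_IS_RSVD_MASK 0x0ffeu

/* Userspace stand-in for a spinlock: it only tracks nesting depth. */
struct fake_spinlock { int depth; };

void lock_acquire(struct fake_spinlock *l) { l->depth++; }
void lock_release(struct fake_spinlock *l) { l->depth--; }

/* Buggy shape: the early exit jumps to a label that unlocks,
 * although the lock has not been taken yet. */
int irq_buggy(struct fake_spinlock *l, unsigned status)
{
	int handled = 0;

	if (!(status & DEMO_IS_RSVD_MASK))
		goto out_enable;
	lock_acquire(l);
	handled = 1;
out_enable:
	lock_release(l);
	return handled;
}

/* Balanced shape: take the lock first, so every path that reaches
 * the unlock actually holds it. */
int irq_balanced(struct fake_spinlock *l, unsigned status)
{
	int handled = 0;

	lock_acquire(l);
	if (!(status & DEMO_IS_RSVD_MASK))
		goto out_enable;
	handled = 1;
out_enable:
	lock_release(l);
	return handled;
}
```

  With a status of 0 (an interrupt from some other device), the buggy
  variant leaves the depth at -1, while the balanced one leaves it at 0.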

* Lesser bugs

  There's a general pattern of loops over the range from
  sp->rx_current to sp->rx_dirty.  Some of them call code
  that refers to sp->rx_current, even though that hasn't been
  updated yet.

  One instance is in ipg_nic_check_frame_type.
  A second is in ipg_nic_check_error.

  In ipg_nic_set_multicast(), the code to enable the multicast flags
  is of the form "if (dev->flags & IFF_MULTICAST & (dev->mc_count > ...))".
  But IFF_MULTICAST is not 1, so this will always be false.
  The second & needs to be && (2x).
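
  A small standalone sketch of why the bitwise form can never be true.
  The flag value matches Linux's IFF_MULTICAST (0x1000); the threshold
  and the function names are invented for the example.

```c
#define DEMO_IFF_MULTICAST 0x1000u	/* same value as Linux's IFF_MULTICAST */
#define DEMO_MC_LIMIT      64		/* hypothetical threshold */

/* Form found in the driver: the comparison yields 0 or 1, and
 * 0x1000 & 1 is 0, so the whole expression is always 0. */
int mc_check_bitwise(unsigned flags, int mc_count)
{
	return (flags & DEMO_IFF_MULTICAST &
		(unsigned)(mc_count > DEMO_MC_LIMIT)) != 0;
}

/* Intended form: test the flag, then the count, with logical AND. */
int mc_check_logical(unsigned flags, int mc_count)
{
	return (flags & DEMO_IFF_MULTICAST) && (mc_count > DEMO_MC_LIMIT);
}
```

  mc_check_bitwise(0x1000, 100) is 0 even though both conditions hold;
  mc_check_logical(0x1000, 100) is 1.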

  In ipg_io_config(), there's
	/* Transmitter and receiver must be disabled before setting
	 * IFSSelect.
	 */
	ipg_w32((origmacctrl & (IPG_MC_RX_DISABLE | IPG_MC_TX_DISABLE)) &
		IPG_MC_RSVD_MASK, MAC_CTRL);
  I don't know what's going on there, but unless the IPG_MC_RX_DISABLE
  bit is already set in origmacctrl, that's going to write 0, which
  won't disable anything.
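
  A sketch of the arithmetic, with invented bit positions (the real
  IPG_MC_* values are not reproduced here).  Whether OR-ing the disable
  bits in is what the hardware wants is my assumption, not something
  I've checked against a datasheet.

```c
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define DEMO_MC_RX_DISABLE 0x10000000u
#define DEMO_MC_TX_DISABLE 0x02000000u
#define DEMO_MC_RSVD_MASK  0x1fffffffu

/* As written in the driver: AND keeps the disable bits only if they
 * were already set, and throws every other bit away. */
uint32_t macctrl_as_written(uint32_t orig)
{
	return (orig & (DEMO_MC_RX_DISABLE | DEMO_MC_TX_DISABLE)) &
		DEMO_MC_RSVD_MASK;
}

/* Presumed intent: set the disable bits on top of the old value. */
uint32_t macctrl_set_disable(uint32_t orig)
{
	return (orig | DEMO_MC_RX_DISABLE | DEMO_MC_TX_DISABLE) &
		DEMO_MC_RSVD_MASK;
}
```

  For an original value of 0x00000123 the first form writes 0, the
  second writes 0x12000123.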

  Immediately following, there's some similarly buggy code doing something
  I don't understand with IPG_MC_IFS_96BIT.

  The setting of curr in ipg_nic_txfree, with that bizarre do_div, can't
  possibly be working right.

* Possible bugs

  I'm not very sanguine about how init_rfdlist handles a failed
  ipg_get_rxbuff.  In that case it leaves rxfd->frag_info
  uninitialized, but still sets rxfd->rfs to "buffer ready to be
  received into", which could lead to receiving into random memory
  locations.

  In ipg_nic_hard_start_xmit(), the code
	if (sp->tx_current == (sp->tx_dirty + IPG_TFDLIST_LENGTH))
		netif_wake_queue(dev);
  shouldn't that *stop* the queue if the TFDLIST is full?
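
  What I'd expect the ring accounting to look like, as a userspace
  sketch.  The struct, the ring length and the function names are made
  up; only the "current == dirty + length means full" test is taken
  from the driver.

```c
#define DEMO_TFDLIST_LENGTH 4u	/* hypothetical ring size */

struct demo_ring {
	unsigned tx_current;	/* next slot to fill, free-running */
	unsigned tx_dirty;	/* oldest slot not yet freed, free-running */
	int queue_stopped;
};

/* Queue one frame; refuse if the stack was told to stop. */
int demo_xmit(struct demo_ring *r)
{
	if (r->queue_stopped)
		return -1;
	r->tx_current++;
	/* Ring is now full: stop the queue rather than wake it. */
	if (r->tx_current == r->tx_dirty + DEMO_TFDLIST_LENGTH)
		r->queue_stopped = 1;
	return 0;
}

/* Completion path: free descriptors and wake the queue if there is room. */
void demo_txfree(struct demo_ring *r, unsigned released)
{
	r->tx_dirty += released;
	if (released && r->queue_stopped &&
	    r->tx_current - r->tx_dirty < DEMO_TFDLIST_LENGTH)
		r->queue_stopped = 0;
}
```

  With a length of 4, the fourth demo_xmit() stops the queue and the
  first demo_txfree() that releases anything wakes it again.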

  I think that the places where the rxfd->rfs and txfd->tfc fields
  are filled in (containing the hardware-handoff flag) should
  have memory barriers.

* Stupid code

  In ipg_io_config, there are three writes to DEBUG_CTRL "Per silicon
  B3 eratta".  First, that's "errata".  But more significantly,
  can those writes be combined into one?  Is it necessary to read
  the DEBUG_CTRL register each time?

  The initialization of rxfd->rfs in init_rfdlist() and ipg_nic_rxrestore()
  should be moved into ipg_get_rxbuff().  And since the ready bit is there,
  it should be set AFTER the pointer fields AND there should be a barrier
  so the hardware doesn't read the fields out of order.
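
  The ordering I mean, sketched with C11 atomics in userspace.  This is
  only an analogue: a driver talking to a DMA engine would use the
  kernel's barrier primitives (e.g. wmb()) rather than <stdatomic.h>,
  and the descriptor layout and flag name below are invented.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_DESC_HW_OWNED 0x1ull	/* hypothetical hand-off flag */

struct demo_desc {
	uint64_t frag_info;		/* buffer address and length */
	_Atomic uint64_t status;	/* carries the hand-off flag */
};

/* Producer: fill in the pointer field first, then publish the flag
 * with release semantics so the earlier write cannot be seen after it. */
void demo_publish(struct demo_desc *d, uint64_t frag_info)
{
	d->frag_info = frag_info;
	atomic_store_explicit(&d->status, DEMO_DESC_HW_OWNED,
			      memory_order_release);
}

/* Consumer: only look at frag_info once the flag has been observed. */
int demo_consume(struct demo_desc *d, uint64_t *frag_info)
{
	uint64_t st = atomic_load_explicit(&d->status, memory_order_acquire);

	if (!(st & DEMO_DESC_HW_OWNED))
		return 0;
	*frag_info = d->frag_info;
	return 1;
}
```

  The point is just that the flag store comes last and is ordered after
  the data stores; setting the flag first, or with no barrier, lets the
  other side see a ready descriptor with stale pointer fields.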

  In ipg_nic_txcleanup(), there's code to call netif_wake_queue every
  time through the loop in 10 MBit mode (to balance some bug-workaround
  call that stops the queue every packet in that case), which is
  quite unnecessary, as ipg_nic_txfree() will do it.

  The IPG_INSERT_MANUAL_VLAN_TAG code (fortunately disabled by default)
  is just plain bizarre.  What exactly is the use of assigning a tag of
  0xABC to every packet?

  The code in ipg_hw_init to set up dev->dev_addr reads each of the
  16-bit address registers twice, for no apparent reason.

  There's lots of code in e.g. ipg_nic_rx() that does endless
  manipulation of rxfd->rfs with an le64_to_cpu() call around each
  instance; it should copy the field to a CPU-ordered native value
  once and be done with it.  (Some sparse annotations would help, too.)

  Likewise for messing with txfd->tfc in ipg_nic_hard_start_xmit().
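
  Roughly what I have in mind, as a userspace sketch.  The bit
  positions, the struct and the helper names are invented, and
  demo_le64_to_cpu() is a portable stand-in for the kernel's
  le64_to_cpu().

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define DEMO_RFS_FRAMESTART (1ull << 29)
#define DEMO_RFS_FRAMEEND   (1ull << 30)
#define DEMO_RFS_DONE       (1ull << 31)
#define DEMO_RFS_LEN_MASK   0xffffull

struct demo_rx_info {
	int done, start, end;
	unsigned len;
};

/* Assemble a little-endian 64-bit value regardless of host byte order. */
uint64_t demo_le64_to_cpu(const uint8_t raw[8])
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | raw[i];
	return v;
}

/* Convert once into a native local, then test as many bits as needed. */
struct demo_rx_info demo_parse_rfs(const uint8_t raw[8])
{
	const uint64_t rfs = demo_le64_to_cpu(raw);
	struct demo_rx_info info = {
		.done  = (rfs & DEMO_RFS_DONE) != 0,
		.start = (rfs & DEMO_RFS_FRAMESTART) != 0,
		.end   = (rfs & DEMO_RFS_FRAMEEND) != 0,
		.len   = (unsigned)(rfs & DEMO_RFS_LEN_MASK),
	};

	return info;
}
```

  One conversion per descriptor instead of one per test also makes it
  harder to forget the conversion on one of the branches.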

  The Frame_WithEnd enum is a very strange value (decimal 10) to use as
  a bitmapped status flag.

  The four frame fragment functions
	nic_rx_with_start_and_end
	nic_rx_with_start
	nic_rx_with_end
	nic_rx_no_start_no_end
  could easily be unified into one.

* Performance left on the floor

  The hardware supports scatter/gather, hardware checksums, VLAN tagging,
  and 64-bit (well, 40-bit) DMA, but the driver sets no feature flags.

  The jumbo frame reception code could generate fragmented skbs rather
  than doing all those memcopies.

  Would it be worth splitting the 64-bit ->rfs and ->tfc fields into
  two 32-bit fields?

  Would it be worth copying small incoming packets to small skbs and
  keeping the large skb in the receive queue?

* Questions

  In net_device_stats, are all those statistics registers cleared by
  a read?

  How do we determine the silicon revision numbers, so we can stop enabling
  bug workarounds on versions that don't need it?

  Where can I find docs about the scatter/gather features?  The bitfield
  definitions are a bit vague.

