All of lore.kernel.org

* Re: Add a "enable" sysfs attribute to the pci devices to allow userspace (Xorg) to enable devices without doing foul direct access
From: Bjorn Helgaas @ 2006-05-04 19:09 UTC (permalink / raw)
  To: linux-pci
  Cc: Dave Airlie, Arjan van de Ven, Andrew Morton, greg, linux-kernel,
	pjones
In-Reply-To: <Pine.LNX.4.64.0604291001490.2080@skynet.skynet.ie>

On Saturday 29 April 2006 03:04, Dave Airlie wrote:
> > This patch adds an "enable" sysfs attribute to each PCI device. When read it
> > shows the "enabled-ness" of the device, but you can write a "0" into it to
> > disable a device, and a "1" to enable it.
> >
> > The latter is needed for X and other cases where userspace wants to enable
> > the BARs on a device (typical example: to run the video bios on a secondary
> > head). Right now X does all this "by hand" via bitbanging, that's just evil.
> > This allows X to no longer do that but to just let the kernel do this.

I'm all in favor of cleaning up X.  But making the X code prettier without
changing the underlying issues of claiming and sharing resources doesn't
help much.  In fact, I suspect the ultimate plan for X does not involve
an "enable" attribute in sysfs, so this may just introduce ABI cruft that
will be difficult to remove later.

> This would allow me to remove the issue in X where loading the DRM at X
> startup acts differently than loading the DRM before X runs, due to X's PCI
> probe running in-between... with this I can just enable all VGA devices
> and not worry whether they have a DRM or not..

This seems to be the main justification for the patch.  But I don't know
enough about X and DRM to understand it, or why this patch is the best
way to solve it.

I think Jon has a pretty convincing argument, which I *do* understand.
Can you expand on this justification?  Do you envision long-term usage
of the sysfs "enable" attribute?

Bjorn
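
For readers following the thread, here is a minimal userspace sketch of how
the proposed attribute might be driven. It assumes only what the quoted patch
description says: a per-device sysfs file that accepts "0" or "1". The helper
name pci_set_enable() and any path passed to it are hypothetical, not taken
from the patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write "1" or "0" to a sysfs-style attribute file.
 * Returns 0 on success, -1 on any failure (open, write, or close).
 */
static int pci_set_enable(const char *attr_path, int on)
{
    FILE *f = fopen(attr_path, "w");

    if (!f)
        return -1;
    if (fputs(on ? "1" : "0", f) == EOF) {
        fclose(f);
        return -1;
    }
    /* fclose() flushes the buffer, so a rejected write shows up here. */
    return fclose(f) == 0 ? 0 : -1;
}
```

A caller would pass the path of the device's "enable" file and check the
return value, since the kernel may refuse the write.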

^ permalink raw reply

* [PATCH] Re: a question on tcp_highspeed.c (in 2.6.16)
From: John Heffner @ 2006-05-04 19:06 UTC (permalink / raw)
  To: Xiaoliang (David) Wei; +Cc: netdev
In-Reply-To: <7335583a0605040703x1d6e8a20n515b22241795d3ab@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 970 bytes --]

Xiaoliang (David) Wei wrote:
> Hi gurus,
> 
>    I am reading the code of tcp_highspeed.c in the kernel and have a
> question on the hstcp_cong_avoid function, specifically the following
> AI part (line 136~143 in net/ipv4/tcp_highspeed.c ):
> 
>                /* Do additive increase */
>                if (tp->snd_cwnd < tp->snd_cwnd_clamp) {
>                        tp->snd_cwnd_cnt += ca->ai;
>                        if (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
>                                tp->snd_cwnd++;
>                                tp->snd_cwnd_cnt -= tp->snd_cwnd;
>                        }
>                }
> 
>    In this part, when (tp->snd_cwnd_cnt == tp->snd_cwnd),
> snd_cwnd_cnt will be -1... snd_cwnd_cnt is defined as u16, will this
> small chance of getting -1 become a problem?
> Shall we change it by reversing the order of the cwnd++ and cwnd_cnt -= 
> cwnd?

Absolutely correct.  Thanks.

Signed-off-by: John Heffner <jheffner@psc.edu>

[-- Attachment #2: highspeed_cwnd_cnt.patch --]
[-- Type: text/plain, Size: 434 bytes --]

diff --git a/net/ipv4/tcp_highspeed.c b/net/ipv4/tcp_highspeed.c
index e0e9d13..b72fa55 100644
--- a/net/ipv4/tcp_highspeed.c
+++ b/net/ipv4/tcp_highspeed.c
@@ -137,8 +137,8 @@ static void hstcp_cong_avoid(struct sock
 		if (tp->snd_cwnd < tp->snd_cwnd_clamp) {
 			tp->snd_cwnd_cnt += ca->ai;
 			if (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
-				tp->snd_cwnd++;
 				tp->snd_cwnd_cnt -= tp->snd_cwnd;
+				tp->snd_cwnd++;
 			}
 		}
 	}
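
The wraparound discussed above can be reproduced outside the kernel. The
sketch below is a userspace stand-in, not kernel code: struct cwnd_state and
both function names are made up for illustration, and the snd_cwnd_clamp check
is left out. It assumes snd_cwnd_cnt is 16 bits wide, as stated in the
question.

```c
#include <stdint.h>

/* Stand-in for the two tcp_sock fields involved; not the kernel struct. */
struct cwnd_state {
    uint32_t snd_cwnd;
    uint16_t snd_cwnd_cnt;  /* 16 bits wide, per the question above */
};

/* Original order: increment cwnd first, then subtract the new value. */
static void ai_step_old(struct cwnd_state *s, uint16_t ai)
{
    s->snd_cwnd_cnt += ai;
    if (s->snd_cwnd_cnt >= s->snd_cwnd) {
        s->snd_cwnd++;
        /* If cnt equalled the old cwnd, this computes -1 and wraps. */
        s->snd_cwnd_cnt -= s->snd_cwnd;
    }
}

/* Patched order: subtract the old cwnd, then increment. */
static void ai_step_fixed(struct cwnd_state *s, uint16_t ai)
{
    s->snd_cwnd_cnt += ai;
    if (s->snd_cwnd_cnt >= s->snd_cwnd) {
        s->snd_cwnd_cnt -= s->snd_cwnd;
        s->snd_cwnd++;
    }
}
```

Starting from snd_cwnd = 10 and snd_cwnd_cnt = 9 with ai = 1, the old order
leaves the counter at 65535 while the patched order leaves it at 0.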

^ permalink raw reply related

* Re: www.softpanorama.org: sparc_vs_x86 fun
From: Dmitry Torokhov @ 2006-05-04 18:59 UTC (permalink / raw)
  To: linux-kernel
In-Reply-To: <d8jejzavwxk.fsf@ritchie.ping.uio.no>

On 5/4/06, Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> wrote:
> jimmy <jimmyb@huawei.com> writes:
>
> >> http://www.softpanorama.org/Articles/Linux_vs_Solaris/sparc_vs_x86.shtml
> >>
> >>> ...
> >>> Actually Linux used to be more heterogeneous in the past when it
> >>> supported Alpha. But those days are long gone.
> >>> ...
> > someone should send these guys a directory listing of linux/arch/
>
> I think he's confusing Red Hat with Linux. But these days even Red Hat
> supports more architectures than Solaris: i386, amd64, ia64, ppc64 and
> s390.
>

Don't pay too much attention:

"I have a subjective impression that networking in Linux is less
sophisticated..."

"As for LDAP quality I have no data but suspect that Solaris has an
upper hand..."

"Based on my limited knowledge of Linux kernel development it looks
like Linux development suffered from a classic case of premature
optimization disease..."

--
Dmitry

^ permalink raw reply

* Re: [RFC] kernel facilities for cache prefetching
From: Linda Walsh @ 2006-05-04 18:57 UTC (permalink / raw)
  To: Wu Fengguang, linux-kernel
In-Reply-To: <346744728.01465@ustc.edu.cn>

Wu Fengguang wrote:
> On Wed, May 03, 2006 at 02:45:53PM -0700, Linda Walsh wrote:
>   
>>    1. As you mention; reading files "sequentially" through the file
>> system is "bad" for several reasons.  Areas of interest:
>>    a) don't go through the file system.  Don't waste time doing
>> directory lookups and following file-allocation maps;  Instead,
>> use raw-disk i/o and read sectors in using device & block number.
>>     
>
> Sorry, it does not fit in the linux's cache model.
>   
---
    Maybe linux's cache model needs to be _improved_ to better
allow for hardware acceleration?^**  It is the job of the OS to provide
sufficiently low level facilities to allow optimal use of the system
hardware, while at the same time providing enough high level facilities
to support applications that don't require such tuning.


>>    b) Be "dynamic"; "Trace" (record (dev&blockno/range) blocks
>> starting ASAP after system boot and continuing for some "configurable"
>> number of seconds past reaching the desired "run-level" (coinciding with
>> initial disk quiescence).  Save as "configurable" (~6-8?) number of
>> traces to allow finding the common initial subset of blocks needed.
>>     
>
> It is an alternative way of doing the same job: more precise, with more
> complexity and more overhead.  However the 'blockno' way is not so
> tasteful.
>   
----
Maybe not so tasteful to you, but it is an alternate path that
circumvents unnecessary i/o redirections.  An additional optimization
is to have a "cache" of frequently used "system applications" that have
their pathnames registered, so no run-time searching of the user PATH
is necessary.

>>    c) Allow specification of max# of blocks and max number of "sections"
>> (discontiguous areas on disk);
>>     
>
> Good point, will do it in my work.
>
>   
>>    d) "Ideally", would have a way to "defrag" the common set of blocks.
>> I.e. -- moving the needed blocks from potentially disparate areas of
>> files into 1 or 2 contiguous areas, hopefully near the beginning of
>> the disk (or partition(s)).
>>    That's the area of "boot" pre-caching.
>>     
>
>   
> I guess poor man's defrag would be good enough for the seeking storm.
>   
---
    I disagree. The "poor man's defrag", as you call it, puts entire files
into contiguous sections -- each of which will have to be referenced by
following a PATH and directory chain.  The optimization I'm talking about
would only store the referenced data-blocks actually used in the files.
This would allow a direct read into memory of a *bunch* of needed sectors
while not including sectors from the same files that are not actually read
from during the boot *or* app. initialization process.

>>    That's "application" pre-caching.
>>     
> Yes, it is a planned feature, will do it.
>   
Très cool.
>>    A third area -- that can't be easily done in the kernel, but would
>> require a higher skill level on the part of application and library
>> developers, is to move towards using "delay-loaded" libraries.  In
>> Windows, it seems common among system libraries to use this feature. 
>> An obvious benefit -- if certain features of a program are not used,
>> the associated libraries are never loaded.  Not loading unneeded parts
>> of a program should speed up initial application load time, significantly.
>>     
> Linux already does lazy loading for linked libs. The only pitfall
> is that /lib/ld-linux.so.2 seems to touch the first 512B of data of every
> lib before doing mmap():
>   
----
    Yes -- and this is the "killer".  If my app may "potentially" use
50 run-time libs, but in any single invocation, only uses 20 of those
libs, the page tables and fixups for the unused 30 libraries don't
need to be done.  In fact, in some configurations, those 30 libs may
not even need to be present on disk!

    Typical example - "Active Directory"; I don't use it.  I don't
need the libraries on my system or need them "resolved" at runtime. 
It would be far preferable if programs would only load those
libraries actually used at run-time -- and load them *dynamically*,
as needed (for libraries that may not actually be called or
used).  This way, the initial time to start the program is
significantly reduced to the "core" set of libraries needed to
run the program.  Optional features are loaded from disk as those
features are called for.  Delays of loading optional libraries
one or two at a time, interactively, are not likely to be noticed,
but if you load all of those "optional" libraries prior to execution,
the sum-total will be noticed in an interactive environment.

^** -- "somewhere", it _seems_, the physical, device relative sector
must be resolved.  If it is not, how is i/o-block buffer consistency
maintained when the user references device "hda", sector 1000, then
the same sector as "hda1", sector 500, and also as file "/etc/passwd",
sector 0?  _If_ cache consistency is maintained (and I _assume_^**2
it is), they all need to be mapped to a physical sector at some point.

^**2 - Use of assumption noted; feel free to correct me and tell me
this isn't the case if linux doesn't maintain disk-block cache
consistency.
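
The arithmetic behind the footnote's example can be sketched as follows. This
says nothing about how the Linux cache actually indexes blocks; it only shows
how a partition-relative sector maps to a device-relative one. struct part and
part_to_dev_sector() are hypothetical names, and the partition start of 500 is
inferred from the hda/hda1 numbers given above.

```c
#include <stdint.h>

/* Hypothetical partition descriptor, in units of sectors. */
struct part {
    uint64_t start_sector;  /* first sector on the whole device */
    uint64_t nr_sectors;    /* length of the partition */
};

/*
 * Translate a partition-relative sector into a device-relative sector.
 * Returns 0 and stores the result in *out, or -1 if rel lies beyond
 * the end of the partition.
 */
static int part_to_dev_sector(const struct part *p, uint64_t rel,
                              uint64_t *out)
{
    if (rel >= p->nr_sectors)
        return -1;
    *out = p->start_sector + rel;
    return 0;
}
```

With a partition starting at sector 500, sector 500 of "hda1" and sector 1000
of "hda" resolve to the same place, which is the aliasing the footnote asks
about.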


-linda





^ permalink raw reply

* Re: Setting up a multipath device
From: Christophe Varoqui @ 2006-05-04 18:55 UTC (permalink / raw)
  To: device-mapper development
In-Reply-To: <445A1D88.7030500@voltaire.com>


> 
>    1. Was it wrong to run 'multipath 2 round-robin 1 0 /dev/sda
>       round-robin 1 0 /dev/sdb'?

The multipath command just doesn't understand these parameters.
This is a map as seen by the DM kernel driver. You can feed it to
dmsetup(8), not to multipath(8).

>    2. If I don't supply any parameters, how does the dm know which
>       devices are the same physical device (that /dev/sda & /dev/sdb are
>       the same physical device)?

Because it scans the paths for unique LU ids, then coalesces them.

>    3. How are the priority groups defined if no parameters are supplied?

The user defines them, through the config file.

>    4. What is the timeout for a device that has failed? When does the dm
>       move to the next priority group?
> 
When the secondary PG priority becomes higher than the active PG priority,
modulo a user-defined delay or user-defined disabling.

Or when all paths in the active PG have failed.

Regards,
cvaroqui

^ permalink raw reply

* Re: Obtain original address from redirected connection
From: Pascal Hambourg @ 2006-05-04 18:51 UTC (permalink / raw)
  To: netfilter
In-Reply-To: <20060502025450.73457687.pedro.werneck@terra.com.br>

[Sorry for the late answer, I just subscribed to the list]

Hello,

Pedro Werneck wrote :
> 
> I have a daemon, a sort of proxy, written in Python, which receives
> redirected connections with a rule like this:
> 
> iptables -t nat -A PREROUTING -j DNAT -p TCP -s source --to-destination host:port
> 
> The problem is that I need access to the original destination address,

You can parse /proc/net/ip_conntrack on the NAT box, which contains the
list of the connections currently handled by conntrack/NAT. This is
how Squid retrieves the original destination address when running in
transparent mode.
Note: on "recent" kernels you need root privileges to read this file.
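
As a rough illustration of the parsing approach, the sketch below pulls the
original destination out of a single conntrack line. It assumes the usual
layout for TCP/UDP entries, where the first "src= dst= sport= dport=" tuple
describes the original direction; check the format on your own kernel before
relying on it. The function name conntrack_orig_dst() is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look at one line of /proc/net/ip_conntrack and, if its first
 * (original-direction) tuple was sent by client_ip:client_port, copy
 * the original destination address into dst and the port into *dport.
 * Returns 1 on a match, 0 otherwise.
 */
static int conntrack_orig_dst(const char *line, const char *client_ip,
                              unsigned client_port, char *dst,
                              size_t dst_len, unsigned *dport)
{
    char src[64], d[64];
    unsigned sport, dp;
    const char *p = strstr(line, "src=");

    if (!p)
        return 0;
    /* Entries without ports (e.g. ICMP) fail this match and are skipped. */
    if (sscanf(p, "src=%63s dst=%63s sport=%u dport=%u",
               src, d, &sport, &dp) != 4)
        return 0;
    if (strcmp(src, client_ip) != 0 || sport != client_port)
        return 0;
    snprintf(dst, dst_len, "%s", d);
    *dport = dp;
    return 1;
}
```

The daemon would call this for each line of the file, passing the peer
address and port of the accepted socket, and stop at the first match.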

^ permalink raw reply

* [linux-lvm] Two ProLiant servers (Redhat) trying to access a HP MSA1000
From: Edgar Luna @ 2006-05-04 18:50 UTC (permalink / raw)
  To: linux-lvm

Hi everyone,

I have two servers with RedHat EL3 connected to an HP MSA1000. What I
want is for these servers to read and write the disks on the MSA1000.

I know that LVM is unaware of simultaneous access to one device, so I
tried to divide the storage of the MSA1000 into two units (created via the ACU
application), which are seen by the RedHat servers as /dev/sda
and /dev/sdb. Then, for each one, I created a Physical Volume, a Volume
Group and finally a Logical Volume.

I would like some advice about this.
What would you do in my situation?
Does this work? I mean, can I work around the LVM 'problem' of being
unaware of simultaneous access this way?
Is it possible to access the hard drives in the MSA1000 directly from Linux
so I can set up LVM directly on the disks?

I can't use GFS or the like because I can't afford to have a single
point of failure.

Thanks anyway.

-- 
Edgar Luna <eald@linuxuanl.org>
Linux UANL

^ permalink raw reply

* [updated] [Patch 6/8] delay accounting usage of taskstats interface
From: Balbir Singh @ 2006-05-04 18:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: lse-tech, jlan, akpm
In-Reply-To: <20060502061930.GC22607@in.ibm.com>


Changelog

Fixes suggested by Jay Lan
- check for tidstats before taking the mutex_lock in taskstats_exit_send()
- add back and fill version information for struct taskstats

Fixes comments by akpm (on earlier patch now incorporated here)
- detailed comments on atomicity rules of accounting fields
- replace use of nsec_t

delayacct-taskstats.patch

Usage of taskstats interface by delay accounting.

Signed-off-by: Shailabh Nagar <nagar@us.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
---

 include/linux/delayacct.h |   13 ++++++++++++
 include/linux/taskstats.h |   49 ++++++++++++++++++++++++++++++++++++++++++++++
 init/Kconfig              |    1 
 kernel/delayacct.c        |   42 +++++++++++++++++++++++++++++++++++++++
 kernel/taskstats.c        |   12 ++++++++++-
 5 files changed, 116 insertions(+), 1 deletion(-)

diff -puN include/linux/delayacct.h~delayacct-taskstats include/linux/delayacct.h
--- linux-2.6.17-rc3/include/linux/delayacct.h~delayacct-taskstats	2006-05-04 09:31:59.000000000 +0530
+++ linux-2.6.17-rc3-balbir/include/linux/delayacct.h	2006-05-04 11:26:18.000000000 +0530
@@ -18,6 +18,7 @@
 #define _LINUX_DELAYACCT_H
 
 #include <linux/sched.h>
+#include <linux/taskstats_kern.h>
 
 /*
  * Per-task flags relevant to delay accounting
@@ -35,6 +36,7 @@ extern void __delayacct_tsk_init(struct 
 extern void __delayacct_tsk_exit(struct task_struct *);
 extern void __delayacct_blkio_start(void);
 extern void __delayacct_blkio_end(void);
+extern int __delayacct_add_tsk(struct taskstats *, struct task_struct *);
 
 static inline void delayacct_set_flag(int flag)
 {
@@ -74,6 +76,14 @@ static inline void delayacct_blkio_end(v
 		__delayacct_blkio_end();
 }
 
+static inline int delayacct_add_tsk(struct taskstats *d,
+					struct task_struct *tsk)
+{
+	if (!tsk->delays)
+		return -EINVAL;
+	return __delayacct_add_tsk(d, tsk);
+}
+
 #else
 static inline void delayacct_set_flag(int flag)
 {}
@@ -89,6 +99,9 @@ static inline void delayacct_blkio_start
 {}
 static inline void delayacct_blkio_end(void)
 {}
+static inline int delayacct_add_tsk(struct taskstats *d,
+					struct task_struct *tsk)
+{ return 0; }
 #endif /* CONFIG_TASK_DELAY_ACCT */
 
 #endif
diff -puN include/linux/taskstats.h~delayacct-taskstats include/linux/taskstats.h
--- linux-2.6.17-rc3/include/linux/taskstats.h~delayacct-taskstats	2006-05-04 09:31:59.000000000 +0530
+++ linux-2.6.17-rc3-balbir/include/linux/taskstats.h	2006-05-04 09:34:11.000000000 +0530
@@ -35,6 +35,55 @@ struct taskstats {
 
 	/* Version 1 */
 	__u64	version;
+
+	/* Delay accounting fields start
+	 *
+	 * All values, until comment "Delay accounting fields end" are
+	 * available only if delay accounting is enabled, even though the last
+	 * few fields are not delays
+	 *
+	 * xxx_count is the number of delay values recorded
+	 * xxx_delay_total is the corresponding cumulative delay in nanoseconds
+	 *
+	 * xxx_delay_total wraps around to zero on overflow
+	 * xxx_count incremented regardless of overflow
+	 */
+
+	/* Delay waiting for cpu, while runnable
+	 * count, delay_total NOT updated atomically
+	 */
+	__u64	cpu_count;
+	__u64	cpu_delay_total;
+
+	/* Following four fields atomically updated using task->delays->lock */
+
+	/* Delay waiting for synchronous block I/O to complete
+	 * does not account for delays in I/O submission
+	 */
+	__u64	blkio_count;
+	__u64	blkio_delay_total;
+
+	/* Delay waiting for page fault I/O (swap in only) */
+	__u64	swapin_count;
+	__u64	swapin_delay_total;
+
+	/* cpu "wall-clock" running time
+	 * On some architectures, value will adjust for cpu time stolen
+	 * from the kernel in involuntary waits due to virtualization.
+	 * Value is cumulative, in nanoseconds, without a corresponding count
+	 * and wraps around to zero silently on overflow
+	 */
+	__u64	cpu_run_real_total;
+
+	/* cpu "virtual" running time
+	 * Uses time intervals seen by the kernel i.e. no adjustment
+	 * for kernel's involuntary waits due to virtualization.
+	 * Value is cumulative, in nanoseconds, without a corresponding count
+	 * and wraps around to zero silently on overflow
+	 */
+	__u64	cpu_run_virtual_total;
+	/* Delay accounting fields end */
+	/* version 1 ends here */
 };
 
 
diff -puN init/Kconfig~delayacct-taskstats init/Kconfig
--- linux-2.6.17-rc3/init/Kconfig~delayacct-taskstats	2006-05-04 09:31:59.000000000 +0530
+++ linux-2.6.17-rc3-balbir/init/Kconfig	2006-05-04 09:31:59.000000000 +0530
@@ -164,6 +164,7 @@ config TASKSTATS
 
 config TASK_DELAY_ACCT
 	bool "Enable per-task delay accounting (EXPERIMENTAL)"
+	depends on TASKSTATS
 	help
 	  Collect information on time spent by a task waiting for system
 	  resources like cpu, synchronous block I/O completion and swapping
diff -puN kernel/delayacct.c~delayacct-taskstats kernel/delayacct.c
--- linux-2.6.17-rc3/kernel/delayacct.c~delayacct-taskstats	2006-05-04 09:31:59.000000000 +0530
+++ linux-2.6.17-rc3-balbir/kernel/delayacct.c	2006-05-04 11:26:18.000000000 +0530
@@ -104,3 +104,45 @@ void __delayacct_blkio_end(void)
 			&current->delays->blkio_delay,
 			&current->delays->blkio_count);
 }
+
+int __delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
+{
+	s64 tmp;
+	struct timespec ts;
+	unsigned long t1,t2,t3;
+
+	tmp = (s64)d->cpu_run_real_total;
+	tmp += (u64)(tsk->utime + tsk->stime) * TICK_NSEC;
+	d->cpu_run_real_total = (tmp < (s64)d->cpu_run_real_total) ? 0 : tmp;
+
+	/*
+	 * No locking available for sched_info (and too expensive to add one)
+	 * Mitigate by taking snapshot of values
+	 */
+	t1 = tsk->sched_info.pcnt;
+	t2 = tsk->sched_info.run_delay;
+	t3 = tsk->sched_info.cpu_time;
+
+	d->cpu_count += t1;
+
+	jiffies_to_timespec(t2, &ts);
+	tmp = (s64)d->cpu_delay_total + timespec_to_ns(&ts);
+	d->cpu_delay_total = (tmp < (s64)d->cpu_delay_total) ? 0 : tmp;
+
+	tmp = (s64)d->cpu_run_virtual_total + (s64)jiffies_to_usecs(t3) * 1000;
+	d->cpu_run_virtual_total =
+		(tmp < (s64)d->cpu_run_virtual_total) ?	0 : tmp;
+
+	/* zero XXX_total, non-zero XXX_count implies XXX stat overflowed */
+
+	spin_lock(&tsk->delays->lock);
+	tmp = d->blkio_delay_total + tsk->delays->blkio_delay;
+	d->blkio_delay_total = (tmp < d->blkio_delay_total) ? 0 : tmp;
+	tmp = d->swapin_delay_total + tsk->delays->swapin_delay;
+	d->swapin_delay_total = (tmp < d->swapin_delay_total) ? 0 : tmp;
+	d->blkio_count += tsk->delays->blkio_count;
+	d->swapin_count += tsk->delays->swapin_count;
+	spin_unlock(&tsk->delays->lock);
+
+	return 0;
+}
diff -puN kernel/taskstats.c~delayacct-taskstats kernel/taskstats.c
--- linux-2.6.17-rc3/kernel/taskstats.c~delayacct-taskstats	2006-05-04 09:31:59.000000000 +0530
+++ linux-2.6.17-rc3-balbir/kernel/taskstats.c	2006-05-04 11:27:53.000000000 +0530
@@ -18,6 +18,7 @@
 
 #include <linux/kernel.h>
 #include <linux/taskstats_kern.h>
+#include <linux/delayacct.h>
 #include <net/genetlink.h>
 #include <asm/atomic.h>
 
@@ -120,7 +121,10 @@ static int fill_pid(pid_t pid, struct ta
 	 *		goto err;
 	 */
 
-err:
+	rc = delayacct_add_tsk(stats, tsk);
+	stats->version = TASKSTATS_VERSION;
+
+	/* Define err: label here if needed */
 	put_task_struct(tsk);
 	return rc;
 
@@ -152,8 +156,14 @@ static int fill_tgid(pid_t tgid, struct 
 		 *		break;
 		 */
 
+		rc = delayacct_add_tsk(stats, tsk);
+		if (rc)
+			break;
+
 	} while_each_thread(first, tsk);
 	read_unlock(&tasklist_lock);
+	stats->version = TASKSTATS_VERSION;
+
 
 	/*
 	 * Accounting subsytems can also add calls here if they don't
_
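
A consumer of these fields has to cope with the rule stated in the
taskstats.h comment above: a zero xxx_delay_total with a non-zero xxx_count
means the total overflowed. The sketch below is a userspace illustration of
that rule using unsigned arithmetic; it is not the kernel code, which does the
comparison on s64 values. Both function names are hypothetical.

```c
#include <stdint.h>

/*
 * Add a delay (in ns) to a running total. On overflow the total is
 * reset to zero, following the "wraps around to zero" rule.
 */
static uint64_t delay_total_add(uint64_t total, uint64_t delta)
{
    uint64_t sum = total + delta;  /* unsigned wraparound is well defined */

    return (sum < total) ? 0 : sum;
}

/*
 * Average delay per recorded event, in ns. Returns 0 when there are no
 * events or when the total is zero (which includes the overflow case).
 */
static uint64_t delay_avg_ns(uint64_t total, uint64_t count)
{
    return (count && total) ? total / count : 0;
}
```

Returning 0 from delay_avg_ns() for an overflowed total is a design choice
made here for simplicity; a real tool might prefer to report the overflow
explicitly.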

^ permalink raw reply

* megaraid_mbox: garbage in file
From: Vasily Averin @ 2006-05-04 18:48 UTC (permalink / raw)
  To: linux-scsi, Neela.Kolli, Atul Mukker, Seokmann.Ju, sreenib,
	James.Bottomley, devel

Hello all,

I've investigated a customer's report of unstable behavior on their node and found a
strange effect: reading from some files leads to
 "attempt to access beyond end of device" messages.

I've checked the filesystem, the memory on the node, and the motherboard BIOS version, but
that did not help, and the issue can still be reproduced by simply reading a file.

Reproducer is simple:

echo 0xffffffff >/proc/sys/dev/scsi/logging_level ;
cat /vz/private/101/root/etc/ld.so.cache >/tmp/ttt  ;
echo 0 >/proc/sys/dev/scsi/logging_level

It leads to the following messages in dmesg

sd_init_command: disk=sda, block=871769260, count=26
sda : block=871769260
sda : reading 26/26 512 byte blocks.
scsi_add_timer: scmd: f79ed980, time: 7500, (c02b1420)
sd 0:1:0:0: send 0xf79ed980                  sd 0:1:0:0:
        command: Read (10): 28 00 33 f6 24 ac 00 00 1a 00
buffer = 0xf7cfb540, bufflen = 13312, done = 0xc0366b40, queuecommand 0xc0344010
leaving scsi_dispatch_cmnd()
scsi_delete_timer: scmd: f79ed980, rtn: 1
sd 0:1:0:0: done 0xf79ed980 SUCCESS        0 sd 0:1:0:0:
        command: Read (10): 28 00 33 f6 24 ac 00 00 1a 00
scsi host busy 1 failed 0
sd 0:1:0:0: Notifying upper driver of completion (result 0)
sd_rw_intr: sda: res=0x0
26 sectors total, 13312 bytes done.
use_sg is 4
attempt to access beyond end of device
sda6: rw=0, want=1044134458, limit=951401367
Buffer I/O error on device sda6, logical block 522067228
attempt to access beyond end of device
sda6: rw=0, want=1178878530, limit=951401367
Buffer I/O error on device sda6, logical block 589439264
...

As far as I can see, the first read operation finished without errors, but when we
read the rest of the file we get an access beyond the end of the device.

Originally it was found on Virtuozzo kernels (2.6.8.1-based x86 32-bit),
reproduced on RHEL4 kernels 2.6.9-22.EL and 2.6.9-34.EL,
on FC5 (2.6.16-1.2096_FC5) and on vanilla 2.6.16 kernels.

However, when I first read these blocks using dd with bs=512 or 1024, it works
without any trouble. Then I can cat this file, copy it, map it and so on -- and
get the correct content without any errors. Moreover, this issue can be worked around
by limiting memory: using mem=4G on the kernel command line, or using kernels
without PAE support, helps.

I've noticed that we attempt to access blocks with strange numbers:

522067228 = 0x1f1e1d1c
589439264 = 0x23222120 and so on.

Then I found that I had read strange garbage from the file:

# hexdump /tmp/ttt
0000000 0100 0302 0504 0706 0908 0b0a 0d0c 0f0e
0000010 1110 1312 1514 1716 1918 1b1a 1d1c 1f1e
0000020 2120 2322 2524 2726 2928 2b2a 2d2c 2f2e
0000030 3130 3332 3534 3736 3938 3b3a 3d3c 3f3e
0000040 4140 4342 4544 4746 4948 4b4a 4d4c 4f4e
0000050 5150 5352 5554 5756 5958 5b5a 5d5c 5f5e
0000060 6160 6362 6564 6766 6968 6b6a 6d6c 6f6e
0000070 7170 7372 7574 7776 7978 7b7a 7d7c 7f7e
0000080 0100 0302 0504 0706 0908 0b0a 0d0c 0f0e
0000090 1110 1312 1514 1716 1918 1b1a 1d1c 1f1e
00000a0 2120 2322 2524 2726 2928 2b2a 2d2c 2f2e
...
00000f0 7170 7372 7574 7776 7978 7b7a 7d7c 7f7e
0000100 0100 0302 0504 0706 0908 0b0a 0d0c 0f0e
...

Then I discovered that the "access beyond end of device" occurs due to reading
the same garbage from the 13th (indirect) block of the file.
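
The link between the byte pattern and the bogus block numbers can be checked
with a few lines of C. The sketch assumes the indirect block is filled with
the repeating 0..127 pattern from the hexdump and that block pointers are
stored as little-endian 32-bit values, as ext2/ext3 does on disk. The helper
names are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill a buffer with the repeating 0..127 byte pattern from the hexdump. */
static void fill_pattern(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t)(i & 0x7f);
}

/* Read a little-endian 32-bit value starting at byte offset off. */
static uint32_t le32_at(const uint8_t *buf, size_t off)
{
    return (uint32_t)buf[off] |
           ((uint32_t)buf[off + 1] << 8) |
           ((uint32_t)buf[off + 2] << 16) |
           ((uint32_t)buf[off + 3] << 24);
}
```

Reading such a buffer at offsets 0x1c and 0x20 gives 522067228 and 589439264,
the two logical block numbers in the log. With 1 KB blocks (two 512-byte
sectors each), (522067228 + 1) * 2 = 1044134458, which matches the first
"want=" value.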

I've tried to understand where this garbage comes from and found that it is present
in the data buffers as early as the megaraid_mbox driver functions.

Could somebody explain to me what this strange garbage (repeated 0...127) is?
Seokmann, Atul, could you please tell me if it is a known issue?
James, from my point of view it does not look like a driver bug, but perhaps I'm
wrong?

I suppose it is a MegaRAID SATA 150-4 firmware issue. I've seen similar firmware
fixes for MegaRAID SATA 300 controllers ("Support PAE mode fixed" and "Fixed the
operating systems using more than 4 gig of memory"). Is it possible that the same
issues are present in the SATA 150-4 firmware? Or maybe I am using a broken controller?

Hardware Environment:
Tyan B2881
2 x Opteron 246
8G RAM
LSI MegaRAID SATA 150-4
/vz partition formatted as ext3 with 1 KB block size

megaraid cmm: 2.20.2.6 (Release Date: Mon Mar 7 00:01:03 EST 2005)
megaraid: 2.20.4.7 (Release Date: Mon Nov 14 12:27:22 EST 2005)
megaraid: probe new device 0x1000:0x1960:0x1000:0x4523: bus 1:slot 4:func 0
ACPI: PCI Interrupt 0000:01:04.0[A] -> GSI 29 (level, low) -> IRQ 16
megaraid: fw version:[713N] bios version:[G119]
scsi0 : LSI Logic MegaRAID driver
scsi[0]: scanning scsi channel 0 [Phy 0] for non-raid devices
scsi[0]: scanning scsi channel 1 [virtual] for logical drives
  Vendor: MegaRAID  Model: LD 0 RAID1  476G  Rev: 713N
  Type:   Direct-Access                      ANSI SCSI revision: 02

Also, I would note that from my point of view this issue looks similar to
http://bugzilla.kernel.org/show_bug.cgi?id=6052

It seems to me that both of our cases may have the same cause.

Thank you,
	Vasily Averin

SWsoft Virtuozzo/OpenVZ Linux kernel team

^ permalink raw reply

* [PATCH] initramfs: fix CPIO hardlink check
From: Mark Huang @ 2006-05-04 18:43 UTC (permalink / raw)
  To: linux-kernel

Copy the filenames of hardlinks when inserting them into the hash, since
the "name" pointer may point to scratch space (name_buf). Not doing so
results in corruption if the scratch space is later overwritten: the 
wrong file may be hardlinked, or, if the scratch space contains garbage, 
the link will fail and a 0-byte file will be created instead.

Cc: me on responses.

Signed-off-by: Mark Huang <mlhuang@cs.princeton.edu>

--- linux-2.6.16.13/init/initramfs.c	2006-05-02 17:38:44.000000000 -0400
+++ linux-2.6.16.13.initramfs/init/initramfs.c	2006-05-04 14:26:44.000000000 -0400
@@ -26,10 +26,12 @@ static void __init free(void *where)

  /* link hash */

+#define N_ALIGN(len) ((((len) + 1) & ~3) + 2)
+
  static __initdata struct hash {
  	int ino, minor, major;
  	struct hash *next;
-	char *name;
+	char name[N_ALIGN(PATH_MAX)];
  } *head[32];

  static inline int hash(int major, int minor, int ino)
@@ -57,7 +59,7 @@ static char __init *find_link(int major,
  	q->ino = ino;
  	q->minor = minor;
  	q->major = major;
-	q->name = name;
+	strcpy(q->name, name);
  	q->next = NULL;
  	*p = q;
  	return NULL;
@@ -133,8 +135,6 @@ static inline void eat(unsigned n)
  	count -= n;
  }

-#define N_ALIGN(len) ((((len) + 1) & ~3) + 2)
-
  static __initdata char *collected;
  static __initdata int remains;
  static __initdata char *collect;
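
The failure mode described in the changelog can be shown in isolation. The
sketch below is a userspace illustration, not the initramfs code: the two
structs, the helper names and DEMO_NAME_LEN are hypothetical stand-ins for the
hash entry before and after the patch.

```c
#include <string.h>

#define DEMO_NAME_LEN 64  /* hypothetical; stands in for N_ALIGN(PATH_MAX) */

/* Before the patch: the entry only points at the caller's buffer. */
struct link_by_ptr {
    const char *name;
};

/* After the patch: the entry owns a copy of the name. */
struct link_by_copy {
    char name[DEMO_NAME_LEN];
};

static void remember_ptr(struct link_by_ptr *l, const char *name)
{
    l->name = name;  /* goes stale once the buffer is reused */
}

static void remember_copy(struct link_by_copy *l, const char *name)
{
    /* Bounded copy, always NUL-terminated. */
    strncpy(l->name, name, sizeof(l->name) - 1);
    l->name[sizeof(l->name) - 1] = '\0';
}
```

If both entries are filled from the same scratch buffer and the buffer is then
overwritten with the next filename, the pointer-based entry reports the new
name while the copied entry still holds the original one.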


^ permalink raw reply

* [updated] [Patch 5/8] taskstats interface
From: Balbir Singh @ 2006-05-04 18:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: lse-tech, jlan, akpm
In-Reply-To: <20060502061829.GB22607@in.ibm.com>


Changelog

Fixes comments by jlan@engr.sgi.com
- separate out taskstats interface from delay accounting completely including
separate documentation
- permit different accounting subsystems to fill in parts of common
structure separately before common taskstats code sends it out on genetlink
- send common structure to userspace after update_hiwater_rss and before
exit_mm in do_exit
- move version field right upfront and avoid the filler member in taskstats

Fixes comments by akpm
- comment to indicate locking used for taskstats struct
- whitespace issues
- unnecessary use of constant taskstats_version
- uninline fill_pid(), fill_tgid()
- unnecessary cast to pid_t in taskstats_send_stats()
- too early evaluation of thread_group_empty() in taskstats_exit_pid
- returning -EFAULT on genl_register_family failure in taskstats_init
- comment for late_initcall of taskstats_init

No fix needed
- moving kmem_cache_free of tsk->delays outside the exit mutex
  (mutex shifted and tsk->delays freeing being done elsewhere now)	
- __delayacct_add_tsk returning -EINVAL if delay accounting isn't enabled
	user should know that no values can be returned
	returning zero would be misleading
- combining fill_pid(), fill_tgid() into a common function
	combined code convoluted and less readable

taskstats-setup.patch

Create a "taskstats" interface based on generic netlink
(NETLINK_GENERIC family), for getting statistics of
tasks and thread groups during their lifetime and when they exit.
The interface is intended for use by multiple accounting packages
though it is being created in the context of delay accounting.

This patch creates the interface without populating the
fields of the data that is sent to the user in response to a command
or upon the exit of a task. Each accounting package interested in using
taskstats has to provide an additional patch to add its stats to the
common structure.

Signed-off-by: Shailabh Nagar <nagar@us.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
---

 Documentation/accounting/taskstats.txt |  146 ++++++++++++++
 include/linux/taskstats.h              |   84 ++++++++
 include/linux/taskstats_kern.h         |   57 +++++
 init/Kconfig                           |   12 +
 init/main.c                            |    2 
 kernel/Makefile                        |    1 
 kernel/exit.c                          |    7 
 kernel/taskstats.c                     |  329 +++++++++++++++++++++++++++++++++
 8 files changed, 638 insertions(+)

diff -puN /dev/null Documentation/accounting/taskstats.txt
--- /dev/null	2004-06-24 23:34:38.000000000 +0530
+++ linux-2.6.17-rc3-balbir/Documentation/accounting/taskstats.txt	2006-05-04 09:30:50.000000000 +0530
@@ -0,0 +1,146 @@
+Per-task statistics interface
+-----------------------------
+
+
+Taskstats is a netlink-based interface for sending per-task and
+per-process statistics from the kernel to userspace.
+
+Taskstats was designed for the following benefits:
+
+- efficiently provide statistics during lifetime of a task and on its exit
+- unified interface for multiple accounting subsystems
+- extensibility for use by future accounting patches
+
+Terminology
+-----------
+
+"pid", "tid" and "task" are used interchangeably and refer to the standard
+Linux task defined by struct task_struct.  per-pid stats are the same as
+per-task stats.
+
+"tgid", "process" and "thread group" are used interchangeably and refer to the
+tasks that share an mm_struct i.e. the traditional Unix process. Despite the
+use of tgid, there is no special treatment for the task that is thread group
+leader - a process is deemed alive as long as it has any task belonging to it.
+
+Usage
+-----
+
+To get statistics during task's lifetime, userspace opens a unicast netlink
+socket (NETLINK_GENERIC family) and sends commands specifying a pid or a tgid.
+The response contains statistics for a task (if pid is specified) or the sum of
+statistics for all tasks of the process (if tgid is specified).
+
+To obtain statistics for tasks which are exiting, userspace opens a multicast
+netlink socket. Each time a task exits, two records are sent by the kernel to
+each listener on the multicast socket. The first is the task's per-pid statistics
+and the second is the sum for all tasks of the process to which the task
+belongs (the task does not need to be the thread group leader). The need for
+per-tgid stats to be sent for each exiting task is explained in the per-tgid
+stats section below.
+
+
+Interface
+---------
+
+The user-kernel interface is encapsulated in include/linux/taskstats.h
+
+To avoid this documentation becoming obsolete as the interface evolves, only
+an outline of the current version is given. taskstats.h always overrides the
+description here.
+
+struct taskstats is the common accounting structure for both per-pid and
+per-tgid data. It is versioned and can be extended by each accounting subsystem
+that is added to the kernel. The fields and their semantics are defined in the
+taskstats.h file.
+
+The data exchanged between user and kernel space is a netlink message belonging
+to the NETLINK_GENERIC family and using the netlink attributes interface.
+The messages are in the format
+
+    +----------+- - -+-------------+-------------------+
+    | nlmsghdr | Pad |  genlmsghdr | taskstats payload |
+    +----------+- - -+-------------+-------------------+
+
+
+The taskstats payload is one of the following three kinds:
+
+1. Commands: Sent from user to kernel. The payload is one attribute, of type
+TASKSTATS_CMD_ATTR_PID/TGID, containing a u32 pid or tgid in the attribute
+payload. The pid/tgid denotes the task/process for which userspace wants
+statistics.
+
+2. Response for a command: sent from the kernel in response to a userspace
+command. The payload is a series of three attributes of type:
+
+a) TASKSTATS_TYPE_AGGR_PID/TGID : attribute containing no payload but indicates
+a pid/tgid will be followed by some stats.
+
+b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose stats
+is being returned.
+
+c) TASKSTATS_TYPE_STATS: attribute with a struct taskstats as payload. The
+same structure is used for both per-pid and per-tgid stats.
+
+3. New message sent by kernel whenever a task exits. The payload consists of a
+   series of attributes of the following type:
+
+a) TASKSTATS_TYPE_AGGR_PID: indicates next two attributes will be pid+stats
+b) TASKSTATS_TYPE_PID: contains exiting task's pid
+c) TASKSTATS_TYPE_STATS: contains the exiting task's per-pid stats
+d) TASKSTATS_TYPE_AGGR_TGID: indicates next two attributes will be tgid+stats
+e) TASKSTATS_TYPE_TGID: contains tgid of process to which task belongs
+f) TASKSTATS_TYPE_STATS: contains the per-tgid stats for exiting task's process
+
+
+per-tgid stats
+--------------
+
+Taskstats provides per-process stats, in addition to per-task stats, since
+resource management is often done at a process granularity and aggregating task
+stats in userspace alone is inefficient and potentially inaccurate (due to lack
+of atomicity).
+
+However, maintaining per-process, in addition to per-task stats, within the
+kernel has space and time overheads. Hence the taskstats implementation
+dynamically sums up the per-task stats for each task belonging to a process
+whenever per-process stats are needed.
+
+Not maintaining per-tgid stats creates a problem when userspace is interested
+in getting these stats when the process dies i.e. the last thread of
+a process exits. It isn't possible to simply return some aggregated per-process
+statistic from the kernel.
+
+The approach taken by taskstats is to return the per-tgid stats *each* time
+a task exits, in addition to the per-pid stats for that task. Userspace can
+maintain task<->process mappings and use them to maintain the per-process stats
+in userspace, updating the aggregate appropriately as the tasks of a process
+exit.
+
+Extending taskstats
+-------------------
+
+There are two ways to extend the taskstats interface to export more
+per-task/process stats as patches to collect them get added to the kernel
+in future:
+
+1. Adding more fields to the end of the existing struct taskstats. Backward
+   compatibility is ensured by the version number within the
+   structure. Userspace will use only the fields of the struct that correspond
+   to the version it's using.
+
+2. Defining separate statistic structs and using the netlink attributes
+   interface to return them. Since userspace processes each netlink attribute
+   independently, it can always ignore attributes whose type it does not
+   understand (because it is using an older version of the interface).
+
+
+Choosing between 1. and 2. is a matter of trading off flexibility and
+overhead. If only a few fields need to be added, then 1. is the preferable
+path since the kernel and userspace don't need to incur the overhead of
+processing new netlink attributes. But if the new fields expand the existing
+struct too much, requiring disparate userspace accounting utilities to
+unnecessarily receive large structures whose fields are of no interest, then
+extending the attributes structure would be worthwhile.
+
+----
diff -puN /dev/null include/linux/taskstats.h
--- /dev/null	2004-06-24 23:34:38.000000000 +0530
+++ linux-2.6.17-rc3-balbir/include/linux/taskstats.h	2006-05-04 09:31:52.000000000 +0530
@@ -0,0 +1,84 @@
+/* taskstats.h - exporting per-task statistics
+ *
+ * Copyright (C) Shailabh Nagar, IBM Corp. 2006
+ *           (C) Balbir Singh,   IBM Corp. 2006
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ */
+
+#ifndef _LINUX_TASKSTATS_H
+#define _LINUX_TASKSTATS_H
+
+/* Format for per-task data returned to userland when
+ *	- a task exits
+ *	- listener requests stats for a task
+ *
+ * The struct is versioned. Newer versions should only add fields to
+ * the bottom of the struct to maintain backward compatibility.
+ *
+ *
+ * To add new fields
+ *	a) bump up TASKSTATS_VERSION
+ *	b) add comment indicating new version number at end of struct
+ *	c) add new fields after version comment; maintain 64-bit alignment
+ */
+
+#define TASKSTATS_VERSION	1
+
+struct taskstats {
+
+	/* Version 1 */
+	__u64	version;
+};
+
+
+#define TASKSTATS_LISTEN_GROUP	0x1
+
+/*
+ * Commands sent from userspace
+ * Not versioned. New commands should only be inserted at the enum's end
+ * prior to __TASKSTATS_CMD_MAX
+ */
+
+enum {
+	TASKSTATS_CMD_UNSPEC = 0,	/* Reserved */
+	TASKSTATS_CMD_GET,		/* user->kernel request/get-response */
+	TASKSTATS_CMD_NEW,		/* kernel->user event */
+	__TASKSTATS_CMD_MAX,
+};
+
+#define TASKSTATS_CMD_MAX (__TASKSTATS_CMD_MAX - 1)
+
+enum {
+	TASKSTATS_TYPE_UNSPEC = 0,	/* Reserved */
+	TASKSTATS_TYPE_PID,		/* Process id */
+	TASKSTATS_TYPE_TGID,		/* Thread group id */
+	TASKSTATS_TYPE_STATS,		/* taskstats structure */
+	TASKSTATS_TYPE_AGGR_PID,	/* contains pid + stats */
+	TASKSTATS_TYPE_AGGR_TGID,	/* contains tgid + stats */
+	__TASKSTATS_TYPE_MAX,
+};
+
+#define TASKSTATS_TYPE_MAX (__TASKSTATS_TYPE_MAX - 1)
+
+enum {
+	TASKSTATS_CMD_ATTR_UNSPEC = 0,
+	TASKSTATS_CMD_ATTR_PID,
+	TASKSTATS_CMD_ATTR_TGID,
+	__TASKSTATS_CMD_ATTR_MAX,
+};
+
+#define TASKSTATS_CMD_ATTR_MAX (__TASKSTATS_CMD_ATTR_MAX - 1)
+
+/* NETLINK_GENERIC related info */
+
+#define TASKSTATS_GENL_NAME	"TASKSTATS"
+#define TASKSTATS_GENL_VERSION	0x1
+
+#endif /* _LINUX_TASKSTATS_H */
diff -puN /dev/null include/linux/taskstats_kern.h
--- /dev/null	2004-06-24 23:34:38.000000000 +0530
+++ linux-2.6.17-rc3-balbir/include/linux/taskstats_kern.h	2006-05-02 09:47:24.000000000 +0530
@@ -0,0 +1,57 @@
+/* taskstats_kern.h - kernel header for per-task statistics interface
+ *
+ * Copyright (C) Shailabh Nagar, IBM Corp. 2006
+ *           (C) Balbir Singh,   IBM Corp. 2006
+ */
+
+#ifndef _LINUX_TASKSTATS_KERN_H
+#define _LINUX_TASKSTATS_KERN_H
+
+#include <linux/taskstats.h>
+#include <linux/sched.h>
+
+enum {
+	TASKSTATS_MSG_UNICAST,		/* send data only to requester */
+	TASKSTATS_MSG_MULTICAST,	/* send data to a group */
+};
+
+#ifdef CONFIG_TASKSTATS
+extern kmem_cache_t *taskstats_cache;
+
+static inline void taskstats_exit_alloc(struct taskstats **ptidstats,
+					struct taskstats **ptgidstats)
+{
+	*ptidstats = kmem_cache_zalloc(taskstats_cache, SLAB_KERNEL);
+	*ptgidstats = kmem_cache_zalloc(taskstats_cache, SLAB_KERNEL);
+}
+
+static inline void taskstats_exit_free(struct taskstats *tidstats,
+					struct taskstats *tgidstats)
+{
+	if (tidstats)
+		kmem_cache_free(taskstats_cache, tidstats);
+	if (tgidstats)
+		kmem_cache_free(taskstats_cache, tgidstats);
+}
+
+extern void taskstats_exit_send(struct task_struct *, struct taskstats *,
+				struct taskstats *);
+extern void taskstats_init_early(void);
+
+#else
+static inline void taskstats_exit_alloc(struct taskstats **ptidstats,
+					struct taskstats **ptgidstats)
+{}
+static inline void taskstats_exit_free(struct taskstats *ptidstats,
+					struct taskstats *ptgidstats)
+{}
+static inline void taskstats_exit_send(struct task_struct *tsk,
+					struct taskstats *tidstats,
+					struct taskstats *tgidstats)
+{}
+static inline void taskstats_init_early(void)
+{}
+#endif /* CONFIG_TASKSTATS */
+
+#endif
+
diff -puN init/Kconfig~taskstats-setup init/Kconfig
--- linux-2.6.17-rc3/init/Kconfig~taskstats-setup	2006-05-02 09:47:24.000000000 +0530
+++ linux-2.6.17-rc3-balbir/init/Kconfig	2006-05-04 09:31:24.000000000 +0530
@@ -150,6 +150,18 @@ config BSD_PROCESS_ACCT_V3
 	  for processing it. A preliminary version of these tools is available
 	  at <http://www.physik3.uni-rostock.de/tim/kernel/utils/acct/>.
 
+config TASKSTATS
+	bool "Export task/process statistics through netlink (EXPERIMENTAL)"
+	default n
+	help
+	  Export selected statistics for tasks/processes through the
+	  generic netlink interface. Unlike BSD process accounting, the
+	  statistics are available during the lifetime of tasks/processes as
+	  responses to commands. Like BSD accounting, they are sent to user
+	  space on task exit.
+
+	  Say N if unsure.
+
 config TASK_DELAY_ACCT
 	bool "Enable per-task delay accounting (EXPERIMENTAL)"
 	help
diff -puN init/main.c~taskstats-setup init/main.c
--- linux-2.6.17-rc3/init/main.c~taskstats-setup	2006-05-02 09:47:24.000000000 +0530
+++ linux-2.6.17-rc3-balbir/init/main.c	2006-05-02 09:47:24.000000000 +0530
@@ -47,6 +47,7 @@
 #include <linux/rmap.h>
 #include <linux/mempolicy.h>
 #include <linux/key.h>
+#include <linux/taskstats_kern.h>
 #include <linux/delayacct.h>
 
 #include <asm/io.h>
@@ -542,6 +543,7 @@ asmlinkage void __init start_kernel(void
 	proc_root_init();
 #endif
 	cpuset_init();
+	taskstats_init_early();
 	delayacct_init();
 
 	check_bugs();
diff -puN kernel/exit.c~taskstats-setup kernel/exit.c
--- linux-2.6.17-rc3/kernel/exit.c~taskstats-setup	2006-05-02 09:47:24.000000000 +0530
+++ linux-2.6.17-rc3-balbir/kernel/exit.c	2006-05-02 09:47:24.000000000 +0530
@@ -36,6 +36,7 @@
 #include <linux/compat.h>
 #include <linux/pipe_fs_i.h>
 #include <linux/delayacct.h>
+#include <linux/taskstats_kern.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -848,6 +849,7 @@ static void exit_notify(struct task_stru
 fastcall NORET_TYPE void do_exit(long code)
 {
 	struct task_struct *tsk = current;
+	struct taskstats *tidstats, *tgidstats;
 	int group_dead;
 
 	profile_task_exit(tsk);
@@ -894,6 +896,8 @@ fastcall NORET_TYPE void do_exit(long co
 				current->comm, current->pid,
 				preempt_count());
 
+	taskstats_exit_alloc(&tidstats, &tgidstats);
+
 	acct_update_integrals(tsk);
 	if (tsk->mm) {
 		update_hiwater_rss(tsk->mm);
@@ -911,7 +915,10 @@ fastcall NORET_TYPE void do_exit(long co
 	if (unlikely(tsk->compat_robust_list))
 		compat_exit_robust_list(tsk);
 #endif
+	taskstats_exit_send(tsk, tidstats, tgidstats);
+	taskstats_exit_free(tidstats, tgidstats);
 	delayacct_tsk_exit(tsk);
+
 	exit_mm(tsk);
 
 	exit_sem(tsk);
diff -puN kernel/Makefile~taskstats-setup kernel/Makefile
--- linux-2.6.17-rc3/kernel/Makefile~taskstats-setup	2006-05-02 09:47:24.000000000 +0530
+++ linux-2.6.17-rc3-balbir/kernel/Makefile	2006-05-02 09:47:24.000000000 +0530
@@ -39,6 +39,7 @@ obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
+obj-$(CONFIG_TASKSTATS) += taskstats.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
diff -puN /dev/null kernel/taskstats.c
--- /dev/null	2004-06-24 23:34:38.000000000 +0530
+++ linux-2.6.17-rc3-balbir/kernel/taskstats.c	2006-05-04 09:31:24.000000000 +0530
@@ -0,0 +1,329 @@
+/*
+ * taskstats.c - Export per-task statistics to userland
+ *
+ * Copyright (C) Shailabh Nagar, IBM Corp. 2006
+ *           (C) Balbir Singh,   IBM Corp. 2006
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/taskstats_kern.h>
+#include <net/genetlink.h>
+#include <asm/atomic.h>
+
+static DEFINE_PER_CPU(__u32, taskstats_seqnum) = { 0 };
+static int family_registered = 0;
+kmem_cache_t *taskstats_cache;
+static DEFINE_MUTEX(taskstats_exit_mutex);
+
+static struct genl_family family = {
+	.id		= GENL_ID_GENERATE,
+	.name		= TASKSTATS_GENL_NAME,
+	.version	= TASKSTATS_GENL_VERSION,
+	.maxattr	= TASKSTATS_CMD_ATTR_MAX,
+};
+
+static struct nla_policy taskstats_cmd_get_policy[TASKSTATS_CMD_ATTR_MAX+1]
+__read_mostly = {
+	[TASKSTATS_CMD_ATTR_PID]  = { .type = NLA_U32 },
+	[TASKSTATS_CMD_ATTR_TGID] = { .type = NLA_U32 },
+};
+
+
+static int prepare_reply(struct genl_info *info, u8 cmd, struct sk_buff **skbp,
+			void **replyp, size_t size)
+{
+	struct sk_buff *skb;
+	void *reply;
+
+	/*
+	 * If new attributes are added, please revisit this allocation
+	 */
+	skb = nlmsg_new(size);
+	if (!skb)
+		return -ENOMEM;
+
+	if (!info) {
+		int seq = get_cpu_var(taskstats_seqnum)++;
+		put_cpu_var(taskstats_seqnum);
+
+		reply = genlmsg_put(skb, 0, seq,
+				family.id, 0, 0,
+				cmd, family.version);
+	} else
+		reply = genlmsg_put(skb, info->snd_pid, info->snd_seq,
+				family.id, 0, 0,
+				cmd, family.version);
+	if (reply == NULL) {
+		nlmsg_free(skb);
+		return -EINVAL;
+	}
+
+	*skbp = skb;
+	*replyp = reply;
+	return 0;
+}
+
+static int send_reply(struct sk_buff *skb, pid_t pid, int event)
+{
+	struct genlmsghdr *genlhdr = nlmsg_data((struct nlmsghdr *)skb->data);
+	void *reply;
+	int rc;
+
+	reply = genlmsg_data(genlhdr);
+
+	rc = genlmsg_end(skb, reply);
+	if (rc < 0) {
+		nlmsg_free(skb);
+		return rc;
+	}
+
+	if (event == TASKSTATS_MSG_MULTICAST)
+		return genlmsg_multicast(skb, pid, TASKSTATS_LISTEN_GROUP);
+	return genlmsg_unicast(skb, pid);
+}
+
+static int fill_pid(pid_t pid, struct task_struct *pidtsk,
+		struct taskstats *stats)
+{
+	int rc = 0;
+	struct task_struct *tsk = pidtsk;
+
+	if (!pidtsk) {
+		read_lock(&tasklist_lock);
+		tsk = find_task_by_pid(pid);
+		if (!tsk) {
+			read_unlock(&tasklist_lock);
+			return -ESRCH;
+		}
+		get_task_struct(tsk);
+		read_unlock(&tasklist_lock);
+	} else
+		get_task_struct(tsk);
+
+	/*
+	 * Each accounting subsystem adds calls to its functions to
+	 * fill in relevant parts of struct taskstats as follows
+	 *
+	 *	rc = per-task-foo(stats, tsk);
+	 *	if (rc)
+	 *		goto err;
+	 */
+
+err:
+	put_task_struct(tsk);
+	return rc;
+
+}
+
+static int fill_tgid(pid_t tgid, struct task_struct *tgidtsk,
+		struct taskstats *stats)
+{
+	int rc = 0;
+	struct task_struct *tsk, *first;
+
+	first = tgidtsk;
+	read_lock(&tasklist_lock);
+	if (!first) {
+		first = find_task_by_pid(tgid);
+		if (!first) {
+			read_unlock(&tasklist_lock);
+			return -ESRCH;
+		}
+	}
+	tsk = first;
+	do {
+		/*
+		 * Each accounting subsystem adds calls to its functions to
+		 * fill in relevant parts of struct taskstats as follows
+		 *
+		 *	rc = per-task-foo(stats, tsk);
+		 *	if (rc)
+		 *		break;
+		 */
+
+	} while_each_thread(first, tsk);
+	read_unlock(&tasklist_lock);
+
+	/*
+	 * Accounting subsystems can also add calls here if they don't
+	 * wish to aggregate statistics for per-tgid stats
+	 */
+
+	return rc;
+}
+
+static int taskstats_send_stats(struct sk_buff *skb, struct genl_info *info)
+{
+	int rc = 0;
+	struct sk_buff *rep_skb;
+	struct taskstats stats;
+	void *reply;
+	size_t size;
+	struct nlattr *na;
+
+	/*
+	 * Size includes space for nested attributes
+	 */
+	size = nla_total_size(sizeof(u32)) +
+		nla_total_size(sizeof(struct taskstats)) + nla_total_size(0);
+
+	memset(&stats, 0, sizeof(stats));
+	rc = prepare_reply(info, TASKSTATS_CMD_NEW, &rep_skb, &reply, size);
+	if (rc < 0)
+		return rc;
+
+	if (info->attrs[TASKSTATS_CMD_ATTR_PID]) {
+		u32 pid = nla_get_u32(info->attrs[TASKSTATS_CMD_ATTR_PID]);
+		rc = fill_pid(pid, NULL, &stats);
+		if (rc < 0)
+			goto err;
+
+		na = nla_nest_start(rep_skb, TASKSTATS_TYPE_AGGR_PID);
+		NLA_PUT_U32(rep_skb, TASKSTATS_TYPE_PID, pid);
+		NLA_PUT_TYPE(rep_skb, struct taskstats, TASKSTATS_TYPE_STATS,
+				stats);
+	} else if (info->attrs[TASKSTATS_CMD_ATTR_TGID]) {
+		u32 tgid = nla_get_u32(info->attrs[TASKSTATS_CMD_ATTR_TGID]);
+		rc = fill_tgid(tgid, NULL, &stats);
+		if (rc < 0)
+			goto err;
+
+		na = nla_nest_start(rep_skb, TASKSTATS_TYPE_AGGR_TGID);
+		NLA_PUT_U32(rep_skb, TASKSTATS_TYPE_TGID, tgid);
+		NLA_PUT_TYPE(rep_skb, struct taskstats, TASKSTATS_TYPE_STATS,
+				stats);
+	} else {
+		rc = -EINVAL;
+		goto err;
+	}
+
+	nla_nest_end(rep_skb, na);
+
+	return send_reply(rep_skb, info->snd_pid, TASKSTATS_MSG_UNICAST);
+
+nla_put_failure:
+	return genlmsg_cancel(rep_skb, reply);
+err:
+	nlmsg_free(rep_skb);
+	return rc;
+}
+
+/* Send pid data out on exit */
+void taskstats_exit_send(struct task_struct *tsk, struct taskstats *tidstats,
+			struct taskstats *tgidstats)
+{
+	int rc;
+	struct sk_buff *rep_skb;
+	void *reply;
+	size_t size;
+	int is_thread_group;
+	struct nlattr *na;
+
+	if (!family_registered || !tidstats)
+		return;
+
+	mutex_lock(&taskstats_exit_mutex);
+
+	is_thread_group = !thread_group_empty(tsk);
+	rc = 0;
+
+	/*
+	 * Size includes space for nested attributes
+	 */
+	size = nla_total_size(sizeof(u32)) +
+		nla_total_size(sizeof(struct taskstats)) + nla_total_size(0);
+
+	if (is_thread_group)
+		size = 2 * size;	/* PID + STATS + TGID + STATS */
+
+	rc = prepare_reply(NULL, TASKSTATS_CMD_NEW, &rep_skb, &reply, size);
+	if (rc < 0)
+		goto ret;
+
+	rc = fill_pid(tsk->pid, tsk, tidstats);
+	if (rc < 0)
+		goto err_skb;
+
+	na = nla_nest_start(rep_skb, TASKSTATS_TYPE_AGGR_PID);
+	NLA_PUT_U32(rep_skb, TASKSTATS_TYPE_PID, (u32)tsk->pid);
+	NLA_PUT_TYPE(rep_skb, struct taskstats, TASKSTATS_TYPE_STATS,
+			*tidstats);
+	nla_nest_end(rep_skb, na);
+
+	if (!is_thread_group || !tgidstats) {
+		send_reply(rep_skb, 0, TASKSTATS_MSG_MULTICAST);
+		goto ret;
+	}
+
+	rc = fill_tgid(tsk->pid, tsk, tgidstats);
+	if (rc < 0)
+		goto err_skb;
+
+	na = nla_nest_start(rep_skb, TASKSTATS_TYPE_AGGR_TGID);
+	NLA_PUT_U32(rep_skb, TASKSTATS_TYPE_TGID, (u32)tsk->tgid);
+	NLA_PUT_TYPE(rep_skb, struct taskstats, TASKSTATS_TYPE_STATS,
+			*tgidstats);
+	nla_nest_end(rep_skb, na);
+
+	send_reply(rep_skb, 0, TASKSTATS_MSG_MULTICAST);
+	goto ret;
+
+nla_put_failure:
+	genlmsg_cancel(rep_skb, reply);
+	goto ret;
+err_skb:
+	nlmsg_free(rep_skb);
+ret:
+	mutex_unlock(&taskstats_exit_mutex);
+	return;
+}
+
+static struct genl_ops taskstats_ops = {
+	.cmd		= TASKSTATS_CMD_GET,
+	.doit		= taskstats_send_stats,
+	.policy		= taskstats_cmd_get_policy,
+};
+
+/* Needed early in initialization */
+void __init taskstats_init_early(void)
+{
+	taskstats_cache = kmem_cache_create("taskstats_cache",
+						sizeof(struct taskstats),
+						0, SLAB_PANIC, NULL, NULL);
+}
+
+static int __init taskstats_init(void)
+{
+	int rc;
+
+	rc = genl_register_family(&family);
+	if (rc)
+		return rc;
+	family_registered = 1;
+
+	if ((rc = genl_register_ops(&family, &taskstats_ops)) < 0)
+		goto err;
+
+	return 0;
+err:
+	genl_unregister_family(&family);
+	family_registered = 0;
+	return rc;
+}
+
+/*
+ * late initcall ensures initialization of statistics collection
+ * mechanisms precedes initialization of the taskstats interface
+ */
+late_initcall(taskstats_init);
_
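
For anyone wanting to try the interface from userspace, here is a minimal
sketch of how a listener might walk the attributes in a reply or exit message.
It assumes the nested layout that taskstats_send_stats() and
taskstats_exit_send() produce (an AGGR_PID/AGGR_TGID container holding the id
and the stats). The names struct attr_hdr, struct ts_seen, ts_walk() and
ts_put() are made up for illustration and are not part of the patch; a real
listener would use the kernel's netlink attribute definitions, and would first
have to strip nlmsghdr and genlmsghdr, which is not shown here.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Attribute types, in the same order as the TASKSTATS_TYPE_* enum above. */
enum {
	TS_TYPE_UNSPEC = 0,
	TS_TYPE_PID,
	TS_TYPE_TGID,
	TS_TYPE_STATS,
	TS_TYPE_AGGR_PID,
	TS_TYPE_AGGR_TGID,
};

/* Local stand-in for the netlink attribute header; len covers header + payload. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

#define ATTR_HDRLEN	sizeof(struct attr_hdr)
#define ATTR_ALIGN(n)	(((size_t)(n) + 3) & ~(size_t)3)

/* What this hypothetical listener remembers from one message. */
struct ts_seen {
	uint32_t pid;
	uint32_t tgid;
	int nstats;	/* TS_TYPE_STATS attributes found */
	int nunknown;	/* attribute types that were skipped */
};

/* Walk a run of attributes, descending into the AGGR_* containers. */
static int ts_walk(const unsigned char *buf, size_t len, struct ts_seen *seen)
{
	while (len >= ATTR_HDRLEN) {
		struct attr_hdr h;
		const unsigned char *payload = buf + ATTR_HDRLEN;
		size_t plen, step;

		memcpy(&h, buf, sizeof(h));
		if (h.len < ATTR_HDRLEN || h.len > len)
			return -1;	/* truncated or corrupt attribute */
		plen = h.len - ATTR_HDRLEN;

		switch (h.type) {
		case TS_TYPE_AGGR_PID:
		case TS_TYPE_AGGR_TGID:
			if (ts_walk(payload, plen, seen) < 0)
				return -1;
			break;
		case TS_TYPE_PID:
			if (plen < sizeof(seen->pid))
				return -1;
			memcpy(&seen->pid, payload, sizeof(seen->pid));
			break;
		case TS_TYPE_TGID:
			if (plen < sizeof(seen->tgid))
				return -1;
			memcpy(&seen->tgid, payload, sizeof(seen->tgid));
			break;
		case TS_TYPE_STATS:
			seen->nstats++;	/* payload would be struct taskstats */
			break;
		default:
			seen->nunknown++;	/* unknown type: ignore it */
			break;
		}

		/* Attributes are padded to a 4-byte boundary. */
		step = ATTR_ALIGN(h.len);
		if (step > len)
			step = len;
		buf += step;
		len -= step;
	}
	return 0;
}

/* Test helper: append one attribute at offset off, return the new offset. */
static size_t ts_put(unsigned char *buf, size_t off, uint16_t type,
		     const void *data, uint16_t dlen)
{
	struct attr_hdr h = { (uint16_t)(ATTR_HDRLEN + dlen), type };

	memcpy(buf + off, &h, sizeof(h));
	if (dlen)
		memcpy(buf + off + ATTR_HDRLEN, data, dlen);
	return off + ATTR_ALIGN(h.len);
}
```

ts_put() exists only so the walker can be exercised without a kernel that has
this patch applied. The default case is the point of extension method 2 in the
documentation: an older listener skips attribute types it does not know.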

^ permalink raw reply

* Re: TCP/IP send, sendfile, RAW
From: Roy Rietveld @ 2006-05-04 18:42 UTC (permalink / raw)
  To: linux-os; +Cc: linux-kernel, jengelh
In-Reply-To: <Pine.LNX.4.61.0605041424380.7013@chaos.analogic.com>

I tried, but it doesn't help; still 40 MBits. Do send or sendto cost a lot of
CPU load?

I tried to measure the time a sendto call costs.

gettimeofday(start)
sendto
gettimeofday(end)

print end - start

time measured is 250 us.
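
A rough sketch of that measurement, averaged over many calls instead of one,
plus the arithmetic it implies. If each 1400-byte sendto really takes about
250 us and the sender makes one blocking call per packet, the ceiling is
1400 * 8 / 250 = 44.8 Mbit/s, which is close to the 40 MBits seen on the wire.
The helper names implied_mbits() and avg_sendto_usec() are made up for this
example; the socket and destination address are assumed to be set up by the
caller.

```c
#include <stddef.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Throughput implied by one call per packet. Bits per microsecond is
 * numerically the same as megabits per second.
 */
static double implied_mbits(size_t payload_bytes, double usec_per_call)
{
	return (double)payload_bytes * 8.0 / usec_per_call;
}

/*
 * Average wall-clock cost of sendto() in microseconds over count calls.
 * Returns a negative value on a bad argument or a failed send.
 */
static double avg_sendto_usec(int fd, const struct sockaddr_in *dst,
			      size_t payload_bytes, int count)
{
	static char buf[65507];	/* largest UDP payload over IPv4 */
	struct timeval start, end;
	int i;

	if (payload_bytes > sizeof(buf) || count <= 0)
		return -1.0;
	memset(buf, 0, payload_bytes);

	gettimeofday(&start, NULL);
	for (i = 0; i < count; i++) {
		if (sendto(fd, buf, payload_bytes, 0,
			   (const struct sockaddr *)dst, sizeof(*dst)) < 0)
			return -1.0;
	}
	gettimeofday(&end, NULL);

	return ((end.tv_sec - start.tv_sec) * 1e6 +
		(end.tv_usec - start.tv_usec)) / count;
}
```

Called with a SOCK_DGRAM socket and something like 10000 iterations, the
average is less sensitive to the resolution of gettimeofday than a single
start/end pair around one call. Note that this measures elapsed time, not CPU
time, so it includes any time the call spends blocked.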


>From: "linux-os (Dick Johnson)" <linux-os@analogic.com>
>Reply-To: "linux-os (Dick Johnson)" <linux-os@analogic.com>
>To: "Roy Rietveld" <rwm_rietveld@hotmail.com>
>CC: <linux-kernel@vger.kernel.org>,<jengelh@linux01.gwdg.de>
>Subject: Re: TCP/IP send, sendfile, RAW
>Date: Thu, 4 May 2006 14:27:47 -0400
>
>
>On Thu, 4 May 2006, Roy Rietveld wrote:
>
> > Yes, it is 100 MBits and there is a listener, and there are no other PCs on
> > the link because it's a cross-cable link. And when sending large buffers
> > (32 Kbyte) it will do 80 MBits. I think that there is a lot of overhead in
> > the function send or something.
> >
>
>Use sendto() and recvfrom() for UDP. Stream protocols require an ACK and
>are slower.
>
> >
> >> From: "linux-os (Dick Johnson)" <linux-os@analogic.com>
> >> Reply-To: "linux-os (Dick Johnson)" <linux-os@analogic.com>
> >> To: "Jan Engelhardt" <jengelh@linux01.gwdg.de>
> >> CC: "Roy Rietveld"
> >> <rwm_rietveld@hotmail.com>,<linux-kernel@vger.kernel.org>
> >> Subject: Re: TCP/IP send, sendfile, RAW
> >> Date: Thu, 4 May 2006 13:56:31 -0400
> >>
> >>
> >> On Thu, 4 May 2006, Jan Engelhardt wrote:
> >>
> >>>> I would like to send Ethernet packets with a 1400-byte payload.
> >>>> I wrote a small program which sends a buffer of 1400 bytes in an
> >>>> endless loop.
> >>>> The problem is that I would like 100 Mbits throughput, but when I
> >>>> check this with ethereal, I only get 40 MBits. I tried sending with
> >>>> a UDP socket and a RAW socket. I also tried sendfile.
> >>>> The RAW socket gives the best result so far: 50 MBits throughput.
> >>>
> >>> Limitation of Ethernet.
> >>>
> >>>
> >>>
> >>> Jan Engelhardt
> >>
> >> Maybe he can tell what he means by 100 MBits! If he is looking for
> >> 100 megabits per second, that's easy, That's 100/8 = 12.5 megabytes
> >> per second. Anything, including Windows on a wet string, will
> >> do that. If he is looking for 100 megabytes per second, that's
> >> hard. He would need 100 * 8 = 800 megabits/second. A "gigabit" link
> >> runs that fast if nobody else is on it, but there is a header and CRC
> >> tail, in addition to the payload. UDP is the protocol to use to realize
> >> this kind of bandwidth, but its possible for some packets to get lost 
>and,
> >> if they are routed, they could even be duplicated. Also, when testing
> >> UDP, there must be a listener in order to realize the high speed.
> >> You can't just spew out a dead-end link.
> >>
> >> Cheers,
> >> Dick Johnson
> >> Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips).
> >> New book: http://www.lymanschool.com
> >>
> >> ****************************************************************
> >> The information transmitted in this message is confidential and may be
> >> privileged.  Any review, retransmission, dissemination, or other use of
> >> this information by persons or entities other than the intended 
>recipient
> >> is prohibited.  If you are not the intended recipient, please notify
> >> Analogic Corporation immediately - by replying to this message or by
> >> sending an email to DeliveryErrors@analogic.com - and destroy all 
>copies of
> >> this information, including any attachments, without reading or 
>disclosing
> >> them.
> >>
> >> Thank you.
> >> -
> >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" 
>in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> Please read the FAQ at  http://www.tux.org/lkml/
> >
> >
> >
>
>Cheers,
>Dick Johnson
>Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips).
>New book: http://www.lymanschool.com
>
>****************************************************************
>The information transmitted in this message is confidential and may be 
>privileged.  Any review, retransmission, dissemination, or other use of 
>this information by persons or entities other than the intended recipient 
>is prohibited.  If you are not the intended recipient, please notify 
>Analogic Corporation immediately - by replying to this message or by 
>sending an email to DeliveryErrors@analogic.com - and destroy all copies of 
>this information, including any attachments, without reading or disclosing 
>them.
>
>Thank you.



^ permalink raw reply

* Job Vacancy in New Zealand
From: Frank @ 2006-05-04 18:39 UTC (permalink / raw)
  To: linux-kernel

Vacancy description:
Open position: Key Account Manager (part-time position: 10-12 hours per week)
Location: New Zealand
Job Duties and Responsibilities: Key Account Manager must conduct financial operations in accordance with our company's rules, regulations and sales policy to provide the best conditions to the company's clients. These activities include, but are not limited to, simple financial operations and actions which do not require any special education or experience. But we require the ability to work with data and to use a PC; interpersonal and people skills; multi-tasking ability; organizational skills.
We provide salary plus benefits and travel package discounts.
Send your messages of interest to job@bestworldcountries.com
--
Regards,
W&C Team
job@bestworldcountries.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply

* Re: Remove silly messages from input layer.
From: Dave Jones @ 2006-05-04 18:38 UTC (permalink / raw)
  To: dtor_core; +Cc: Martin J. Bligh, Pavel Machek, Linux Kernel
In-Reply-To: <d120d5000605041134k3d9f5934ne9e01f7108cb0271@mail.gmail.com>

On Thu, May 04, 2006 at 02:34:34PM -0400, Dmitry Torokhov wrote:

 > >Perhaps it should say that then ;-)
 > 
 > Do you have a better wording in mind? "Keyboard reports too many keys
 > were pressed at once, some keystrokes might be dropped"?

It still doesn't make sense when the user only pressed a single key,
or in some cases, never pressed *any* key (don't have that report to hand,
but it was a laptop keyboard)

 > Also I don't understand what people have against this message, it's at
 > KERN_DEBUG level after all.

When you're on the receiving end of distro kernel bug reports, it becomes clearer :)
Users read dmesg from time to time, and freak out when they see something
like this that looks like an error that they can't do anything about.
Until I silenced these in the Fedora kernel I was getting quite a few reports
from concerned users.

		Dave
-- 
http://www.codemonkey.org.uk

^ permalink raw reply

* Re: CFI Extended (Intel P30) problems on an ARM PXA255
From: Dan Merillat @ 2006-05-04 18:37 UTC (permalink / raw)
  To: linux-mtd
In-Reply-To: <Pine.LNX.4.64.0605040909570.28543@localhost.localdomain>

On 5/4/06, Nicolas Pitre <nico@cam.org> wrote:
> On Thu, 4 May 2006, Dan Merillat wrote:
>
> > > > Write error in obliterating obsoleted node at 0x00102318: -30
> > >
> > > -30 is -EROFS.
> > >
> > > Did you unlock the flash sectors before mounting JFFS2?
> >
> > I wish it were that simple.
>
> Still, did you unlock the flash sectors?

YES, I unlocked the flash sectors.  Both partitions, all regions.  And
if the WP# line is being dropped/raised it would re-protect all the
sectors, which it's not.  I can run the commandline flash_unlock and
flash_erase (which, of course, trash everything and I have to re-flash
afterwards.)

Also, if the sectors are locked, then it's still a bug to stay in
status register mode.  A read-only flash is a valid configuration, and
should result in a read-only filesystem, not an unusable filesystem.

I verified again and again; I don't know why I got an EROFS and I wish
to god I hadn't pasted it.  No other run has gotten me this result
unless I deliberately lock the sectors before starting linux.  I just
verified with a very slow erase all, re-upload jffs2 filesystem over
115200 serial, re-flash, crc32 verify the contents, and fill the
remaining flash with 0x0.   U-boot has no problems performing any of
this, so the flash is indeed unlocked.  Then I started linux and ended
up with the same deal, after a write whatever the first read happens
to be returns 0x0080 and gets all sorts of wacky corruption errors.

^ permalink raw reply

* Re: [PATCH] Make interrupt handler works for all cases
From: Franck Bui-Huu @ 2006-05-04 18:34 UTC (permalink / raw)
  To: Thiemo Seufer; +Cc: Ralf Baechle, linux-mips
In-Reply-To: <20060502180436.GH5004@networkno.de>

2006/5/2, Thiemo Seufer <ths@networkno.de>:
> Franck Bui-Huu wrote:
> > 2006/5/2, Thiemo Seufer <ths@networkno.de>:
> > >Franck Bui-Huu wrote:
> > >> 2006/5/2, Ralf Baechle <ralf@linux-mips.org>:
> > >> >On Tue, May 02, 2006 at 09:55:51AM +0200, Franck Bui-Huu wrote:
> > >> >
> > >> >> specially when the kernel is mapped.
> > >> >
> > >> >At which time you're on very fragile ice because TLB instructions should
> > >> >better be executed from an unmapped address ...
> > >> >
> > >>
> > >> well TLB entry used by the kernel is wired, so it should work fine,
> > >> shouldn't it ?
> > >
> > >The architecture spec doesn't guarantee it will.
> >
> > having a quick look at the TLB handling code, it seems that the code
> > assumes it will...
>
> I don't know which code you are looking at, but the kernel's TLB
> handling doesn't run in mapped space. (The ip27 is an exception,
> I assume the R1x000 allows for mapped TLB handling.)
>

I have assumed the same for r4k cpu and it seems to work fine...
Anyways, Ralf can you apply this patch ?

Thanks
--
               Franck

^ permalink raw reply

* Re: Remove silly messages from input layer.
From: Dmitry Torokhov @ 2006-05-04 18:34 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Pavel Machek, Dave Jones, Linux Kernel
In-Reply-To: <445A18D8.1030502@mbligh.org>

On 5/4/06, Martin J. Bligh <mbligh@mbligh.org> wrote:
> Pavel Machek wrote:
> > On Wed 03-05-06 22:44:04, Dave Jones wrote:
> >
> >>There are two messages in the input layer that seem to be
> >>triggerable very easily, and they confuse end-users to no end.
> >>"too many keys pressed? Should I press less keys?"
> >
> >
> > It actually means 'type more slowly' or 'use standard keymap' or 'get
> > a better keyboard' :-) or 'no, you are not imagining it, I've seen
> > your keypress and dropped it'.
>
> Perhaps it should say that then ;-)
>

Do you have a better wording in mind? "Keyboard reports too many keys
were pressed at once, some keystrokes might be dropped"?

Also I don't understand what people have against this message, it's at
KERN_DEBUG level after all.

--
Dmitry

^ permalink raw reply

* [Qemu-devel] kqemu-1.3.0-pre6 and qemu-0.8.1
From: Ishwar Rattan @ 2006-05-04 18:37 UTC (permalink / raw)
  To: Qemu-devel

System is a Debian derivative, kernel version 2.6.15.1.
Compiled kqemu with gcc-4.0; it compiles and insmods fine.
Compiled qemu with gcc-3.4; it compiles, installs and runs
fine without the -kernel-kqemu option.

Kernel panic with the -kernel-kqemu option (guest slax,
host linux).

Is this a known issue?
-ishwar

^ permalink raw reply

* RE: [PATCH][SVM][1/2] fix SVM 64bit hv cores>0 reboot/hang issue
From: Woller, Thomas @ 2006-05-04 18:30 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

> 
> Zeroing them out is probably the best thing to do. Xen 
> doesn't bother to change them while it executes because 
> ds/es/ss are effectively unused in 64-bit mode. The fact they 
> happen to contain __USER_DS is utterly unimportant.
Good to hear that...

> 
> You probably need to worry about fs and gs as well. They're 
> not causing you problems now because guests don't usually set 
> those registers to non-zero values, but if they ever do then 
> you'll presumably have problems?
Just to be clear here, the host registers are what is giving us 
problems. When the microcode loads the host selectors in the vmexit
microcode logic, then the consistency check fails on ds,es (0x2B)
due to GDT entries not in memory at that moment. 

I think that we'll go with not modifying gs,fs now - they are currently
NULL in the host context.

> I'll wait for further advice before applying this patch.
So, if you would, please apply the previous patch, exactly as is.

Also, if you could please apply to the 3.0-testing tree, that would
help also.  

Thanks,
Tom

^ permalink raw reply

* Re: TCP/IP send, sendfile, RAW
From: linux-os (Dick Johnson) @ 2006-05-04 18:27 UTC (permalink / raw)
  To: Roy Rietveld; +Cc: linux-kernel, jengelh
In-Reply-To: <BAY105-F393958BBFE20D29F8C6C82E9B40@phx.gbl>


On Thu, 4 May 2006, Roy Rietveld wrote:

> Yes, it is 100 MBits and there is a listener, and there are no other PCs on
> the link because it's a crossover-cable link. When sending large buffers
> (32 Kbyte) it will do 80 MBits. I think that there is a lot of overhead in the
> send function or something.
>

Use sendto() and recvfrom() for UDP. Stream protocols require an ACK and
are slower.
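
For reference, a minimal sketch of the sendto() approach together with the
framing arithmetic. It is not code from this thread: the function names are
made up for illustration, and the overhead figures assume plain IPv4/UDP over
classic Ethernet with no VLAN tag or IP options. With 1400-byte payloads each
frame carries 28 bytes of IP/UDP header plus 38 bytes of Ethernet overhead, so
the ceiling on a 100 Mbit/s link is about 95.5 Mbit/s of payload.

```python
import socket
import time

# Per-frame Ethernet overhead in bytes: preamble + start-of-frame
# delimiter (8), MAC header (14), FCS (4), minimum inter-frame gap (12).
ETH_OVERHEAD = 8 + 14 + 4 + 12
# IPv4 header without options (20) plus UDP header (8).
IP_UDP_OVERHEAD = 20 + 8


def udp_goodput_ceiling(link_bits_per_s, payload_bytes):
    """Upper bound on UDP payload throughput (bits/s) for a given link rate."""
    wire_bytes = payload_bytes + IP_UDP_OVERHEAD + ETH_OVERHEAD
    return link_bits_per_s * payload_bytes / wire_bytes


def blast_udp(dest, payload_bytes=1400, duration_s=1.0):
    """Send fixed-size datagrams with sendto(); return the send rate in bits/s.

    This only measures how fast the sender hands data to the kernel.
    What actually arrives has to be measured on the receiving side.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"\0" * payload_bytes
    sent = 0
    start = time.monotonic()
    try:
        while time.monotonic() - start < duration_s:
            # sendto() returns the number of payload bytes queued.
            sent += sock.sendto(payload, dest)
    finally:
        sock.close()
    elapsed = max(time.monotonic() - start, 1e-9)  # avoid dividing by zero
    return sent * 8 / elapsed
```

Calling blast_udp(("<receiver address>", <port>)) against a machine that is
actually listening, and comparing the result with
udp_goodput_ceiling(100e6, 1400), shows how far the sender is from the wire
limit. The address and port are placeholders to be filled in.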

>
>> From: "linux-os (Dick Johnson)" <linux-os@analogic.com>
>> Reply-To: "linux-os (Dick Johnson)" <linux-os@analogic.com>
>> To: "Jan Engelhardt" <jengelh@linux01.gwdg.de>
>> CC: "Roy Rietveld"
>> <rwm_rietveld@hotmail.com>,<linux-kernel@vger.kernel.org>
>> Subject: Re: TCP/IP send, sendfile, RAW
>> Date: Thu, 4 May 2006 13:56:31 -0400
>>
>>
>> On Thu, 4 May 2006, Jan Engelhardt wrote:
>>
>>>> I would like to send ethernet packets with 1400 bytes payload.
>>>> I wrote a small program which sends a buffer of 1400 bytes in an
>>>> endless loop.
>>>> The problem is that I would like 100 MBits throughput, but when I
>>>> check this with ethereal, I only get 40 MBits. I tried sending with
>>>> a UDP socket and a RAW socket. I also tried sendfile.
>>>> The RAW socket gives the best result so far: 50 MBits throughput.
>>>
>>> Limitation of Ethernet.
>>>
>>>
>>>
>>> Jan Engelhardt
>>
>> Maybe he can tell what he means by 100 MBits! If he is looking for
>> 100 megabits per second, that's easy: that's 100/8 = 12.5 megabytes
>> per second. Anything, including Windows on a wet string, will
>> do that. If he is looking for 100 megabytes per second, that's
>> hard. He would need 100 * 8 = 800 megabits/second. A "gigabit" link
>> runs that fast if nobody else is on it, but there is a header and CRC
>> tail, in addition to the payload. UDP is the protocol to use to realize
>> this kind of bandwidth, but it's possible for some packets to get lost and,
>> if they are routed, they could even be duplicated. Also, when testing
>> UDP, there must be a listener in order to realize the high speed.
>> You can't just spew out a dead-end link.
>>
>> Cheers,
>> Dick Johnson
>> Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips).
>> New book: http://www.lymanschool.com
>>
>> ****************************************************************
>> The information transmitted in this message is confidential and may be
>> privileged.  Any review, retransmission, dissemination, or other use of
>> this information by persons or entities other than the intended recipient
>> is prohibited.  If you are not the intended recipient, please notify
>> Analogic Corporation immediately - by replying to this message or by
>> sending an email to DeliveryErrors@analogic.com - and destroy all copies of
>> this information, including any attachments, without reading or disclosing
>> them.
>>
>> Thank you.
>
>
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips).
New book: http://www.lymanschool.com

^ permalink raw reply

* [U-Boot-Users] PXA27x USB support?
From: Marco Cavallini @ 2006-05-04 18:27 UTC (permalink / raw)
  To: u-boot
In-Reply-To: <917E1033-A808-46E3-878F-27752A2632F5@cse.unsw.edu.au>

David Snowdon wrote:
> G'Day,
> 
> Has anyone given PXA27X USB support a go? If not, I'm probably going to 
> need to have a crack. Anyone else have a need for it? Any sage words of 
> advice from someone who's looked at USB for another platform?

In my PXA270 Mainstone U-Boot port, USB was neither required nor
implemented; I have never needed USB in the bootloader.

-- 
Marco Cavallini
Koan s.a.s. - Bergamo - ITALIA
Embedded and Real-Time Software Engineering
www.koansoftware.com    |    www.klinux.org

^ permalink raw reply

* Re: [Qemu-devel] Unknown PCI Bridge
From: Fabrice Bellard @ 2006-05-04 18:17 UTC (permalink / raw)
  To: qemu-devel
In-Reply-To: <445A0480.7010001@cnpbagwell.com>

Chris Bagwell wrote:
> Hi all,
> 
> I upgraded to current CVS (0.8.1 plus a couple of patches like acpi).  
> Last time I upgraded was about 1 week ago.
> 
> When I ran this version with a win98 guest, windows detected a new 
> device called "PCI Bridge".  It was unable to find a driver for this on 
> the win98 CD and placed it as not working in the "other devices" section 
> of device manager.
> 
> Doesn't seem to harm anything.  I was guessing it had something to do 
> with the acpi patches but haven't verified.  Any ideas?

Yes, ACPI adds a new PCI device, so this is normal. I am interested in any
regressions found using the current CVS with ACPI...

Fabrice.

^ permalink raw reply

* Re: Moving from 2.4 to 2.6 kernel
From: Grant Likely @ 2006-05-04 18:11 UTC (permalink / raw)
  To: Chris Dumoulin; +Cc: linuxppc-embedded
In-Reply-To: <445A3AB1.7040405@ics-ltd.com>

On 5/4/06, Chris Dumoulin <cdumoulin@ics-ltd.com> wrote:
> I am trying to take a working embedded linux system from kernel 2.4 to
> 2.6. The hardware is a custom board using a Virtex II Pro with PPC405
> processor.
> The working system uses u-boot 1.1.1 with linux kernel 2.4.18.
>
> I am using the same u-boot and I am trying to port linux kernel 2.6.15
> to our platform. Using the stuff from the working 2.4 kernel port and
> other PPC4xx platforms in the 2.6.15 code as examples, I believe I have
> done everything required to get the kernel booting. However, the kernel
> seems to hang after being uncompressed. Here is what I see after running
> bootm from u-boot:

The V2Pro is already supported in the 2.6 kernel.  Are you using that
code, or starting from scratch?  V2Pro (ML300) and V4FX (ML403) are
both supported in the latest 2.6 kernel.

I had similar problems when I was porting to the V4FX.  Do you have a
JTAG debugger (like a BDI-2000)?  I think your best bet is to get in
there with a debugger to find out where it goes off into la-la land.

Cheers,
g.

--
Grant Likely, B.Sc. P.Eng.
Secret Lab Technologies Ltd.
(403) 399-0195

^ permalink raw reply

* [lm-sensors] i2c bindings in python?
From: Mark M. Hoffman @ 2006-05-04 18:11 UTC (permalink / raw)
  To: lm-sensors
In-Reply-To: <200605021625.31660.alexander.krause@erazor-zone.de>

Hi Alexander:

* Alexander Krause <alexander.krause at erazor-zone.de> [2006-05-04 18:54:25 +0200]:
> > http://members.dca.net/mhoffman/sensors/python/20050122/
> I installed it via `python setup.py install` (it compiled the .c with a lot of
> warnings) but I'm getting 'undefined symbol: i2c_smbus_access' when using the
> module.
>
> I suppose it's because of the linux-2.6 headers.
>
> Is there an easy way to get it working?
>
> btw, I'm using python-2.4 but that shouldn't matter.

Yes, yes, and yes.

Get the lm_sensors2 package and do 'make user' and (as root)
'make user_install'.  That will put the proper header file
(for access to i2c-dev devices) in place.

Regards,

-- 
Mark M. Hoffman
mhoffman at lightlink.com

^ permalink raw reply

* Re: limits / PIPE_BUF?
From: Vadim Lobanov @ 2006-05-04 18:10 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel
In-Reply-To: <1146765273.3101.68.camel@laptopd505.fenrus.org>

On Thu, 4 May 2006, Arjan van de Ven wrote:

> On Thu, 2006-05-04 at 10:50 -0700, Vadim Lobanov wrote:
> > On Thu, 4 May 2006, Arjan van de Ven wrote:
> >
> > > On Thu, 2006-05-04 at 09:39 -0700, Vadim Lobanov wrote:
> > > > How does the kernel
> > > > code ensure that this value is honored, considering that PIPE_BUF is
> > > > not referenced in any of the pipe code?
> > >
> > >
> > > the kernel implementation guarantees one page basically, and on all
> > > architectures that I know of that's at least 4096 bytes
> > >
> >
> > Alright, so sounds like this constant should remain inside the
> > include/linux/limits.h file. What about #defining it to be equal to
> > PAGE_SIZE, like ARM (include/linux-arm/limits.h, for example) does?
>
> there is a certain elegance in providing the same value on all
> architectures; it means apps don't suddenly break if you port it to
> a "lesser" one. Also there is a problem with PAGE_SIZE itself, that's a
> config option on several architectures, so it'd have to be a define for
> get_page_size() or something, at which point you change semantics since
> apps can't do
>
> char foo[PIPE_BUF];
>
> anymore
>

Good point.
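
To illustrate the distinction between a compile-time constant and a value
asked for at run time, here is a small user-space sketch. It is an
illustration only, not something proposed in this thread: pipe_limits() is a
made-up helper name, and it assumes a Unix system where Python exposes
select.PIPE_BUF, os.fpathconf() and os.sysconf().

```python
import os
import select


def pipe_limits():
    """Compare the fixed PIPE_BUF constant with values queried at run time."""
    read_fd, write_fd = os.pipe()
    try:
        # Asked of this particular pipe at run time via fpathconf().
        runtime_pipe_buf = os.fpathconf(write_fd, "PC_PIPE_BUF")
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return {
        # Fixed when the interpreter was built, like the C macro
        # that makes `char foo[PIPE_BUF];` possible.
        "constant_pipe_buf": select.PIPE_BUF,
        "runtime_pipe_buf": runtime_pipe_buf,
        # Page size can differ between machines of the same architecture.
        "page_size": os.sysconf("SC_PAGE_SIZE"),
    }


if __name__ == "__main__":
    for name, value in pipe_limits().items():
        print(name, value)
```

A program that sizes a static buffer needs the constant; a program that only
wants to know how much it can write atomically can ask at run time and does
not care whether the answer tracks the page size.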

I suppose this means that the limits.h header file is used, either
directly or indirectly, by both us (the kernel) and by user-space? If
so, blech! What's the policy on keeping around config values that don't
do anything inside the kernel anymore? The short list is:
 NR_OPEN
 ARG_MAX
 CHILD_MAX
 OPEN_MAX
 LINK_MAX
 MAX_CANON
 MAX_INPUT
 RTSIG_MAX
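
For what it's worth, user space can ask for most of these at run time instead
of relying on the header. The sketch below is only an illustration:
query_limits() is a made-up name, and the mapping from each constant to a
sysconf()/pathconf() name is my assumption about the closest POSIX
equivalent. NR_OPEN is left out because it has no POSIX counterpart.

```python
import os

# Assumed mapping from the header constants to POSIX run-time queries.
SYSCONF_NAMES = {
    "ARG_MAX": "SC_ARG_MAX",
    "CHILD_MAX": "SC_CHILD_MAX",
    "OPEN_MAX": "SC_OPEN_MAX",
    "RTSIG_MAX": "SC_RTSIG_MAX",
}
PATHCONF_NAMES = {
    "LINK_MAX": "PC_LINK_MAX",
    "MAX_CANON": "PC_MAX_CANON",
    "MAX_INPUT": "PC_MAX_INPUT",
}


def query_limits(path="/"):
    """Return a dict of limit name -> value, or None where unavailable."""
    limits = {}
    for name, key in SYSCONF_NAMES.items():
        try:
            limits[name] = os.sysconf(key)
        except (ValueError, OSError):
            # Unknown name on this platform, or the query failed.
            limits[name] = None
    for name, key in PATHCONF_NAMES.items():
        try:
            # pathconf() limits can depend on the file system behind `path`.
            limits[name] = os.pathconf(path, key)
        except (ValueError, OSError):
            limits[name] = None
    return limits


if __name__ == "__main__":
    for name, value in sorted(query_limits().items()):
        print(name, value)
```

A value of -1 means the system reports the limit as indeterminate, which is
itself a hint that a single header constant cannot describe it.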

- Vadim Lobanov

^ permalink raw reply
