* Re: [coLinux-devel] coLinux benchmarks
From: Dan Aloni @ 2004-04-06 14:07 UTC (permalink / raw)
  To: Ian C. Blenke
  Cc: Eyal Lotem, Cooperative Linux Development, Linux Kernel List
In-Reply-To: <20040406134549.GA28681@fresh-install>

On Tue, Apr 06, 2004 at 09:45:49AM -0400, Ian C. Blenke wrote:
> On Tue, Apr 06, 2004 at 12:22:56AM +0200, Dan Aloni wrote:
> > On Mon, Apr 05, 2004 at 01:11:39PM -0700, Eyal Lotem wrote:
> > 
> > > I think the reason may be that Windows is using the
> > > disks better and making access faster. Perhaps DMA
> > > acceleration or some other feature is turned off on
> > > the Linux host side, making disk access slower on the
> > > Linux side.
> > 
> > No Windows was involved with these benchmarks in any way. I ran 
> > coLinux on Linux.
> 
> You ran coLinux on a Linux host? Perhaps I've missed something on the list..
> is there a native Linux kernel port now? An alternative to User Mode Linux
> is a rather big thing for me.

Yes, it's an alternative to User Mode Linux, though it's a bit
early and doesn't have all the wide range of support tools and 
nifty stuff that UML has.

-- 
Dan Aloni
da-x@colinux.org

^ permalink raw reply

* [U-Boot-Users] EZkit bf533 board
From: ganapathi @ 2004-04-06 14:06 UTC (permalink / raw)
  To: u-boot
In-Reply-To: <DLEALGDPKDCCDOIIPLNFCEHPCEAA.vidya_s@lgsoftindia.com>

For all EZkit BF-531/2/3/5 the EZ-LAN is mapped with Async Bank3.
The BASE_ADDRESS for EZ-LAN card access in the EZkit - BF533 is 0x20300000
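
As a rough cross-check of the address above: assuming the BF533 maps its four asynchronous memory banks contiguously from 0x20000000 in 1 MB steps (please verify against the BF533 hardware reference), Async Bank 3 starts at 0x20300000, which matches the EZ-LAN base. A small C sketch of that arithmetic; the macro and function names are made up for illustration:

```c
#include <stdint.h>

/* Assumption: BF533 async banks 0..3 start at 0x20000000 and are
 * each 1 MB (0x100000) long. Names are hypothetical. */
#define BF533_ASYNC_BASE      0x20000000u
#define BF533_ASYNC_BANK_SIZE 0x00100000u

/* Return the base address of asynchronous bank 'bank' (0..3),
 * or 0 if the bank number is out of range. */
static uint32_t bf533_async_bank_base(unsigned int bank)
{
    if (bank > 3)
        return 0;
    return BF533_ASYNC_BASE + bank * BF533_ASYNC_BANK_SIZE;
}
```

With these assumptions, bf533_async_bank_base(3) yields 0x20300000, the EZ-LAN base quoted above.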

Regards
Ganapathi C

----- Original Message ----- 
From: "Vidya S" <vidya_s@lgsoftindia.com>
To: "U-Boot Users" <u-boot-users@lists.sourceforge.net>
Sent: Tuesday, April 06, 2004 6:07 PM
Subject: [U-Boot-Users] EZkit bf533 board


> Hi,
>
>   Is anybody using the EZ-KIT Lite BF533 board? Can you please let me know
> the memory map for the Ethernet interface? Is there any document that
> describes it? For example, on the BF535 EZ-KIT board I think the
> BASE_ADDRESS for Ethernet is 0x2c000300.
>
> regards
> vidya
>

^ permalink raw reply

* Re: bug 2400
From: Jens Axboe @ 2004-04-06 14:04 UTC (permalink / raw)
  To: James Bottomley
  Cc: Patrick Mansfield, Andrew Morton, greg, linux-usb-devel,
	SCSI Mailing List
In-Reply-To: <1081259807.1804.9.camel@mulgrave>

On Tue, Apr 06 2004, James Bottomley wrote:
> On Tue, 2004-04-06 at 04:22, Jens Axboe wrote:
> > Really? It doesn't even compile :-)
> 
> Heh, I really must learn that I have to copy the file from the test
> machine to the email machine *before* attaching it.  You got a stale
> copy of an older incarnation, I think.

Only change is the err -> error, correct?

> The attached (hopefully) is what I compiled and tested with.  The test,
> incidentally, is simply to hold the device open and then forcibly remove
> it using scsi remove-single-device before closing it.

Better than what is there, whether it needs other synchronization is a
different question.

-- 
Jens Axboe


^ permalink raw reply

* Re: [Qemu-devel] multiple VMs
From: Jamie Burns @ 2004-04-06 13:57 UTC (permalink / raw)
  To: qemu-devel
In-Reply-To: <1081258970.6179.54.camel@localhost>

You might want to look into Xen if that is your goal.

http://www.cl.cam.ac.uk/Research/SRG/netos/xen/

It is a tiny OS that just runs other OSes on top of it. Microsoft has a
Windows XP port, although I don't know if they will release anything.

Jamie.

----- Original Message ----- 
From: "Joe Batt" <Joe@soliddesign.net>
To: <qemu-devel@nongnu.org>
Sent: Tuesday, April 06, 2004 2:42 PM
Subject: [Qemu-devel] multiple VMs


> ...
> > I think that multiple VM's is a worthy goal as long as you can minimise CPU
> > usage. Having multiple VM's gives you the ability to do some very cool
> > things. I use VMWARE in Windows and sometimes have both Linux and FreeBSD
> > running in VM's so I can test software against all 3 OS's at once. I imagine
> > it would be very useful to developers of cluster software.
>
> Run two or three copies of QEMU.  I think QEMU is so much cooler than
> VMWare because it runs completely in user space.  You can run as many
> copies as you need and be confident that they aren't interfering with
> each other.
>
> A pause button would be nice, but I think CTRL-Z works just fine for
> now.
>
> As a developer, I'd love to have a single stable tiny Linux distro
> running on the metal and a dozen other "machines" to do work on.  The
> expense of VMWare won't allow me to do that now, as I use a variety of
> desktop machines (at different client sites).  My personal office
> machine does operate that way.
>
> My priorities (though I don't have time to contribute) are winnt family
> guest support, stability, speed.
>
> Joe

^ permalink raw reply

* Re: [Qemu-devel] Qemu workstation
From: Jamie Burns @ 2004-04-06 14:00 UTC (permalink / raw)
  To: qemu-devel
In-Reply-To: <20040406121738.GC2774@linux-m68k.org>

Isn't there an instruction that gets executed when the OS is idle, to keep
processors cool etc.?

VMWare describes some of this in:

http://www.vmware.com/support/kb/enduser/std_adp.php?p_faqid=1077

:o)


----- Original Message ----- 
From: "Richard Zidlicky" <rz@linux-m68k.org>
To: <qemu-devel@nongnu.org>
Sent: Tuesday, April 06, 2004 1:17 PM
Subject: Re: [Qemu-devel] Qemu workstation


> On Mon, Apr 05, 2004 at 10:49:51PM +0100, Jamie Burns wrote:
> > > I am not sure that handling multiple VMs running at the same time is
> > > very useful (some architectural changes are needed in QEMU). But
> > > switching easily between VM configurations seems interesting.
> >
> > I think that multiple VM's is a worthy goal as long as you can minimise CPU
> > usage. Having multiple VM's gives you the ability to do some very cool
> > things. I use VMWARE in Windows and sometimes have both Linux and FreeBSD
> > running in VM's so I can test software against all 3 OS's at once. I imagine
> > it would be very useful to developers of cluster software.
> >
> > I tried the Win32 port the other day, running Linux, and it sat using 100%
> > of the CPU whilst doing next to nothing at a command prompt. Using VMWARE,
> > and waiting at a command prompt uses very little CPU time.
> >
> > Is QEMU sat in a busy loop all the time?
>
> it is not QEMU but the hosted OS that is in the busy loop. QEMU
> will have to recognise "idle loops" to fix this - this could be
> really tricky.
>
> Richard

^ permalink raw reply

* Re: [linux-usb-devel] Re: bug 2400
From: James Bottomley @ 2004-04-06 14:03 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: David Brownell, Alan Stern, Mike Anderson, Andrew Morton, greg,
	Jens Axboe, linux-usb-devel, SCSI Mailing List
In-Reply-To: <200404060852.34169.oliver@neukum.org>

On Tue, 2004-04-06 at 01:52, Oliver Neukum wrote:
> Pure refcounting can never protect you against races with freeing objects.
> The counters themselves must be protected. Try as you might you need
> locks for that and rules on how these locks are to be used.

Which part of

On Mon, 2004-04-05 at 20:19, James Bottomley wrote:
> Now, if the subsystem is going to garbage collect its own object as a
> result of the other object disconnect, then it is responsible for
> synchronising that with reference gets on its own object.  However, that
> is easily achievable via *intra* subsystem synchronisation.

didn't you understand?

However, how a subsystem resolves this intra subsystem synchronisation
is up to it ... and it doesn't have to do it with locks.  So there are
no exposed locks for this and no rules therefore, for how to use them.

James



^ permalink raw reply

* RE: [PATCH] dpt_i2o changes for 2.6.2 kernel in support of 64 bit and bitrot (part 1)
From: Salyzyn, Mark @ 2004-04-06 14:03 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi

Christoph Hellwig [mailto:hch@infradead.org] writes: 
> On Mon, Apr 05, 2004 at 06:23:49PM -0400, Salyzyn, Mark wrote:
>> This would be unacceptable to our OEMs and users of our applications.
> Could you please stop this OEM crap?  No one cares more for someone just
> because he's an OEM. And we already had the stupid user discussion a few
> times, look up the long EVMS and Aunt Tilly threads.

And *that* is the root of the problem ... sigh ... but I will try not to take the flame bait over what will no doubt become an infamous statement of the goals of the Linux kernel community. I do not want to turn this into a flame war, but rather have us acknowledge our differences and respect each other's needs.

OEMs and their users are virtually *all* we are concerned about. If we cannot qualify our products within their needs, we disappear, or at least we disappear on operating systems that fail to provide for the needs of those users.

These users that are represented by us, and our OEMs, are overworked IS people, working overtime hours, most likely working remote, and I am sorry, I *need* to tell them when they make a critical mistake, in the fog of their war, that will panic their system. I need to tell them, at the *instant* of the mistake, that they are about to delete an array that is currently `in-use' by the operating system. I *really* need to tell them that they are about to incorporate their in-use, possibly boot, drive as a component of an array. I do *not* need to tell them that they are stupid.

I lose sleep at night thinking about these poor souls. No, to be honest, it is actually because our OEM's test group thinks up fantastic scenarios for breaking your products and mine, and holds us accountable for such failures. "Deleting array causes kernel ooops", news at 11 ...

Which reason do we ascribe to this failure of the operating system to provide the service of knowing if a drive is currently in-use:

1) The system has been architected in such a manner that it would be impossible to add at this juncture?
2) Someone made a conscious decision not to provide this functionality?
3) There is a way to tell, it just is not clear to any of us at this moment?
4) There is a way to tell, but it would be an ugly hack that will no doubt break upon the next revision (as this hack has done traversing from 2.4 to 2.6)?
5) Pride?
 
Sincerely -- Mark Salyzyn


^ permalink raw reply

* Re: [Fwd: [PATCH] jiffies must be unsigned long]
From: Geert Uytterhoeven @ 2004-04-06 14:01 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Linux Kernel Development
In-Reply-To: <1081254194.4680.3.camel@laptop.fenrus.com>

On Tue, 6 Apr 2004, Arjan van de Ven wrote:
>
> > -			for(i=jiffies+HZ/100;time_before(jiffies, i););
> > +			for(t=jiffies+HZ/100;time_before(jiffies, t););
>
> how nice... but ehm... if you fix it why not really fix it ???

Because I just want to get rid of all the annoying warnings when trying to
compile as many drivers as possible on m68k.
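
For context on Arjan's point: time_before() is there so that comparisons against jiffies keep working when the counter wraps, which only holds if both operands are unsigned long like jiffies itself. A minimal userspace sketch of that comparison, modelled on the kernel macro rather than copied from it (the name ticks_before is hypothetical):

```c
#include <stdbool.h>

/* Wraparound-safe "a is before b" for a free-running unsigned long tick
 * counter. The subtraction is done in unsigned arithmetic, where
 * wraparound is well defined, and the difference is then interpreted as
 * signed: a negative difference means a is earlier than b, as long as the
 * two values are less than half the counter range apart. */
static bool ticks_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}
```

A deadline computed just before the counter wraps (0UL - 6) still compares as earlier than a "now" value of 5 taken just after the wrap. A plain a < b comparison would get this wrong.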

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply

* Re: dc395x: CD writer causes endless "sg_to_virt failed" loop
From: Jamie Lenehan @ 2004-04-06 14:01 UTC (permalink / raw)
  To: Andrew Schulman; +Cc: linux-scsi, dc395x
In-Reply-To: <200404050120.56111.andrex@alumni.utexas.net>

On Mon, Apr 05, 2004 at 01:20:56AM -0400, Andrew Schulman wrote:
> > First thing to try would be to upgrade to 2.6.5-rc1 (or newer) and
> > see if that works any better.
> 
> I'm now running kernel 2.6.5.  The good news is that my CD writer (HP 9200i) 
> works again.  The bad news is that I'm still getting e.g.
> 
> Apr  5 00:46:50 helium kernel: dc395x: sg_update_list: sg_to_virt failed
> Apr  5 00:47:21 helium last message repeated 576 times
> Apr  5 00:48:22 helium last message repeated 974 times
> Apr  5 00:49:23 helium last message repeated 979 times
> Apr  5 00:50:24 helium last message repeated 971 times
> Apr  5 00:51:25 helium last message repeated 970 times
> Apr  5 00:52:26 helium last message repeated 974 times
> Apr  5 00:52:40 helium last message repeated 228 times
> 
> (result of burning a CD).  This isn't great, but it's tolerable and at least 
> the CD writer works again.

Have you verified that the CD(s) were actually written correctly?
That's the one thing I'd really be concerned about. If so then you
can always just comment the message out of the source code for now.

I'll look over that code again and see if I can see why it thinks it
cannot find the address it's looking for (and if it matters).

What would be helpful (if you're bored!) would be to enable the
debugging at the top of dc395x.c and do a very short run (half a
dozen copies of the message is enough, no need for 10k of them ;) and
then compress the logs and put them somewhere I can grab them (or
e-mail them if < 5MB).

Thanks.

-- 
 Jamie Lenehan <lenehan@twibble.org>

^ permalink raw reply

* Re: [RFC][PATCH] SCSI tape log message fixes
From: Hironobu Ishii @ 2004-04-06 14:01 UTC (permalink / raw)
  To: Kai Makisara, linux-scsi
In-Reply-To: <Pine.LNX.4.58.0404031334490.1131@kai.makisara.local>


----- Original Message ----- 
From: "Kai Makisara" <Kai.Makisara@kolumbus.fi>
Sent: Saturday, April 03, 2004 7:44 PM
Subject: [RFC][PATCH] SCSI tape log message fixes


> This patch changes the st console/log messages:
> 
> - __GFP_NOWARN added to buffer allocation to suppress useless messages 
>   when having to use smaller than default segments
> - move log message from enlarge_buffer() to caller so that the tape name 
>   can be printed and remove some debugging messages; now the st messages
>   should include drive name where applicable (a problem reported by 
>   Hironobu Ishii)
> - setting options is logged only when debugging; the most important 
>   options are now seen in sysfs
>  
> Kai

Thank you, Kai. 
It's OK with me. I'm happy to see this patch.

Thanks,
Hironobu Ishii.


^ permalink raw reply

* Re: bug 2400
From: Alan Stern @ 2004-04-06 14:00 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List
In-Reply-To: <1081200462.2105.63.camel@mulgrave>

On 5 Apr 2004, James Bottomley wrote:

> On Sun, 2004-04-04 at 22:17, Alan Stern wrote:
> > Of course it would!  That's exactly what this thread is about: bugs caused
> > by improper handling of open/disconnect races.
> 
> So you're quietly shifting your ground away from this assertion:
> 
> On Sat, 2004-04-03 at 19:40, Alan Stern wrote:
> > The problem _can_ be solved by introducing a lock higher up, such as at
> > the driver level or at the bus level.  (A kernel lock would work too but
> > it would be extravagantly excessive.)  For example, the bus subsystem
> > rwsem in the driver model prevents analogous problems there.  But you
> > don't want to get a read lock on a bus-wide semaphore every time your
> > open() procedure runs!  A driver-wide lock makes a good solution.
> > 
> > Another possible solution would be to have disconnect() perform an RCU 
> > update to the device pointer.  I haven't seen any code that does this, but 
> > I think it ought to work.
> 
> ?

Not at all.  Open/disconnect races can appear at several points in a 
driver.  The earlier assertion referred to one of those points: the place 
where the code gets a pointer to a device structure from an open inode's 
private data.

> My contention is that the races can be solved by proper refcounting
> (without the need for locks and RCUs) not that we don't have any bugs in
> sd.c (I'll be happy if I can pull them all out of sr at the moment).

Your contention is correct, I agree.  However, not all parts of the kernel
implement proper refcounting and, equally important, proper
delete-then-release notifications.  In particular, the VFS layer doesn't.  
If a driver notifies VFS that a device has been disconnected so that the
private data in the corresponding /dev inode entry can be removed, the
driver does _not_ receive a release notification in return when the last
process actively using that inode has finished with it.

Given the lack of proper adherence to the object-lifetime rules in VFS,
a driver has no choice but to adopt a scheme similar to what I outlined
in the section you quoted above.  _That's_ the manifestation of the
open/disconnect race I was referring to.  (It's different from the 
problems in sd.c, incidentally.)
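
A minimal userspace sketch of the driver-wide-lock scheme outlined above, assuming POSIX threads. The mydev/mydrv_* names are hypothetical. The point is that open() looks up the device pointer and increments its reference under the same lock that disconnect() holds while unpublishing that pointer. As a result, open() either gets a counted reference or sees NULL, and never gets a pointer to a freed object:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical device object; refcount is protected by mydrv_lock. */
struct mydev {
    int refcount;
};

/* One lock for the whole driver. */
static pthread_mutex_t mydrv_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the pointer kept in the /dev inode's private data. */
static struct mydev *mydrv_slot;

/* probe(): publish the device, holding the driver's own reference. */
static void mydrv_register(struct mydev *dev)
{
    pthread_mutex_lock(&mydrv_lock);
    dev->refcount = 1;
    mydrv_slot = dev;
    pthread_mutex_unlock(&mydrv_lock);
}

/* open(): lookup plus reference grab, atomic with respect to disconnect.
 * Returns NULL if the device has already been disconnected. */
static struct mydev *mydrv_open(void)
{
    struct mydev *dev;

    pthread_mutex_lock(&mydrv_lock);
    dev = mydrv_slot;
    if (dev)
        dev->refcount++;
    pthread_mutex_unlock(&mydrv_lock);
    return dev;
}

/* Drop a reference; returns 1 if it was the last one, meaning the
 * caller may now free the object. */
static int mydrv_put(struct mydev *dev)
{
    int last;

    pthread_mutex_lock(&mydrv_lock);
    last = (--dev->refcount == 0);
    pthread_mutex_unlock(&mydrv_lock);
    return last;
}

/* disconnect(): unpublish the pointer under the lock so no later open()
 * can find it, then drop the driver's own reference. */
static int mydrv_disconnect(struct mydev *dev)
{
    pthread_mutex_lock(&mydrv_lock);
    mydrv_slot = NULL;
    pthread_mutex_unlock(&mydrv_lock);
    return mydrv_put(dev);
}
```

A read-write lock would let concurrent open() calls proceed in parallel. A plain mutex keeps the sketch short.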

Alan Stern


^ permalink raw reply

* Re: [Qemu-devel] Qemu workstation
From: Brad Campbell @ 2004-04-06 13:28 UTC (permalink / raw)
  To: qemu-devel
In-Reply-To: <20040406121738.GC2774@linux-m68k.org>

Richard Zidlicky wrote:
> On Mon, Apr 05, 2004 at 10:49:51PM +0100, Jamie Burns wrote:
> 
>>>I am not sure that handling multiple VMs running at the same time is
>>>very useful (some architectural changes are needed in QEMU). But
>>>switching easily between VM configurations seems interesting.
>>
>>I think that multiple VM's is a worthy goal as long as you can minimise CPU
>>usage. Having multiple VM's gives you the ability to do some very cool
>>things. I use VMWARE in Windows and sometimes have both Linux and FreeBSD
>>running in VM's so I can test software against all 3 OS's at once. I imagine
>>it would be very useful to developers of cluster software.
>>
>>I tried the Win32 port the other day, running Linux, and it sat using 100%
>>of the CPU whilst doing next to nothing at a command prompt. Using VMWARE,
>>and waiting at a command prompt uses very little CPU time.
>>
>>Is QEMU sat in a busy loop all the time?
> 
> 
> it is not QEMU but the hosted OS that is in the busy loop. QEMU
> will have to recognise "idle loops" to fix this - this could be
> really tricky.

I guess if QEMU emulates the hlt instruction and the OS uses it, then it's pretty easy.
I note above it said running Linux, which idles nicely when the hlt instruction is available,
so perhaps there is a not-too-difficult way for intelligent OSes. DOS is a lost cause, however.
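
A sketch of what emulating hlt could look like, assuming a threaded emulator built on POSIX threads. The vcpu structure and function names are hypothetical and are not QEMU's actual code. When the guest executes hlt, the host thread sleeps on a condition variable until some device or timer model raises an interrupt, instead of spinning:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-vCPU state for idle handling. */
struct vcpu {
    pthread_mutex_t lock;
    pthread_cond_t  irq_cond;   /* signalled when an interrupt is raised */
    bool            irq_pending;
};

static void vcpu_init(struct vcpu *v)
{
    pthread_mutex_init(&v->lock, NULL);
    pthread_cond_init(&v->irq_cond, NULL);
    v->irq_pending = false;
}

/* Called by timer/device emulation, possibly from another thread. */
static void vcpu_raise_irq(struct vcpu *v)
{
    pthread_mutex_lock(&v->lock);
    v->irq_pending = true;
    pthread_cond_signal(&v->irq_cond);
    pthread_mutex_unlock(&v->lock);
}

/* Called when the guest executes hlt: block the host thread until an
 * interrupt is pending, then return so the CPU loop can deliver it.
 * The pending flag is left set for that delivery step to clear. */
static void vcpu_halt(struct vcpu *v)
{
    pthread_mutex_lock(&v->lock);
    while (!v->irq_pending)     /* loop guards against spurious wakeups */
        pthread_cond_wait(&v->irq_cond, &v->lock);
    pthread_mutex_unlock(&v->lock);
}
```

A timer model running in another thread would call vcpu_raise_irq() on each tick. An idle guest that uses hlt would then wake about once per timer interrupt instead of keeping a host CPU at 100%. Guests that busy-loop without hlt (such as DOS) gain nothing from this.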

Brad

^ permalink raw reply

* Re: Oops with cpufreq on 2.6.5-mm1
From: Dave Jones @ 2004-04-06 13:54 UTC (permalink / raw)
  To: Jan Killius; +Cc: linux-kernel
In-Reply-To: <20040406101609.GA25248@gate.unimatrix>

On Tue, Apr 06, 2004 at 12:16:09PM +0200, Jan Killius wrote:
 > The patch has fixed the problem. thx
 
Can you try out the fully merged patch at
http://www.codemonkey.org.uk/projects/bitkeeper/cpufreq
too please ?

		Dave


^ permalink raw reply

* Re: bug 2400
From: James Bottomley @ 2004-04-06 13:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Patrick Mansfield, Andrew Morton, greg, linux-usb-devel,
	SCSI Mailing List
In-Reply-To: <20040406092244.GH28109@suse.de>

On Tue, 2004-04-06 at 04:22, Jens Axboe wrote:
> Really? It doesn't even compile :-)

Heh, I really must learn that I have to copy the file from the test
machine to the email machine *before* attaching it.  You got a stale
copy of an older incarnation, I think.

The attached (hopefully) is what I compiled and tested with.  The test,
incidentally, is simply to hold the device open and then forcibly remove
it using scsi remove-single-device before closing it.

James

===== drivers/scsi/sr.c 1.103 vs edited =====
--- 1.103/drivers/scsi/sr.c	Fri Apr  2 11:30:44 2004
+++ edited/drivers/scsi/sr.c	Tue Apr  6 08:49:30 2004
@@ -113,6 +113,28 @@
 	.generic_packet		= sr_packet,
 };
 
+static void sr_kobject_release(struct kobject *kobj);
+
+static struct kobj_type scsi_cdrom_kobj_type = {
+	.release = sr_kobject_release,
+};
+
+/*
+ * The get and put routines for the struct scsi_cd.  Note this entity
+ * has a scsi_device pointer and owns a reference to this.
+ */
+static inline int scsi_cd_get(struct scsi_cd *cd)
+{
+	if (!kobject_get(&cd->kobj))
+		return -ENODEV;
+	return 0;
+}
+
+static inline void scsi_cd_put(struct scsi_cd *cd)
+{
+	kobject_put(&cd->kobj);
+}
+
 /*
  * This function checks to see if the media has been changed in the
  * CDROM drive.  It is possible that we have already sensed a change,
@@ -424,8 +446,15 @@
 
 static int sr_block_release(struct inode *inode, struct file *file)
 {
+	int ret;
 	struct scsi_cd *cd = scsi_cd(inode->i_bdev->bd_disk);
-	return cdrom_release(&cd->cdi, file);
+	ret = cdrom_release(&cd->cdi, file);
+	if(ret)
+		return ret;
+	
+	scsi_cd_put(cd);
+
+	return 0;
 }
 
 static int sr_block_ioctl(struct inode *inode, struct file *file, unsigned cmd,
@@ -467,7 +496,7 @@
 	struct scsi_device *sdev = cd->device;
 	int retval;
 
-	retval = scsi_device_get(sdev);
+	retval = scsi_cd_get(cd);
 	if (retval)
 		return retval;
 	
@@ -489,7 +518,7 @@
 	return 0;
 
 error_out:
-	scsi_device_put(sdev);
+	scsi_cd_put(cd);
 	return retval;	
 }
 
@@ -500,7 +529,6 @@
 	if (cd->device->sector_size > 2048)
 		sr_set_blocklength(cd, 2048);
 
-	scsi_device_put(cd->device);
 }
 
 static int sr_probe(struct device *dev)
@@ -514,12 +542,18 @@
 	if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM)
 		goto fail;
 
+	if ((error = scsi_device_get(sdev)) != 0)
+		goto fail;
+
 	error = -ENOMEM;
 	cd = kmalloc(sizeof(*cd), GFP_KERNEL);
 	if (!cd)
-		goto fail;
+		goto fail_put_sdev;
 	memset(cd, 0, sizeof(*cd));
 
+	kobject_init(&cd->kobj);
+	cd->kobj.ktype = &scsi_cdrom_kobj_type;
+
 	disk = alloc_disk(1);
 	if (!disk)
 		goto fail_free;
@@ -588,6 +622,8 @@
 	put_disk(disk);
 fail_free:
 	kfree(cd);
+fail_put_sdev:
+	scsi_device_put(sdev);
 fail:
 	return error;
 }
@@ -863,19 +899,31 @@
 	return cgc->stat;
 }
 
-static int sr_remove(struct device *dev)
+static void sr_kobject_release(struct kobject *kobj)
 {
-	struct scsi_cd *cd = dev_get_drvdata(dev);
-
-	del_gendisk(cd->disk);
+	struct scsi_cd *cd = container_of(kobj, struct scsi_cd, kobj);
+	struct scsi_device *sdev = cd->device;
 
 	spin_lock(&sr_index_lock);
 	clear_bit(cd->disk->first_minor, sr_index_bits);
 	spin_unlock(&sr_index_lock);
 
-	put_disk(cd->disk);
 	unregister_cdrom(&cd->cdi);
+
+	put_disk(cd->disk);
+
 	kfree(cd);
+
+	scsi_device_put(sdev);
+}
+
+static int sr_remove(struct device *dev)
+{
+	struct scsi_cd *cd = dev_get_drvdata(dev);
+
+	del_gendisk(cd->disk);
+
+	scsi_cd_put(cd);
 
 	return 0;
 }
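
The lifetime rules that the patch sets up can be modelled in userspace to show why the test case no longer frees the cd early. In that test the device is held open, removed, and only then closed. This is a simplified sketch, not kernel code: the fake_* names are hypothetical, and C11 atomics stand in for the kobject reference count. In the model, probe takes the scsi_device reference and the initial cd reference, and open takes a further cd reference. Whichever of remove or the final close drops the last reference frees the cd, and only then drops the scsi_device reference:

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Models the scsi_device reference count (scsi_device_get/put). */
struct fake_sdev {
    int refs;
};

/* Models struct scsi_cd with its embedded kobject reference count. */
struct fake_cd {
    atomic_int refs;
    struct fake_sdev *sdev;
};

/* Models sr_kobject_release(): free the cd, then drop the sdev ref. */
static void fake_cd_release(struct fake_cd *cd)
{
    struct fake_sdev *sdev = cd->sdev;  /* save before freeing cd */

    free(cd);
    sdev->refs--;
}

static void fake_cd_get(struct fake_cd *cd)
{
    atomic_fetch_add(&cd->refs, 1);
}

static void fake_cd_put(struct fake_cd *cd)
{
    /* atomic_fetch_sub returns the old value: 1 means this was the last ref */
    if (atomic_fetch_sub(&cd->refs, 1) == 1)
        fake_cd_release(cd);
}

/* Models sr_probe(): pin the sdev first, undo that if allocation fails. */
static struct fake_cd *fake_probe(struct fake_sdev *sdev)
{
    struct fake_cd *cd;

    sdev->refs++;
    cd = malloc(sizeof(*cd));
    if (!cd) {
        sdev->refs--;
        return NULL;
    }
    atomic_init(&cd->refs, 1);          /* the reference sr_remove() drops */
    cd->sdev = sdev;
    return cd;
}
```

In the model, open maps to fake_cd_get(), and both the final close and remove map to fake_cd_put(). The call order of remove and close therefore no longer matters for when the cd is freed.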




^ permalink raw reply

* Re: AGP problem SiS 746FX Linux 2.6.5-rc3
From: Dave Jones @ 2004-04-06 13:50 UTC (permalink / raw)
  To: Bjoern Michaelsen; +Cc: linux-kernel, volker.hemmann
In-Reply-To: <20040406031949.GA8351@lord.sinclair>

On Tue, Apr 06, 2004 at 05:19:49AM +0200, Bjoern Michaelsen wrote:

 > I wrote a patch against 2.6.5 to let the SiS 746 take advantage
 > of the SiS 648 patches too.
 > http://bugzilla.kernel.org/show_bug.cgi?id=2327

That and a few others are in the pending queue which I'll push
when Linus gets back. See http://www.codemonkey.org.uk/projects/bitkeeper/agpgart/
for the patch-of-the-day from bk://linux-dj.bkbits.net/agpgart

In particular, there's an additional fix for SiS users; I broke
AGPv2 support in the previous fix that went into 2.6.5.

		Dave


^ permalink raw reply

* Can a TCP-PDU being tunneled within a UDP-packet be accepted ?
From: Christian Riechmann @ 2004-04-06 13:56 UTC (permalink / raw)
  To: netfilter-devel; +Cc: bussmann

Hello,

Here is what I want to do:
I would like to encapsulate a TCP-PDU within a new PDU, which shall be
transmitted as a UDP-PDU to the recipient. On the recipient's side the
TCP-PDU shall be decapsulated from the UDP-PDU and given back to the
kernel with the verdict ACCEPT for further processing.

The software I am using:
I am using IPv6 with ip6tables 1.2.9 and Linux kernel 2.6.4.

What I can see:
On the sending host ipq_read shows the TCP-PDU, this TCP-PDU is encapsulated
and sent out as UDP-PDU (tcpdump shows the UDP-PDU).
On the receiving host the transmitted UDP-PDU is received, the encapsulated
TCP-PDU is decapsulated and this TCP-PDU is given to ipq_set_verdict with
the action-parameter set to ACCEPT.
Now the PROBLEM: This accepted TCP-PDU does not arrive at the application!

I should mention, that this problem does not occur when instead of
TCP-PDUs ICMP- or UDP-PDUs are encapsulated, transmitted and decapsulated.

I hope somebody can give me a hint on how to solve this problem.

Thanks in advance

Christian
-- 
Christian Riechmann    E-Mail: riechmann@fgan.de
c/o FGAN/FKIE          Tel: (+49) 228/9435 345,378
Neuenahrer Strasse 20  Fax: (+49) 228/9435 685
D-53343 Wachtberg, Germany

^ permalink raw reply

* Obsolete bdflush system call
From: Armen Kaleshian @ 2004-04-06 13:55 UTC (permalink / raw)
  To: linux-admin

Hey folks,

I recently revamped my firewall with RH v8 and upgraded the kernel to v2.6.5.  The box started up fine, except for the following message in dmesg:

warning: process `update' used the obsolete bdflush system call
Fix your initscripts?

I did some research in the admin and newbie archives with no resolution. I understand that the update daemon hasn't been updated in years, and that bdflush is obsolete. Is there any way to replace the bdflush call with what's appropriate without replacing the initscripts rpm?

Any suggestions would be appreciated.

Thanks!

--Armen

^ permalink raw reply

* Re: RE: Race condition in xprt_disconnect
From: Trond Myklebust @ 2004-04-06 13:51 UTC (permalink / raw)
  To: Olaf Kirch; +Cc: nfs
In-Reply-To: <20040406092401.GA29906@suse.de>

[-- Attachment #1: Type: text/plain, Size: 494 bytes --]

On Tue, 2004-04-06 at 05:24, Olaf Kirch wrote:
> Hi Trond,
> Et ceterum censeo ("furthermore, I maintain"): sunrpc needs a rewrite :)
> I'd like to spend some time on it this summer...

I've already got some ideas for some short-term changes...

Here's a couple of patches I've been playing around with lately. The
idea is to make rpciod a little more generic, *and* to make it easier to
migrate rpc_tasks onto other threads when we have a need to perform
operations that might otherwise deadlock rpciod.
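
The attached patch replaces the SLEEPING/RUNNING pair with RUNNING and QUEUED bits and adds test-and-set/test-and-clear helpers for them. The following userspace sketch shows why the atomic test-and-set matters. It uses C11 atomics in place of the kernel bitops. The helper names and the try_schedule() flow are hypothetical and only loosely follow the patch:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Bit numbers mirroring RPC_TASK_RUNNING / RPC_TASK_QUEUED in the patch. */
#define RUNSTATE_RUNNING 0
#define RUNSTATE_QUEUED  1

/* Userspace stand-ins for test_and_set_bit()/test_and_clear_bit(),
 * using C11 atomics (sequentially consistent by default). */
static bool tas_bit(int nr, atomic_ulong *word)
{
    unsigned long mask = 1UL << nr;
    return (atomic_fetch_or(word, mask) & mask) != 0;
}

static bool tac_bit(int nr, atomic_ulong *word)
{
    unsigned long mask = 1UL << nr;
    return (atomic_fetch_and(word, ~mask) & mask) != 0;
}

/* Hypothetical wake-up path: only the caller that flips RUNNING from 0
 * to 1 gets to queue the task, so two concurrent wakers cannot hand the
 * same task to a workqueue twice. */
static bool try_schedule(atomic_ulong *runstate)
{
    if (tas_bit(RUNSTATE_RUNNING, runstate))
        return false;               /* someone else already owns it */
    atomic_fetch_or(runstate, 1UL << RUNSTATE_QUEUED);
    return true;
}
```

A second try_schedule() on the same runstate returns false until RUNNING is cleared again. tac_bit() lets exactly one consumer claim the QUEUED state.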

Cheers,
  Trond

[-- Attachment #2: linux-2.6.5-36-rpc_workqueue.dif --]
[-- Type: text/plain, Size: 29735 bytes --]

 fs/nfs/unlink.c              |    3 
 include/linux/sunrpc/sched.h |   51 +++--
 net/sunrpc/clnt.c            |    4 
 net/sunrpc/sched.c           |  428 ++++++++++++-------------------------------
 net/sunrpc/xprt.c            |    2 
 5 files changed, 157 insertions(+), 331 deletions(-)

diff -u --recursive --new-file --show-c-function linux-2.6.5-NFS_ALL/fs/nfs/unlink.c linux-2.6.5-34-rpc_workqueue/fs/nfs/unlink.c
--- linux-2.6.5-NFS_ALL/fs/nfs/unlink.c	2004-04-02 21:16:33.000000000 -0500
+++ linux-2.6.5-34-rpc_workqueue/fs/nfs/unlink.c	2004-04-02 22:57:29.000000000 -0500
@@ -211,7 +211,6 @@ nfs_complete_unlink(struct dentry *dentr
 	data->count++;
 	nfs_copy_dname(dentry, data);
 	dentry->d_flags &= ~DCACHE_NFSFS_RENAMED;
-	if (data->task.tk_rpcwait == &nfs_delete_queue)
-		rpc_wake_up_task(&data->task);
+	rpc_wake_up_task(&data->task);
 	nfs_put_unlinkdata(data);
 }
diff -u --recursive --new-file --show-c-function linux-2.6.5-NFS_ALL/include/linux/sunrpc/sched.h linux-2.6.5-34-rpc_workqueue/include/linux/sunrpc/sched.h
--- linux-2.6.5-NFS_ALL/include/linux/sunrpc/sched.h	2004-04-02 21:15:51.000000000 -0500
+++ linux-2.6.5-34-rpc_workqueue/include/linux/sunrpc/sched.h	2004-04-03 10:48:27.000000000 -0500
@@ -12,6 +12,7 @@
 #include <linux/timer.h>
 #include <linux/sunrpc/types.h>
 #include <linux/wait.h>
+#include <linux/workqueue.h>
 #include <linux/sunrpc/xdr.h>
 
 /*
@@ -25,11 +26,18 @@ struct rpc_message {
 	struct rpc_cred *	rpc_cred;	/* Credentials */
 };
 
+struct rpc_wait_queue;
+struct rpc_wait {
+	struct list_head	list;		/* wait queue links */
+	struct list_head	links;		/* Links to related tasks */
+	wait_queue_head_t	waitq;		/* sync: sleep on this q */
+	struct rpc_wait_queue *	rpc_waitq;	/* RPC wait queue we're on */
+};
+
 /*
  * This is the RPC task struct
  */
 struct rpc_task {
-	struct list_head	tk_list;	/* wait queue links */
 #ifdef RPC_DEBUG
 	unsigned long		tk_magic;	/* 0xf00baa */
 #endif
@@ -37,7 +45,6 @@ struct rpc_task {
 	struct rpc_clnt *	tk_client;	/* RPC client */
 	struct rpc_rqst *	tk_rqstp;	/* RPC request */
 	int			tk_status;	/* result of last operation */
-	struct rpc_wait_queue *	tk_rpcwait;	/* RPC wait queue we're on */
 
 	/*
 	 * RPC call state
@@ -70,13 +77,18 @@ struct rpc_task {
 	 * you have a pathological interest in kernel oopses.
 	 */
 	struct timer_list	tk_timer;	/* kernel timer */
-	wait_queue_head_t	tk_wait;	/* sync: sleep on this q */
 	unsigned long		tk_timeout;	/* timeout for rpc_sleep() */
 	unsigned short		tk_flags;	/* misc flags */
 	unsigned char		tk_active   : 1;/* Task has been activated */
 	unsigned char		tk_priority : 2;/* Task priority */
 	unsigned long		tk_runstate;	/* Task run status */
-	struct list_head	tk_links;	/* links to related tasks */
+	struct workqueue_struct	*tk_workqueue;	/* Normally rpciod, but could
+						 * be any workqueue
+						 */
+	union {
+		struct work_struct	tk_work;	/* Async task work queue */
+		struct rpc_wait		tk_wait;	/* RPC wait */
+	} u;
 #ifdef RPC_DEBUG
 	unsigned short		tk_pid;		/* debugging aid */
 #endif
@@ -87,11 +99,11 @@ struct rpc_task {
 /* support walking a list of tasks on a wait queue */
 #define	task_for_each(task, pos, head) \
 	list_for_each(pos, head) \
-		if ((task=list_entry(pos, struct rpc_task, tk_list)),1)
+		if ((task=list_entry(pos, struct rpc_task, u.tk_wait.list)),1)
 
 #define	task_for_first(task, head) \
 	if (!list_empty(head) &&  \
-	    ((task=list_entry((head)->next, struct rpc_task, tk_list)),1))
+	    ((task=list_entry((head)->next, struct rpc_task, u.tk_wait.list)),1))
 
 /* .. and walking list of all tasks */
 #define	alltask_for_each(task, pos, head) \
@@ -124,22 +136,24 @@ typedef void			(*rpc_action)(struct rpc_
 #define RPC_DO_CALLBACK(t)	((t)->tk_callback != NULL)
 #define RPC_IS_SOFT(t)		((t)->tk_flags & RPC_TASK_SOFT)
 
-#define RPC_TASK_SLEEPING	0
-#define RPC_TASK_RUNNING	1
-#define RPC_IS_SLEEPING(t)	(test_bit(RPC_TASK_SLEEPING, &(t)->tk_runstate))
-#define RPC_IS_RUNNING(t)	(test_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
+#define RPC_TASK_RUNNING	0
+#define RPC_TASK_QUEUED		1
 
+#define RPC_IS_RUNNING(t)	(test_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
 #define rpc_set_running(t)	(set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
-#define rpc_clear_running(t)	(clear_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
-
-#define rpc_set_sleeping(t)	(set_bit(RPC_TASK_SLEEPING, &(t)->tk_runstate))
-
-#define rpc_clear_sleeping(t) \
+#define rpc_test_and_set_running(t) \
+				(test_and_set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
+#define rpc_clear_running(t)	\
 	do { \
 		smp_mb__before_clear_bit(); \
-		clear_bit(RPC_TASK_SLEEPING, &(t)->tk_runstate); \
+		clear_bit(RPC_TASK_RUNNING, &(t)->tk_runstate); \
 		smp_mb__after_clear_bit(); \
-	} while(0)
+	} while (0)
+
+#define RPC_IS_QUEUED(t)	(test_bit(RPC_TASK_QUEUED, &(t)->tk_runstate))
+#define rpc_set_queued(t)	(set_bit(RPC_TASK_QUEUED, &(t)->tk_runstate))
+#define rpc_test_and_clear_queued(t) \
+		(test_and_clear_bit(RPC_TASK_QUEUED, &(t)->tk_runstate))
 
 /*
  * Task priorities.
@@ -207,13 +221,10 @@ void		rpc_killall_tasks(struct rpc_clnt 
 int		rpc_execute(struct rpc_task *);
 void		rpc_run_child(struct rpc_task *parent, struct rpc_task *child,
 					rpc_action action);
-int		rpc_add_wait_queue(struct rpc_wait_queue *, struct rpc_task *);
-void		rpc_remove_wait_queue(struct rpc_task *);
 void		rpc_init_priority_wait_queue(struct rpc_wait_queue *, const char *);
 void		rpc_init_wait_queue(struct rpc_wait_queue *, const char *);
 void		rpc_sleep_on(struct rpc_wait_queue *, struct rpc_task *,
 					rpc_action action, rpc_action timer);
-void		rpc_add_timer(struct rpc_task *, rpc_action);
 void		rpc_wake_up_task(struct rpc_task *);
 void		rpc_wake_up(struct rpc_wait_queue *);
 struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *);
diff -u --recursive --new-file --show-c-function linux-2.6.5-NFS_ALL/net/sunrpc/clnt.c linux-2.6.5-34-rpc_workqueue/net/sunrpc/clnt.c
--- linux-2.6.5-NFS_ALL/net/sunrpc/clnt.c	2004-04-02 21:15:17.000000000 -0500
+++ linux-2.6.5-34-rpc_workqueue/net/sunrpc/clnt.c	2004-04-03 11:11:17.000000000 -0500
@@ -351,7 +351,9 @@ int rpc_call_sync(struct rpc_clnt *clnt,
 	rpc_clnt_sigmask(clnt, &oldset);		
 
 	/* Create/initialize a new RPC task */
-	rpc_init_task(task, clnt, NULL, flags);
+	task = rpc_new_task(clnt, NULL, flags);
+	if (task == NULL)
+		return -ENOMEM;
 	rpc_call_setup(task, msg, 0);
 
 	/* Set up the call info struct and execute the task */
diff -u --recursive --new-file --show-c-function linux-2.6.5-NFS_ALL/net/sunrpc/sched.c linux-2.6.5-34-rpc_workqueue/net/sunrpc/sched.c
--- linux-2.6.5-NFS_ALL/net/sunrpc/sched.c	2004-04-02 21:15:30.000000000 -0500
+++ linux-2.6.5-34-rpc_workqueue/net/sunrpc/sched.c	2004-04-03 10:56:01.000000000 -0500
@@ -41,13 +41,7 @@ static mempool_t	*rpc_buffer_mempool;
 
 static void			__rpc_default_timer(struct rpc_task *task);
 static void			rpciod_killall(void);
-
-/*
- * When an asynchronous RPC task is activated within a bottom half
- * handler, or while executing another RPC task, it is put on
- * schedq, and rpciod is woken up.
- */
-static RPC_WAITQ(schedq, "schedq");
+static void			rpc_async_schedule(void *);
 
 /*
  * RPC tasks that create another task (e.g. for contacting the portmapper)
@@ -68,11 +62,9 @@ static LIST_HEAD(all_tasks);
 /*
  * rpciod-related stuff
  */
-static DECLARE_WAIT_QUEUE_HEAD(rpciod_idle);
-static DECLARE_COMPLETION(rpciod_killer);
 static DECLARE_MUTEX(rpciod_sema);
 static unsigned int		rpciod_users;
-static pid_t			rpciod_pid;
+static struct workqueue_struct *rpciod_workqueue;
 static int			rpc_inhibit;
 
 /*
@@ -105,16 +97,13 @@ __rpc_disable_timer(struct rpc_task *tas
  * without calling del_timer_sync(). The latter could cause a
  * deadlock if called while we're holding spinlocks...
  */
-static void
-rpc_run_timer(struct rpc_task *task)
+static void rpc_run_timer(struct rpc_task *task)
 {
 	void (*callback)(struct rpc_task *);
 
-	spin_lock_bh(&rpc_queue_lock);
 	callback = task->tk_timeout_fn;
 	task->tk_timeout_fn = NULL;
-	spin_unlock_bh(&rpc_queue_lock);
-	if (callback) {
+	if (callback && RPC_IS_QUEUED(task)) {
 		dprintk("RPC: %4d running timer\n", task->tk_pid);
 		callback(task);
 	}
@@ -140,17 +129,6 @@ __rpc_add_timer(struct rpc_task *task, r
 }
 
 /*
- * Set up a timer for an already sleeping task.
- */
-void rpc_add_timer(struct rpc_task *task, rpc_action timer)
-{
-	spin_lock_bh(&rpc_queue_lock);
-	if (!RPC_IS_RUNNING(task))
-		__rpc_add_timer(task, timer);
-	spin_unlock_bh(&rpc_queue_lock);
-}
-
-/*
  * Delete any timer for the current task. Because we use del_timer_sync(),
  * this function should never be called while holding rpc_queue_lock.
  */
@@ -169,16 +147,17 @@ static void __rpc_add_wait_queue_priorit
 	struct list_head *q;
 	struct rpc_task *t;
 
+	INIT_LIST_HEAD(&task->u.tk_wait.links);
 	q = &queue->tasks[task->tk_priority];
 	if (unlikely(task->tk_priority > queue->maxpriority))
 		q = &queue->tasks[queue->maxpriority];
-	list_for_each_entry(t, q, tk_list) {
+	list_for_each_entry(t, q, u.tk_wait.list) {
 		if (t->tk_cookie == task->tk_cookie) {
-			list_add_tail(&task->tk_list, &t->tk_links);
+			list_add_tail(&task->u.tk_wait.list, &t->u.tk_wait.links);
 			return;
 		}
 	}
-	list_add_tail(&task->tk_list, q);
+	list_add_tail(&task->u.tk_wait.list, q);
 }
 
 /*
@@ -189,37 +168,21 @@ static void __rpc_add_wait_queue_priorit
  * improve overall performance.
  * Everyone else gets appended to the queue to ensure proper FIFO behavior.
  */
-static int __rpc_add_wait_queue(struct rpc_wait_queue *queue, struct rpc_task *task)
+static void __rpc_add_wait_queue(struct rpc_wait_queue *queue, struct rpc_task *task)
 {
-	if (task->tk_rpcwait == queue)
-		return 0;
+	BUG_ON (RPC_IS_QUEUED(task));
 
-	if (task->tk_rpcwait) {
-		printk(KERN_WARNING "RPC: doubly enqueued task!\n");
-		return -EWOULDBLOCK;
-	}
 	if (RPC_IS_PRIORITY(queue))
 		__rpc_add_wait_queue_priority(queue, task);
 	else if (RPC_IS_SWAPPER(task))
-		list_add(&task->tk_list, &queue->tasks[0]);
+		list_add(&task->u.tk_wait.list, &queue->tasks[0]);
 	else
-		list_add_tail(&task->tk_list, &queue->tasks[0]);
-	task->tk_rpcwait = queue;
+		list_add_tail(&task->u.tk_wait.list, &queue->tasks[0]);
+	task->u.tk_wait.rpc_waitq = queue;
+	rpc_set_queued(task);
 
 	dprintk("RPC: %4d added to queue %p \"%s\"\n",
 				task->tk_pid, queue, rpc_qname(queue));
-
-	return 0;
-}
-
-int rpc_add_wait_queue(struct rpc_wait_queue *q, struct rpc_task *task)
-{
-	int		result;
-
-	spin_lock_bh(&rpc_queue_lock);
-	result = __rpc_add_wait_queue(q, task);
-	spin_unlock_bh(&rpc_queue_lock);
-	return result;
 }
 
 /*
@@ -229,12 +192,12 @@ static void __rpc_remove_wait_queue_prio
 {
 	struct rpc_task *t;
 
-	if (!list_empty(&task->tk_links)) {
-		t = list_entry(task->tk_links.next, struct rpc_task, tk_list);
-		list_move(&t->tk_list, &task->tk_list);
-		list_splice_init(&task->tk_links, &t->tk_links);
+	if (!list_empty(&task->u.tk_wait.links)) {
+		t = list_entry(task->u.tk_wait.links.next, struct rpc_task, u.tk_wait.list);
+		list_move(&t->u.tk_wait.list, &task->u.tk_wait.list);
+		list_splice_init(&task->u.tk_wait.links, &t->u.tk_wait.links);
 	}
-	list_del(&task->tk_list);
+	list_del(&task->u.tk_wait.list);
 }
 
 /*
@@ -243,31 +206,17 @@ static void __rpc_remove_wait_queue_prio
  */
 static void __rpc_remove_wait_queue(struct rpc_task *task)
 {
-	struct rpc_wait_queue *queue = task->tk_rpcwait;
-
-	if (!queue)
-		return;
+	struct rpc_wait_queue *queue;
+	queue = task->u.tk_wait.rpc_waitq;
 
 	if (RPC_IS_PRIORITY(queue))
 		__rpc_remove_wait_queue_priority(task);
 	else
-		list_del(&task->tk_list);
-	task->tk_rpcwait = NULL;
-
+		list_del(&task->u.tk_wait.list);
 	dprintk("RPC: %4d removed from queue %p \"%s\"\n",
 				task->tk_pid, queue, rpc_qname(queue));
 }
 
-void
-rpc_remove_wait_queue(struct rpc_task *task)
-{
-	if (!task->tk_rpcwait)
-		return;
-	spin_lock_bh(&rpc_queue_lock);
-	__rpc_remove_wait_queue(task);
-	spin_unlock_bh(&rpc_queue_lock);
-}
-
 static inline void rpc_set_waitqueue_priority(struct rpc_wait_queue *queue, int priority)
 {
 	queue->priority = priority;
@@ -316,34 +265,27 @@ EXPORT_SYMBOL(rpc_init_wait_queue);
  * Note: If the task is ASYNC, this must be called with 
  * the spinlock held to protect the wait queue operation.
  */
-static inline void
-rpc_make_runnable(struct rpc_task *task)
+static void rpc_make_runnable(struct rpc_task *task)
 {
-	if (task->tk_timeout_fn) {
-		printk(KERN_ERR "RPC: task w/ running timer in rpc_make_runnable!!\n");
+	if (rpc_test_and_set_running(task))
 		return;
-	}
-	rpc_set_running(task);
+	BUG_ON(task->tk_timeout_fn);
 	if (RPC_IS_ASYNC(task)) {
-		if (RPC_IS_SLEEPING(task)) {
-			int status;
-			status = __rpc_add_wait_queue(&schedq, task);
-			if (status < 0) {
-				printk(KERN_WARNING "RPC: failed to add task to queue: error: %d!\n", status);
-				task->tk_status = status;
-				return;
-			}
-			rpc_clear_sleeping(task);
-			wake_up(&rpciod_idle);
+		int status;
+
+		INIT_WORK(&task->u.tk_work, rpc_async_schedule, (void *)task);
+		status = queue_work(task->tk_workqueue, &task->u.tk_work);
+		if (status < 0) {
+			printk(KERN_WARNING "RPC: failed to add task to queue: error: %d!\n", status);
+			task->tk_status = status;
+			return;
 		}
-	} else {
-		rpc_clear_sleeping(task);
-		wake_up(&task->tk_wait);
-	}
+	} else
+		wake_up(&task->u.tk_wait.waitq);
 }
 
 /*
- * Place a newly initialized task on the schedq.
+ * Place a newly initialized task on the workqueue.
  */
 static inline void
 rpc_schedule_run(struct rpc_task *task)
@@ -352,33 +294,18 @@ rpc_schedule_run(struct rpc_task *task)
 	if (RPC_IS_ACTIVATED(task))
 		return;
 	task->tk_active = 1;
-	rpc_set_sleeping(task);
 	rpc_make_runnable(task);
 }
 
 /*
- *	For other people who may need to wake the I/O daemon
- *	but should (for now) know nothing about its innards
- */
-void rpciod_wake_up(void)
-{
-	if(rpciod_pid==0)
-		printk(KERN_ERR "rpciod: wot no daemon?\n");
-	wake_up(&rpciod_idle);
-}
-
-/*
  * Prepare for sleeping on a wait queue.
  * By always appending tasks to the list we ensure FIFO behavior.
  * NB: An RPC task will only receive interrupt-driven events as long
  * as it's on a wait queue.
  */
-static void
-__rpc_sleep_on(struct rpc_wait_queue *q, struct rpc_task *task,
+static void __rpc_sleep_on(struct rpc_wait_queue *q, struct rpc_task *task,
 			rpc_action action, rpc_action timer)
 {
-	int status;
-
 	dprintk("RPC: %4d sleep_on(queue \"%s\" time %ld)\n", task->tk_pid,
 				rpc_qname(q), jiffies);
 
@@ -388,24 +315,14 @@ __rpc_sleep_on(struct rpc_wait_queue *q,
 	}
 
 	/* Mark the task as being activated if so needed */
-	if (!RPC_IS_ACTIVATED(task)) {
+	if (!RPC_IS_ACTIVATED(task))
 		task->tk_active = 1;
-		rpc_set_sleeping(task);
-	}
 
-	status = __rpc_add_wait_queue(q, task);
-	if (status) {
-		printk(KERN_WARNING "RPC: failed to add task to queue: error: %d!\n", status);
-		task->tk_status = status;
-	} else {
-		rpc_clear_running(task);
-		if (task->tk_callback) {
-			dprintk(KERN_ERR "RPC: %4d overwrites an active callback\n", task->tk_pid);
-			BUG();
-		}
-		task->tk_callback = action;
-		__rpc_add_timer(task, timer);
-	}
+	__rpc_add_wait_queue(q, task);
+
+	BUG_ON(task->tk_callback != NULL);
+	task->tk_callback = action;
+	__rpc_add_timer(task, timer);
 }
 
 void
@@ -421,13 +338,12 @@ rpc_sleep_on(struct rpc_wait_queue *q, s
 }
 
 /**
- * __rpc_wake_up_task - wake up a single rpc_task
+ * __rpc_do_wake_up_task - wake up a single rpc_task
  * @task: task to be woken up
  *
- * Caller must hold rpc_queue_lock
+ * Caller must hold rpc_queue_lock, and have cleared the task queued flag.
  */
-static void
-__rpc_wake_up_task(struct rpc_task *task)
+static void __rpc_do_wake_up_task(struct rpc_task *task)
 {
 	dprintk("RPC: %4d __rpc_wake_up_task (now %ld inh %d)\n",
 					task->tk_pid, jiffies, rpc_inhibit);
@@ -445,12 +361,9 @@ __rpc_wake_up_task(struct rpc_task *task
 		printk(KERN_ERR "RPC: Inactive task (%p) being woken up!\n", task);
 		return;
 	}
-	if (RPC_IS_RUNNING(task))
-		return;
 
 	__rpc_disable_timer(task);
-	if (task->tk_rpcwait != &schedq)
-		__rpc_remove_wait_queue(task);
+	__rpc_remove_wait_queue(task);
 
 	rpc_make_runnable(task);
 
@@ -458,6 +371,15 @@ __rpc_wake_up_task(struct rpc_task *task
 }
 
 /*
+ * Wake up the specified task
+ */
+static void __rpc_wake_up_task(struct rpc_task *task)
+{
+	if (rpc_test_and_clear_queued(task))
+		__rpc_do_wake_up_task(task);
+}
+
+/*
  * Default timeout handler if none specified by user
  */
 static void
@@ -471,14 +393,13 @@ __rpc_default_timer(struct rpc_task *tas
 /*
  * Wake up the specified task
  */
-void
-rpc_wake_up_task(struct rpc_task *task)
+void rpc_wake_up_task(struct rpc_task *task)
 {
-	if (RPC_IS_RUNNING(task))
-		return;
-	spin_lock_bh(&rpc_queue_lock);
-	__rpc_wake_up_task(task);
-	spin_unlock_bh(&rpc_queue_lock);
+	if (rpc_test_and_clear_queued(task)) {
+		spin_lock_bh(&rpc_queue_lock);
+		__rpc_do_wake_up_task(task);
+		spin_unlock_bh(&rpc_queue_lock);
+	}
 }
 
 /*
@@ -494,11 +415,11 @@ static struct rpc_task * __rpc_wake_up_n
 	 */
 	q = &queue->tasks[queue->priority];
 	if (!list_empty(q)) {
-		task = list_entry(q->next, struct rpc_task, tk_list);
+		task = list_entry(q->next, struct rpc_task, u.tk_wait.list);
 		if (queue->cookie == task->tk_cookie) {
 			if (--queue->nr)
 				goto out;
-			list_move_tail(&task->tk_list, q);
+			list_move_tail(&task->u.tk_wait.list, q);
 		}
 		/*
 		 * Check if we need to switch queues.
@@ -516,7 +437,7 @@ static struct rpc_task * __rpc_wake_up_n
 		else
 			q = q - 1;
 		if (!list_empty(q)) {
-			task = list_entry(q->next, struct rpc_task, tk_list);
+			task = list_entry(q->next, struct rpc_task, u.tk_wait.list);
 			goto new_queue;
 		}
 	} while (q != &queue->tasks[queue->priority]);
@@ -568,7 +489,7 @@ void rpc_wake_up(struct rpc_wait_queue *
 	head = &queue->tasks[queue->maxpriority];
 	for (;;) {
 		while (!list_empty(head)) {
-			task = list_entry(head->next, struct rpc_task, tk_list);
+			task = list_entry(head->next, struct rpc_task, u.tk_wait.list);
 			__rpc_wake_up_task(task);
 		}
 		if (head == &queue->tasks[0])
@@ -594,7 +515,7 @@ void rpc_wake_up_status(struct rpc_wait_
 	head = &queue->tasks[queue->maxpriority];
 	for (;;) {
 		while (!list_empty(head)) {
-			task = list_entry(head->next, struct rpc_task, tk_list);
+			task = list_entry(head->next, struct rpc_task, u.tk_wait.list);
 			task->tk_status = status;
 			__rpc_wake_up_task(task);
 		}
@@ -626,18 +547,14 @@ __rpc_atrun(struct rpc_task *task)
 /*
  * This is the RPC `scheduler' (or rather, the finite state machine).
  */
-static int
-__rpc_execute(struct rpc_task *task)
+static int __rpc_execute(struct rpc_task *task)
 {
 	int		status = 0;
 
 	dprintk("RPC: %4d rpc_execute flgs %x\n",
 				task->tk_pid, task->tk_flags);
 
-	if (!RPC_IS_RUNNING(task)) {
-		printk(KERN_WARNING "RPC: rpc_execute called for sleeping task!!\n");
-		return 0;
-	}
+	BUG_ON(RPC_IS_QUEUED(task));
 
  restarted:
 	while (1) {
@@ -657,7 +574,9 @@ __rpc_execute(struct rpc_task *task)
 			 */
 			save_callback=task->tk_callback;
 			task->tk_callback=NULL;
+			lock_kernel();
 			save_callback(task);
+			unlock_kernel();
 		}
 
 		/*
@@ -665,43 +584,40 @@ __rpc_execute(struct rpc_task *task)
 		 * tk_action may be NULL when the task has been killed
 		 * by someone else.
 		 */
-		if (RPC_IS_RUNNING(task)) {
+		if (!RPC_IS_QUEUED(task)) {
 			/*
 			 * Garbage collection of pending timers...
 			 */
 			rpc_delete_timer(task);
 			if (!task->tk_action)
 				break;
+			lock_kernel();
 			task->tk_action(task);
-			/* micro-optimization to avoid spinlock */
-			if (RPC_IS_RUNNING(task))
-				continue;
+			unlock_kernel();
 		}
 
 		/*
-		 * Check whether task is sleeping.
+		 * Lockless check for whether task is sleeping or not.
 		 */
-		spin_lock_bh(&rpc_queue_lock);
-		if (!RPC_IS_RUNNING(task)) {
-			rpc_set_sleeping(task);
-			if (RPC_IS_ASYNC(task)) {
-				spin_unlock_bh(&rpc_queue_lock);
+		if (!RPC_IS_QUEUED(task))
+			continue;
+		rpc_clear_running(task);
+		if (RPC_IS_ASYNC(task)) {
+			/* Careful! we may have raced... */
+			if (RPC_IS_QUEUED(task))
 				return 0;
-			}
+			if (rpc_test_and_set_running(task))
+				return 0;
+			continue;
 		}
-		spin_unlock_bh(&rpc_queue_lock);
 
-		if (!RPC_IS_SLEEPING(task))
-			continue;
+		init_waitqueue_head(&task->u.tk_wait.waitq);
 		/* sync task: sleep here */
 		dprintk("RPC: %4d sync task going to sleep\n", task->tk_pid);
-		if (current->pid == rpciod_pid)
-			printk(KERN_ERR "RPC: rpciod waiting on sync task!\n");
-
 		if (!task->tk_client->cl_intr) {
-			__wait_event(task->tk_wait, !RPC_IS_SLEEPING(task));
+			__wait_event(task->u.tk_wait.waitq, RPC_IS_RUNNING(task));
 		} else {
-			__wait_event_interruptible(task->tk_wait, !RPC_IS_SLEEPING(task), status);
+			__wait_event_interruptible(task->u.tk_wait.waitq, RPC_IS_RUNNING(task), status);
 			/*
 			 * When a sync task receives a signal, it exits with
 			 * -ERESTARTSYS. In order to catch any callbacks that
@@ -719,7 +635,9 @@ __rpc_execute(struct rpc_task *task)
 	}
 
 	if (task->tk_exit) {
+		lock_kernel();
 		task->tk_exit(task);
+		unlock_kernel();
 		/* If tk_action is non-null, the user wants us to restart */
 		if (task->tk_action) {
 			if (!RPC_ASSASSINATED(task)) {
@@ -738,7 +656,6 @@ __rpc_execute(struct rpc_task *task)
 
 	/* Release all resources associated with the task */
 	rpc_release_task(task);
-
 	return status;
 }
 
@@ -775,36 +692,9 @@ rpc_execute(struct rpc_task *task)
 	return status;
 }
 
-/*
- * This is our own little scheduler for async RPC tasks.
- */
-static void
-__rpc_schedule(void)
+static void rpc_async_schedule(void *arg)
 {
-	struct rpc_task	*task;
-	int		count = 0;
-
-	dprintk("RPC:      rpc_schedule enter\n");
-	while (1) {
-
-		task_for_first(task, &schedq.tasks[0]) {
-			__rpc_remove_wait_queue(task);
-			spin_unlock_bh(&rpc_queue_lock);
-
-			__rpc_execute(task);
-			spin_lock_bh(&rpc_queue_lock);
-		} else {
-			break;
-		}
-
-		if (++count >= 200 || need_resched()) {
-			count = 0;
-			spin_unlock_bh(&rpc_queue_lock);
-			schedule();
-			spin_lock_bh(&rpc_queue_lock);
-		}
-	}
-	dprintk("RPC:      rpc_schedule leave\n");
+	__rpc_execute((struct rpc_task *)arg);
 }
 
 /*
@@ -862,7 +752,6 @@ void rpc_init_task(struct rpc_task *task
 	task->tk_client = clnt;
 	task->tk_flags  = flags;
 	task->tk_exit   = callback;
-	init_waitqueue_head(&task->tk_wait);
 	if (current->uid != current->fsuid || current->gid != current->fsgid)
 		task->tk_flags |= RPC_TASK_SETUID;
 
@@ -873,7 +762,9 @@ void rpc_init_task(struct rpc_task *task
 
 	task->tk_priority = RPC_PRIORITY_NORMAL;
 	task->tk_cookie = (unsigned long)current;
-	INIT_LIST_HEAD(&task->tk_links);
+
+	/* Initialize workqueue for async tasks */
+	task->tk_workqueue = rpciod_workqueue;
 
 	/* Add to global list of all tasks */
 	spin_lock(&rpc_sched_lock);
@@ -961,19 +852,9 @@ rpc_release_task(struct rpc_task *task)
 	list_del(&task->tk_task);
 	spin_unlock(&rpc_sched_lock);
 
-	/* Protect the execution below. */
-	spin_lock_bh(&rpc_queue_lock);
-
-	/* Disable timer to prevent zombie wakeup */
-	__rpc_disable_timer(task);
-
-	/* Remove from any wait queue we're still on */
-	__rpc_remove_wait_queue(task);
-
+	BUG_ON (rpc_test_and_clear_queued(task));
 	task->tk_active = 0;
 
-	spin_unlock_bh(&rpc_queue_lock);
-
 	/* Synchronously delete any running timer */
 	rpc_delete_timer(task);
 
@@ -1089,82 +970,6 @@ rpc_killall_tasks(struct rpc_clnt *clnt)
 
 static DECLARE_MUTEX_LOCKED(rpciod_running);
 
-static inline int
-rpciod_task_pending(void)
-{
-	return !list_empty(&schedq.tasks[0]);
-}
-
-
-/*
- * This is the rpciod kernel thread
- */
-static int
-rpciod(void *ptr)
-{
-	int		rounds = 0;
-
-	lock_kernel();
-	/*
-	 * Let our maker know we're running ...
-	 */
-	rpciod_pid = current->pid;
-	up(&rpciod_running);
-
-	daemonize("rpciod");
-	allow_signal(SIGKILL);
-
-	dprintk("RPC: rpciod starting (pid %d)\n", rpciod_pid);
-	spin_lock_bh(&rpc_queue_lock);
-	while (rpciod_users) {
-		DEFINE_WAIT(wait);
-		if (signalled()) {
-			spin_unlock_bh(&rpc_queue_lock);
-			rpciod_killall();
-			flush_signals(current);
-			spin_lock_bh(&rpc_queue_lock);
-		}
-		__rpc_schedule();
-		if (current->flags & PF_FREEZE) {
-			spin_unlock_bh(&rpc_queue_lock);
-			refrigerator(PF_IOTHREAD);
-			spin_lock_bh(&rpc_queue_lock);
-		}
-
-		if (++rounds >= 64) {	/* safeguard */
-			spin_unlock_bh(&rpc_queue_lock);
-			schedule();
-			rounds = 0;
-			spin_lock_bh(&rpc_queue_lock);
-		}
-
-		dprintk("RPC: rpciod back to sleep\n");
-		prepare_to_wait(&rpciod_idle, &wait, TASK_INTERRUPTIBLE);
-		if (!rpciod_task_pending() && !signalled()) {
-			spin_unlock_bh(&rpc_queue_lock);
-			schedule();
-			rounds = 0;
-			spin_lock_bh(&rpc_queue_lock);
-		}
-		finish_wait(&rpciod_idle, &wait);
-		dprintk("RPC: switch to rpciod\n");
-	}
-	spin_unlock_bh(&rpc_queue_lock);
-
-	dprintk("RPC: rpciod shutdown commences\n");
-	if (!list_empty(&all_tasks)) {
-		printk(KERN_ERR "rpciod: active tasks at shutdown?!\n");
-		rpciod_killall();
-	}
-
-	dprintk("RPC: rpciod exiting\n");
-	unlock_kernel();
-
-	rpciod_pid = 0;
-	complete_and_exit(&rpciod_killer, 0);
-	return 0;
-}
-
 static void
 rpciod_killall(void)
 {
@@ -1173,9 +978,7 @@ rpciod_killall(void)
 	while (!list_empty(&all_tasks)) {
 		clear_thread_flag(TIF_SIGPENDING);
 		rpc_killall_tasks(NULL);
-		spin_lock_bh(&rpc_queue_lock);
-		__rpc_schedule();
-		spin_unlock_bh(&rpc_queue_lock);
+		flush_workqueue(rpciod_workqueue);
 		if (!list_empty(&all_tasks)) {
 			dprintk("rpciod_killall: waiting for tasks to exit\n");
 			yield();
@@ -1193,28 +996,31 @@ rpciod_killall(void)
 int
 rpciod_up(void)
 {
+	struct workqueue_struct *wq;
 	int error = 0;
 
 	down(&rpciod_sema);
-	dprintk("rpciod_up: pid %d, users %d\n", rpciod_pid, rpciod_users);
+	dprintk("rpciod_up: users %d\n", rpciod_users);
 	rpciod_users++;
-	if (rpciod_pid)
+	if (rpciod_workqueue)
 		goto out;
 	/*
 	 * If there's no pid, we should be the first user.
 	 */
 	if (rpciod_users > 1)
-		printk(KERN_WARNING "rpciod_up: no pid, %d users??\n", rpciod_users);
+		printk(KERN_WARNING "rpciod_up: no workqueue, %d users??\n", rpciod_users);
 	/*
 	 * Create the rpciod thread and wait for it to start.
 	 */
-	error = kernel_thread(rpciod, NULL, 0);
-	if (error < 0) {
-		printk(KERN_WARNING "rpciod_up: create thread failed, error=%d\n", error);
+	error = -ENOMEM;
+	wq = create_workqueue("rpciod");
+	if (wq == NULL) {
+		printk(KERN_WARNING "rpciod_up: create workqueue failed, error=%d\n", error);
 		rpciod_users--;
 		goto out;
 	}
-	down(&rpciod_running);
+	rpciod_workqueue = wq;
+	rpc_inhibit = 0;
 	error = 0;
 out:
 	up(&rpciod_sema);
@@ -1225,20 +1031,22 @@ void
 rpciod_down(void)
 {
 	down(&rpciod_sema);
-	dprintk("rpciod_down pid %d sema %d\n", rpciod_pid, rpciod_users);
+	dprintk("rpciod_down sema %d\n", rpciod_users);
 	if (rpciod_users) {
 		if (--rpciod_users)
 			goto out;
 	} else
-		printk(KERN_WARNING "rpciod_down: pid=%d, no users??\n", rpciod_pid);
+		printk(KERN_WARNING "rpciod_down: no users??\n");
 
-	if (!rpciod_pid) {
+	rpc_inhibit = 1;
+	if (!rpciod_workqueue) {
 		dprintk("rpciod_down: Nothing to do!\n");
 		goto out;
 	}
+	rpciod_killall();
 
-	kill_proc(rpciod_pid, SIGKILL, 1);
-	wait_for_completion(&rpciod_killer);
+	destroy_workqueue(rpciod_workqueue);
+	rpciod_workqueue = NULL;
  out:
 	up(&rpciod_sema);
 }
@@ -1256,7 +1064,12 @@ void rpc_show_tasks(void)
 	}
 	printk("-pid- proc flgs status -client- -prog- --rqstp- -timeout "
 		"-rpcwait -action- --exit--\n");
-	alltask_for_each(t, le, &all_tasks)
+	alltask_for_each(t, le, &all_tasks) {
+		const char *rpc_waitq = "none";
+
+		if (RPC_IS_QUEUED(t))
+			rpc_waitq = rpc_qname(t->u.tk_wait.rpc_waitq);
+
 		printk("%05d %04d %04x %06d %8p %6d %8p %08ld %8s %8p %8p\n",
 			t->tk_pid,
 			(t->tk_msg.rpc_proc ? t->tk_msg.rpc_proc->p_proc : -1),
@@ -1264,8 +1077,9 @@ void rpc_show_tasks(void)
 			t->tk_client,
 			(t->tk_client ? t->tk_client->cl_prog : 0),
 			t->tk_rqstp, t->tk_timeout,
-			rpc_qname(t->tk_rpcwait),
+			rpc_waitq,
 			t->tk_action, t->tk_exit);
+	}
 	spin_unlock(&rpc_sched_lock);
 }
 #endif
diff -u --recursive --new-file --show-c-function linux-2.6.5-NFS_ALL/net/sunrpc/xprt.c linux-2.6.5-34-rpc_workqueue/net/sunrpc/xprt.c
--- linux-2.6.5-NFS_ALL/net/sunrpc/xprt.c	2004-04-02 21:15:52.000000000 -0500
+++ linux-2.6.5-34-rpc_workqueue/net/sunrpc/xprt.c	2004-04-02 22:57:29.000000000 -0500
@@ -1078,7 +1078,7 @@ xprt_write_space(struct sock *sk)
 		goto out;
 
 	spin_lock_bh(&xprt->sock_lock);
-	if (xprt->snd_task && xprt->snd_task->tk_rpcwait == &xprt->pending)
+	if (xprt->snd_task)
 		rpc_wake_up_task(xprt->snd_task);
 	spin_unlock_bh(&xprt->sock_lock);
 out:

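The rpc_workqueue patch above replaces the rpc_queue_lock-protected sleeping/running checks with a lockless handoff built on two bits in tk_runstate. RPC_TASK_QUEUED can be cleared by only one waker, and RPC_TASK_RUNNING can be set by only one caller before the task is handed to the workqueue. The following is a minimal user-space sketch of that protocol using C11 atomics. It assumes the bit numbers from the patch (RUNNING = 0, QUEUED = 1). The names sketch_task, sketch_wake_up, sketch_make_runnable and sketch_async_yield are hypothetical, and the times_scheduled counter stands in for queue_work(). No wait-queue list, timer or BKL handling is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical masks mirroring RPC_TASK_RUNNING (bit 0) and RPC_TASK_QUEUED (bit 1). */
enum { TASK_RUNNING_BIT = 1u << 0, TASK_QUEUED_BIT = 1u << 1 };

struct sketch_task {
	atomic_uint runstate;	/* like tk_runstate */
	int times_scheduled;	/* stands in for queue_work() calls */
};

/* Analogue of test_and_set_bit(): returns the previous state of the bit. */
static bool test_and_set(atomic_uint *w, unsigned bit)
{
	return (atomic_fetch_or(w, bit) & bit) != 0;
}

/* Analogue of test_and_clear_bit(): returns the previous state of the bit. */
static bool test_and_clear(atomic_uint *w, unsigned bit)
{
	return (atomic_fetch_and(w, ~bit) & bit) != 0;
}

/* Like rpc_make_runnable(): only the caller that flips RUNNING 0 -> 1
 * hands the task to the executor. */
static void sketch_make_runnable(struct sketch_task *t)
{
	if (test_and_set(&t->runstate, TASK_RUNNING_BIT))
		return;		/* executor still owns the task */
	t->times_scheduled++;
}

/* Like rpc_wake_up_task(): only the caller that clears QUEUED proceeds,
 * so a duplicate wakeup is a no-op. */
static bool sketch_wake_up(struct sketch_task *t)
{
	if (!test_and_clear(&t->runstate, TASK_QUEUED_BIT))
		return false;
	sketch_make_runnable(t);
	return true;
}

/* Like the async tail of __rpc_execute() after the task queued itself:
 * drop RUNNING, then re-check for a wakeup that raced in.
 * Returns true if the executor should keep running the task in place. */
static bool sketch_async_yield(struct sketch_task *t)
{
	atomic_fetch_and(&t->runstate, ~(unsigned)TASK_RUNNING_BIT);
	if (atomic_load(&t->runstate) & TASK_QUEUED_BIT)
		return false;	/* still queued: a later wakeup reschedules it */
	if (test_and_set(&t->runstate, TASK_RUNNING_BIT))
		return false;	/* a waker already rescheduled it */
	return true;		/* wakeup raced in: continue the state machine */
}
```

Tracing the two interleavings by hand shows why __rpc_execute() re-checks RPC_IS_QUEUED() after rpc_clear_running(). If the wakeup lands while the executor still owns the task, rpc_make_runnable() does nothing, so the executor must pick the task back up itself.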
[-- Attachment #3: linux-2.6.5-37-rpc_queue_lock.dif --]
[-- Type: text/plain, Size: 8364 bytes --]

 include/linux/sunrpc/sched.h |    4 ++
 net/sunrpc/sched.c           |   69 ++++++++++++++++++-------------------------
 2 files changed, 34 insertions(+), 39 deletions(-)

diff -u --recursive --new-file --show-c-function linux-2.6.5-32-workqueue/include/linux/sunrpc/sched.h linux-2.6.5-33-spin/include/linux/sunrpc/sched.h
--- linux-2.6.5-32-workqueue/include/linux/sunrpc/sched.h	2004-03-27 17:27:14.000000000 -0500
+++ linux-2.6.5-33-spin/include/linux/sunrpc/sched.h	2004-03-27 17:37:08.000000000 -0500
@@ -11,6 +11,7 @@
 
 #include <linux/timer.h>
 #include <linux/sunrpc/types.h>
+#include <linux/spinlock.h>
 #include <linux/wait.h>
 #include <linux/workqueue.h>
 #include <linux/sunrpc/xdr.h>
@@ -166,6 +167,7 @@ typedef void			(*rpc_action)(struct rpc_
  * RPC synchronization objects
  */
 struct rpc_wait_queue {
+	spinlock_t		lock;
 	struct list_head	tasks[RPC_NR_PRIORITY];	/* task queue for each priority level */
 	unsigned long		cookie;			/* cookie of last task serviced */
 	unsigned char		maxpriority;		/* maximum priority (0 if queue is not a priority queue) */
@@ -186,6 +188,7 @@ struct rpc_wait_queue {
 
 #ifndef RPC_DEBUG
 # define RPC_WAITQ_INIT(var,qname) { \
+		.lock = SPIN_LOCK_UNLOCKED, \
 		.tasks = { \
 			[0] = LIST_HEAD_INIT(var.tasks[0]), \
 			[1] = LIST_HEAD_INIT(var.tasks[1]), \
@@ -194,6 +197,7 @@ struct rpc_wait_queue {
 	}
 #else
 # define RPC_WAITQ_INIT(var,qname) { \
+		.lock = SPIN_LOCK_UNLOCKED, \
 		.tasks = { \
 			[0] = LIST_HEAD_INIT(var.tasks[0]), \
 			[1] = LIST_HEAD_INIT(var.tasks[1]), \
diff -u --recursive --new-file --show-c-function linux-2.6.5-32-workqueue/net/sunrpc/sched.c linux-2.6.5-33-spin/net/sunrpc/sched.c
--- linux-2.6.5-32-workqueue/net/sunrpc/sched.c	2004-03-27 17:28:31.000000000 -0500
+++ linux-2.6.5-33-spin/net/sunrpc/sched.c	2004-03-27 17:33:24.000000000 -0500
@@ -68,18 +68,13 @@ static struct workqueue_struct *rpciod_w
 static int			rpc_inhibit;
 
 /*
- * Spinlock for wait queues. Access to the latter also has to be
- * interrupt-safe in order to allow timers to wake up sleeping tasks.
- */
-static spinlock_t rpc_queue_lock = SPIN_LOCK_UNLOCKED;
-/*
  * Spinlock for other critical sections of code.
  */
 static spinlock_t rpc_sched_lock = SPIN_LOCK_UNLOCKED;
 
 /*
  * Disable the timer for a given RPC task. Should be called with
- * rpc_queue_lock and bh_disabled in order to avoid races within
+ * queue->lock and bh_disabled in order to avoid races within
  * rpc_run_timer().
  */
 static inline void
@@ -130,7 +125,7 @@ __rpc_add_timer(struct rpc_task *task, r
 
 /*
  * Delete any timer for the current task. Because we use del_timer_sync(),
- * this function should never be called while holding rpc_queue_lock.
+ * this function should never be called while holding queue->lock.
  */
 static inline void
 rpc_delete_timer(struct rpc_task *task)
@@ -239,6 +234,7 @@ static void __rpc_init_priority_wait_que
 {
 	int i;
 
+	spin_lock_init(&queue->lock);
 	for (i = 0; i < ARRAY_SIZE(queue->tasks); i++)
 		INIT_LIST_HEAD(&queue->tasks[i]);
 	queue->maxpriority = maxprio;
@@ -325,23 +321,22 @@ static void __rpc_sleep_on(struct rpc_wa
 	__rpc_add_timer(task, timer);
 }
 
-void
-rpc_sleep_on(struct rpc_wait_queue *q, struct rpc_task *task,
+void rpc_sleep_on(struct rpc_wait_queue *q, struct rpc_task *task,
 				rpc_action action, rpc_action timer)
 {
 	/*
 	 * Protect the queue operations.
 	 */
-	spin_lock_bh(&rpc_queue_lock);
+	spin_lock_bh(&q->lock);
 	__rpc_sleep_on(q, task, action, timer);
-	spin_unlock_bh(&rpc_queue_lock);
+	spin_unlock_bh(&q->lock);
 }
 
 /**
  * __rpc_do_wake_up_task - wake up a single rpc_task
  * @task: task to be woken up
  *
- * Caller must hold rpc_queue_lock, and have cleared the task queued flag.
+ * Caller must hold queue->lock, and have cleared the task queued flag.
  */
 static void __rpc_do_wake_up_task(struct rpc_task *task)
 {
@@ -396,9 +391,11 @@ __rpc_default_timer(struct rpc_task *tas
 void rpc_wake_up_task(struct rpc_task *task)
 {
 	if (rpc_test_and_clear_queued(task)) {
-		spin_lock_bh(&rpc_queue_lock);
+		struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq;
+
+		spin_lock_bh(&queue->lock);
 		__rpc_do_wake_up_task(task);
-		spin_unlock_bh(&rpc_queue_lock);
+		spin_unlock_bh(&queue->lock);
 	}
 }
 
@@ -462,14 +459,14 @@ struct rpc_task * rpc_wake_up_next(struc
 	struct rpc_task	*task = NULL;
 
 	dprintk("RPC:      wake_up_next(%p \"%s\")\n", queue, rpc_qname(queue));
-	spin_lock_bh(&rpc_queue_lock);
+	spin_lock_bh(&queue->lock);
 	if (RPC_IS_PRIORITY(queue))
 		task = __rpc_wake_up_next_priority(queue);
 	else {
 		task_for_first(task, &queue->tasks[0])
 			__rpc_wake_up_task(task);
 	}
-	spin_unlock_bh(&rpc_queue_lock);
+	spin_unlock_bh(&queue->lock);
 
 	return task;
 }
@@ -478,14 +475,14 @@ struct rpc_task * rpc_wake_up_next(struc
  * rpc_wake_up - wake up all rpc_tasks
  * @queue: rpc_wait_queue on which the tasks are sleeping
  *
- * Grabs rpc_queue_lock
+ * Grabs queue->lock
  */
 void rpc_wake_up(struct rpc_wait_queue *queue)
 {
 	struct rpc_task *task;
 
 	struct list_head *head;
-	spin_lock_bh(&rpc_queue_lock);
+	spin_lock_bh(&queue->lock);
 	head = &queue->tasks[queue->maxpriority];
 	for (;;) {
 		while (!list_empty(head)) {
@@ -496,7 +493,7 @@ void rpc_wake_up(struct rpc_wait_queue *
 			break;
 		head--;
 	}
-	spin_unlock_bh(&rpc_queue_lock);
+	spin_unlock_bh(&queue->lock);
 }
 
 /**
@@ -504,14 +501,14 @@ void rpc_wake_up(struct rpc_wait_queue *
  * @queue: rpc_wait_queue on which the tasks are sleeping
  * @status: status value to set
  *
- * Grabs rpc_queue_lock
+ * Grabs queue->lock
  */
 void rpc_wake_up_status(struct rpc_wait_queue *queue, int status)
 {
 	struct list_head *head;
 	struct rpc_task *task;
 
-	spin_lock_bh(&rpc_queue_lock);
+	spin_lock_bh(&queue->lock);
 	head = &queue->tasks[queue->maxpriority];
 	for (;;) {
 		while (!list_empty(head)) {
@@ -523,7 +520,7 @@ void rpc_wake_up_status(struct rpc_wait_
 			break;
 		head--;
 	}
-	spin_unlock_bh(&rpc_queue_lock);
+	spin_unlock_bh(&queue->lock);
 }
 
 /*
@@ -830,8 +827,7 @@ cleanup:
 	goto out;
 }
 
-void
-rpc_release_task(struct rpc_task *task)
+void rpc_release_task(struct rpc_task *task)
 {
 	dprintk("RPC: %4d release task\n", task->tk_pid);
 
@@ -881,10 +877,9 @@ rpc_release_task(struct rpc_task *task)
  * queue 'childq'. If so returns a pointer to the parent.
  * Upon failure returns NULL.
  *
- * Caller must hold rpc_queue_lock
+ * Caller must hold childq.lock
  */
-static inline struct rpc_task *
-rpc_find_parent(struct rpc_task *child)
+static inline struct rpc_task *rpc_find_parent(struct rpc_task *child)
 {
 	struct rpc_task	*task, *parent;
 	struct list_head *le;
@@ -897,17 +892,16 @@ rpc_find_parent(struct rpc_task *child)
 	return NULL;
 }
 
-static void
-rpc_child_exit(struct rpc_task *child)
+static void rpc_child_exit(struct rpc_task *child)
 {
 	struct rpc_task	*parent;
 
-	spin_lock_bh(&rpc_queue_lock);
+	spin_lock_bh(&childq.lock);
 	if ((parent = rpc_find_parent(child)) != NULL) {
 		parent->tk_status = child->tk_status;
 		__rpc_wake_up_task(parent);
 	}
-	spin_unlock_bh(&rpc_queue_lock);
+	spin_unlock_bh(&childq.lock);
 }
 
 /*
@@ -930,22 +924,20 @@ fail:
 	return NULL;
 }
 
-void
-rpc_run_child(struct rpc_task *task, struct rpc_task *child, rpc_action func)
+void rpc_run_child(struct rpc_task *task, struct rpc_task *child, rpc_action func)
 {
-	spin_lock_bh(&rpc_queue_lock);
+	spin_lock_bh(&childq.lock);
 	/* N.B. Is it possible for the child to have already finished? */
 	__rpc_sleep_on(&childq, task, func, NULL);
 	rpc_schedule_run(child);
-	spin_unlock_bh(&rpc_queue_lock);
+	spin_unlock_bh(&childq.lock);
 }
 
 /*
  * Kill all tasks for the given client.
  * XXX: kill their descendants as well?
  */
-void
-rpc_killall_tasks(struct rpc_clnt *clnt)
+void rpc_killall_tasks(struct rpc_clnt *clnt)
 {
 	struct rpc_task	*rovr;
 	struct list_head *le;
@@ -967,8 +959,7 @@ rpc_killall_tasks(struct rpc_clnt *clnt)
 
 static DECLARE_MUTEX_LOCKED(rpciod_running);
 
-static void
-rpciod_killall(void)
+static void rpciod_killall(void)
 {
 	unsigned long flags;
 

* Re: 2.6.5, ACPI, suspend and ThinkPad R40
From: Vincent C Jones @ 2004-04-06 13:16 UTC (permalink / raw)
  To: kevin; +Cc: linux-kernel
In-Reply-To: <1HPBr-5yC-5@gated-at.bofh.it>

In article <1HPBr-5yC-5@gated-at.bofh.it> you write:
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>>>>>> "Olivier" == Olivier Bornet <Olivier.Bornet@puck.ch> writes:
>
>Olivier> Hi,
>Olivier> On Mon, Apr 05, 2004 at 12:00:09AM +0200, Michal Schmidt wrote:
>>> Yes, see: http://bugzilla.kernel.org/show_bug.cgi?id=1415 There is
>>> a patch which worked for me.
>
>Olivier> Thanks a lot. :-) This patch is working as expected for
>Olivier> me. Now, after doing a:
>
>Olivier>   echo LID > /proc/acpi/wakeup_devices
>Olivier>   echo SLPB > /proc/acpi/wakeup_devices
>
>Olivier> I can resume by opening the laptop, or by using the Fn
>Olivier> button, or by using the power button. :-)
>
>Alas, I would like to report it doesn't work for me. 
>
>My laptop suspends, but never comes back from suspend as well. 
>It doesn't seem like the LID or power buttons are a possible setting
>for me. Doing a 'cat /proc/acpi/wakeup_devices' gives:
>
>Device  Speep state     Status
>C04E       5            disabled
>C0A0       3            disabled
>C0A6       3            disabled
>C0A9       3            disabled
>C161       3            disabled
>C162       3            disabled
>C177       4            disabled
>C11E       4            disabled
>
>C11E is my suspend button, but it doesn't seem like it will allow S3? 
>I have no idea what the other addresses are. I tried enabling them
>all, and got a big pile of oopses (which I can duplicate if anyone
>wants)
>
>Any ideas?
>
>Olivier> This patch seems "very" old (first release 2003-10-28).
>
>Olivier> Anyone know why this patch is not in the kernel source tree
>Olivier> at this time ?
>
>Yeah, if it helps some people then it should go in I would think. 
>
>It would be nice if we could just set the list of wakeup devices to a
>sane list for everyone tho. Power/lid/suspend button. 
>
>Back to using Nigel's swsusp2... at least it's quite fast and the
>latest one seems pretty stable with 2.6 for me at least. ;) 
>
>kevin

Is it my imagination or is there an acute lack of interest in supporting 
notebook features in 2.6.X? Since the early days of 2.5.X, there have
been questions raised regarding suspend/resume and related questions of
critical importance to mobile users. All (at least those associated with
IBM ThinkPads) have been ignored by developers, with the only responses
coming from other notebook users expressing similar concerns.

Is the answer to upgrade to a faster notebook so I can get adequate
performance using the 2.4 kernel in order to retain the ability to
quickly and safely suspend / resume while on the road?

Side note: X23, kernel 2.6.5, SuSE 9.0 with Kraxel fixes. Suspend only
works while on battery; if I forget to unplug the AC first, suspend fails
every time and intermittently locks up the box.

-- 
Dr. Vincent C. Jones, PE              Expert advice and a helping hand
Computer Network Consultant           for those who want to manage and
Networking Unlimited, Inc.            control their networking destiny
14 Dogwood Lane, Tenafly, NJ 07670
http://www.networkingunlimited.com
VCJones@NetworkingUnlimited.com  +1 201 568-7810  Fax: +1 201 568-7269 

* Re: [coLinux-devel] coLinux benchmarks
From: Ian C. Blenke @ 2004-04-06 13:45 UTC (permalink / raw)
  To: Dan Aloni; +Cc: Eyal Lotem, Cooperative Linux Development, Linux Kernel List
In-Reply-To: <20040405222256.GA17572@callisto.yi.org>

On Tue, Apr 06, 2004 at 12:22:56AM +0200, Dan Aloni wrote:
> On Mon, Apr 05, 2004 at 01:11:39PM -0700, Eyal Lotem wrote:
> 
> > I think the reason may be that Windows is using the
> > disks better and making access faster. Perhaps DMA
> > acceleration or some other feature is turned off on
> > the Linux host side, making disk access slower on the
> > Linux side.
> 
> No Windows was involved with these benchmarks in any way. I ran 
> coLinux on Linux.

You ran coLinux on a Linux host? Perhaps I've missed something on the list.
is there a native Linux kernel port now? An alternative to User Mode Linux
is a rather big thing for me.

- Ian C. Blenke <ian@blenke.com>


* Re: [linux-lvm] Converting the root filesystem to ReiserFS on LVM...
From: David Johnston @ 2004-04-06 13:48 UTC (permalink / raw)
  To: LVM general discussion and development
In-Reply-To: <40717062.6050401@alteeve.com>

On Mon, 2004-04-05 at 10:42, Madison Kelly wrote:
> Hi all,
> 
> I am officially stumped... 
> When I use the FC1 cd I can't unmount '/mnt/sysimage' which is seen 
> as the LVM device '/dev/VG00/LV00', it keeps saying the device or 
> resource is busy
> Under Fedora the devices I need to mount are on:
> /dev/VG00/LV00 = '/' (currently ext3)
> /dev/VG00/LV01 = '/backup' (reiserfs)
> /dev/VG00/LV02 = '/snapshot' (reiserfs)

Madison,
do you by any chance have /mnt/sysimage/backup or /mnt/sysimage/snapshot
mounted when you get the "resource busy" error?
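This can be checked mechanically: any filesystem mounted below /mnt/sysimage keeps it busy, so umount fails with EBUSY until the nested mounts are unmounted first. A minimal sketch, assuming glibc's <mntent.h> interface (setmntent/getmntent/endmntent); the helper name count_nested_mounts is hypothetical:

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print and count the mount points strictly below
 * `dir`, reading a mount table in /proc/mounts or /etc/mtab format.
 * `dir` must be an absolute path without a trailing slash.
 * Returns -1 if the table cannot be opened. */
static int count_nested_mounts(const char *table, const char *dir)
{
	size_t len;
	struct mntent *m;
	int n = 0;
	FILE *f;

	assert(dir != NULL && dir[0] == '/');
	len = strlen(dir);
	f = setmntent(table, "r");
	if (!f)
		return -1;
	while ((m = getmntent(f)) != NULL) {
		/* Match "dir/..." but neither "dir" itself nor "dirfoo". */
		if (strncmp(m->mnt_dir, dir, len) == 0 && m->mnt_dir[len] == '/') {
			printf("nested: %s on %s\n", m->mnt_fsname, m->mnt_dir);
			n++;
		}
	}
	endmntent(f);
	return n;
}
```

Pointed at /proc/mounts with "/mnt/sysimage", it would report /mnt/sysimage/backup and /mnt/sysimage/snapshot if they are mounted; those have to be unmounted before /mnt/sysimage itself.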
-- 
David Johnston <david@littlebald.com>
Little Bald Consulting, LLC

* [PATCH] NUMA API for Linux 10/ Bitmap bugfix
From: Andi Kleen @ 2004-04-06 13:40 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, akpm
In-Reply-To: <20040406153322.5d6e986e.ak@suse.de>


Bugfix to prevent gcc 3.2 from miscompiling bitmap.h.
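For reference, the byte count handed to memcpy() is the number of longs needed to hold `bits` bits, rounded up, times sizeof(unsigned long). A userspace sketch of that arithmetic; the macros are local re-definitions assumed to round up like the kernel's BITS_TO_LONGS, and the function names are hypothetical. It does not reproduce the gcc 3.2 problem, it only shows what `len` evaluates to:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Local re-definitions for illustration only; the kernel has its own. */
#define BITS_PER_LONG_        (CHAR_BIT * sizeof(unsigned long))
#define BITS_TO_LONGS_(bits)  (((bits) + BITS_PER_LONG_ - 1) / BITS_PER_LONG_)

/* Bytes copied for a bitmap of `bits` bits: whole longs, rounded up. */
static size_t bitmap_bytes(int bits)
{
	assert(bits >= 0);
	return BITS_TO_LONGS_((size_t)bits) * sizeof(unsigned long);
}

/* Same shape as the patched helper: compute the length into a local,
 * then copy. */
static void bitmap_copy_sketch(unsigned long *dst, const unsigned long *src,
			       int bits)
{
	size_t len = bitmap_bytes(bits);

	memcpy(dst, src, len);
}
```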

diff -u linux-2.6.5-numa/include/linux/bitmap.h-o linux-2.6.5-numa/include/linux/bitmap.h
--- linux-2.6.5-numa/include/linux/bitmap.h-o	2004-03-17 12:17:59.000000000 +0100
+++ linux-2.6.5-numa/include/linux/bitmap.h	2004-04-06 13:36:12.000000000 +0200
@@ -29,7 +29,8 @@
 static inline void bitmap_copy(unsigned long *dst,
 			const unsigned long *src, int bits)
 {
-	memcpy(dst, src, BITS_TO_LONGS(bits)*sizeof(unsigned long));
+	int len = BITS_TO_LONGS(bits)*sizeof(unsigned long);
+	memcpy(dst, src, len);
 }
 
 void bitmap_shift_right(unsigned long *dst,

* [PATCH] NUMA API for Linux 9/ Add simple lazy i386/x86-64 hugetlbfs policy support
From: Andi Kleen @ 2004-04-06 13:40 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, akpm
In-Reply-To: <20040406153322.5d6e986e.ak@suse.de>

Add NUMA policy support to i386/x86-64 hugetlbfs and switch it 
over to lazy allocation instead of prefaulting.

The NUMA policy support places huge page allocations according to the
current policy.
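The node choice made by the patched dequeue_huge_page() can be modelled in userspace: try the policy's first node, and if its free list is empty, fall back to the lowest-numbered node the policy allows that still has free huge pages. This is a sketch, not kernel code; the node count, the `allowed` bitmask (standing in for mpol_node_valid()) and the function name are hypothetical:

```c
#include <assert.h>

#define MAX_NODES 8	/* hypothetical; the kernel uses MAX_NUMNODES */

/* Returns the node a huge page is taken from (decrementing that node's
 * free count, like htlbpagemem[nid]--), or -1 if no allowed node has one.
 * Bit n of `allowed` set means node n is permitted by the policy. */
static int pick_huge_page_node(long free_pages[MAX_NODES], int preferred,
			       unsigned allowed)
{
	int nid = preferred;

	assert(preferred >= 0 && preferred < MAX_NODES);
	if (free_pages[nid] == 0) {
		/* Preferred node exhausted: scan for an allowed fallback. */
		for (nid = 0; nid < MAX_NODES; nid++)
			if ((allowed & (1u << nid)) && free_pages[nid] > 0)
				break;
	}
	if (nid >= MAX_NODES)
		return -1;
	free_pages[nid]--;
	return nid;
}
```

Keeping the free counts per node, rather than one global counter, is what lets the allocator and the meminfo reports see where the pool is actually exhausted.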

It also switches hugetlbfs to lazy allocation, because otherwise
mbind() cannot work after mmap(): the memory would already have been
allocated. This doesn't do any prereservation; when a process runs out
of huge pages it will get a SIGBUS.
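The difference from prefaulting is that nothing is taken from the pool at mmap() time; each huge page is claimed on its first fault, so an empty pool is only discovered when a page is touched. A userspace sketch of that bookkeeping; the struct, the function and the result codes (loosely modelled on VM_FAULT_MINOR/VM_FAULT_MAJOR) are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes. */
enum fault_result { FAULT_MINOR, FAULT_MAJOR, FAULT_NO_HUGEPAGE };

struct lazy_mapping {
	bool   *present;	/* one flag per huge page in the mapping */
	size_t  npages;
	long   *pool_free;	/* shared pool of free huge pages */
};

/* First touch of a slot takes a page from the pool (major fault);
 * a slot that is already populated, e.g. after a lost race, is a
 * minor fault; an empty pool is reported only now, not at mmap(). */
static enum fault_result lazy_fault(struct lazy_mapping *m, size_t idx)
{
	assert(idx < m->npages);
	if (m->present[idx])
		return FAULT_MINOR;
	if (*m->pool_free == 0)
		return FAULT_NO_HUGEPAGE;
	(*m->pool_free)--;
	m->present[idx] = true;
	return FAULT_MAJOR;
}
```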

There are currently various proposals on linux-kernel to add preallocation
for this; once one of these patches proves itself, it would be best
to replace this patch with it (and port the mpol_* changes over).

diff -u linux-2.6.5-numa/arch/i386/mm/hugetlbpage.c-o linux-2.6.5-numa/arch/i386/mm/hugetlbpage.c
--- linux-2.6.5-numa/arch/i386/mm/hugetlbpage.c-o	2004-04-06 13:11:59.000000000 +0200
+++ linux-2.6.5-numa/arch/i386/mm/hugetlbpage.c	2004-04-06 13:36:12.000000000 +0200
@@ -15,14 +15,17 @@
 #include <linux/module.h>
 #include <linux/err.h>
 #include <linux/sysctl.h>
+#include <linux/mempolicy.h>
 #include <asm/mman.h>
 #include <asm/pgalloc.h>
 #include <asm/tlb.h>
 #include <asm/tlbflush.h>
 
-static long    htlbpagemem;
+/* AK: this should be all moved into the pgdat */
+
+static long    htlbpagemem[MAX_NUMNODES];
 int     htlbpage_max;
-static long    htlbzone_pages;
+static long    htlbzone_pages[MAX_NUMNODES];
 
 static struct list_head hugepage_freelists[MAX_NUMNODES];
 static spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED;
@@ -33,14 +36,15 @@
 		&hugepage_freelists[page_zone(page)->zone_pgdat->node_id]);
 }
 
-static struct page *dequeue_huge_page(void)
+static struct page *dequeue_huge_page(struct vm_area_struct *vma, unsigned long addr)
 {
-	int nid = numa_node_id();
+	int nid = mpol_first_node(vma, addr); 
 	struct page *page = NULL;
 
 	if (list_empty(&hugepage_freelists[nid])) {
 		for (nid = 0; nid < MAX_NUMNODES; ++nid)
-			if (!list_empty(&hugepage_freelists[nid]))
+			if (mpol_node_valid(nid, vma, addr) && 
+			    !list_empty(&hugepage_freelists[nid]))
 				break;
 	}
 	if (nid >= 0 && nid < MAX_NUMNODES && !list_empty(&hugepage_freelists[nid])) {
@@ -61,18 +65,18 @@
 
 static void free_huge_page(struct page *page);
 
-static struct page *alloc_hugetlb_page(void)
+static struct page *alloc_hugetlb_page(struct vm_area_struct *vma, unsigned long addr)
 {
 	int i;
 	struct page *page;
 
 	spin_lock(&htlbpage_lock);
-	page = dequeue_huge_page();
+	page = dequeue_huge_page(vma, addr);
 	if (!page) {
 		spin_unlock(&htlbpage_lock);
 		return NULL;
 	}
-	htlbpagemem--;
+	htlbpagemem[page_zone(page)->zone_pgdat->node_id]--;
 	spin_unlock(&htlbpage_lock);
 	set_page_count(page, 1);
 	page->lru.prev = (void *)free_huge_page;
@@ -284,7 +288,7 @@
 
 	spin_lock(&htlbpage_lock);
 	enqueue_huge_page(page);
-	htlbpagemem++;
+	htlbpagemem[page_zone(page)->zone_pgdat->node_id]++;
 	spin_unlock(&htlbpage_lock);
 }
 
@@ -329,41 +333,49 @@
 	spin_unlock(&mm->page_table_lock);
 }
 
-int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
+/* page_table_lock held on entry. */
+static int 
+hugetlb_alloc_fault(struct mm_struct *mm, struct vm_area_struct *vma, 
+			       unsigned long addr, int write_access)
 {
-	struct mm_struct *mm = current->mm;
-	unsigned long addr;
-	int ret = 0;
-
-	BUG_ON(vma->vm_start & ~HPAGE_MASK);
-	BUG_ON(vma->vm_end & ~HPAGE_MASK);
-
-	spin_lock(&mm->page_table_lock);
-	for (addr = vma->vm_start; addr < vma->vm_end; addr += HPAGE_SIZE) {
 		unsigned long idx;
-		pte_t *pte = huge_pte_alloc(mm, addr);
-		struct page *page;
+	int ret;
+	pte_t *pte;
+	struct page *page = NULL;
+	struct address_space *mapping = vma->vm_file->f_mapping;
 
+	pte = huge_pte_alloc(mm, addr); 
 		if (!pte) {
-			ret = -ENOMEM;
+		ret = VM_FAULT_OOM;
 			goto out;
 		}
-		if (!pte_none(*pte))
-			continue;
+
+		/* Handle race */
+		if (!pte_none(*pte)) { 
+			ret = VM_FAULT_MINOR;
+			goto flush; 
+		}
 
 		idx = ((addr - vma->vm_start) >> HPAGE_SHIFT)
 			+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
 		page = find_get_page(mapping, idx);
 		if (!page) {
-			/* charge the fs quota first */
-			if (hugetlb_get_quota(mapping)) {
-				ret = -ENOMEM;
+		/* Should do this at prefault time, but that gets us into
+		   trouble with freeing right now. */
+		ret = hugetlb_get_quota(mapping);
+		if (ret) {
+			ret = VM_FAULT_OOM;
 				goto out;
 			}
-			page = alloc_hugetlb_page();
+		
+			page = alloc_hugetlb_page(vma, addr);
 			if (!page) {
 				hugetlb_put_quota(mapping);
-				ret = -ENOMEM;
+			
+			/* Instead of OOMing here could just transparently use
+			   small pages. */
+			
+				ret = VM_FAULT_OOM;
 				goto out;
 			}
 			ret = add_to_page_cache(page, mapping, idx, GFP_ATOMIC);
@@ -371,23 +383,64 @@
 			if (ret) {
 				hugetlb_put_quota(mapping);
 				free_huge_page(page);
+				ret = VM_FAULT_SIGBUS;
 				goto out;
 			}
-		}
+		ret = VM_FAULT_MAJOR; 
+	} else
+		ret = VM_FAULT_MINOR;
+		
 		set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE);
-	}
-out:
+
+ flush:
+	/* Don't need to flush other CPUs. They will just do a page
+	   fault and flush it lazily. */
+	__flush_tlb_one(addr);
+	
+ out:
 	spin_unlock(&mm->page_table_lock);
 	return ret;
 }
 
+int arch_hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, 
+		       unsigned long address, int write_access)
+{ 
+	pmd_t *pmd;
+	pgd_t *pgd;
+
+	if (write_access && !(vma->vm_flags & VM_WRITE))
+		return VM_FAULT_SIGBUS;
+
+	spin_lock(&mm->page_table_lock);	
+	pgd = pgd_offset(mm, address); 
+	if (pgd_none(*pgd)) 
+		return hugetlb_alloc_fault(mm, vma, address, write_access); 
+
+	pmd = pmd_offset(pgd, address);
+	if (pmd_none(*pmd))
+		return hugetlb_alloc_fault(mm, vma, address, write_access); 
+
+	BUG_ON(!pmd_large(*pmd)); 
+
+	/* must have been a race. Flush the TLB. NX not supported yet. */ 
+
+	__flush_tlb_one(address); 
+	spin_unlock(&mm->page_table_lock);
+	return VM_FAULT_MINOR;
+} 
+
+int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
+{
+	return 0;
+}
+
 static void update_and_free_page(struct page *page)
 {
 	int j;
 	struct page *map;
 
 	map = page;
-	htlbzone_pages--;
+	htlbzone_pages[page_zone(page)->zone_pgdat->node_id]--;
 	for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
 		map->flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
 				1 << PG_dirty | 1 << PG_active | 1 << PG_reserved |
@@ -404,6 +457,7 @@
 	struct list_head *p;
 	struct page *page, *map;
 
+	page = NULL;
 	map = NULL;
 	spin_lock(&htlbpage_lock);
 	/* all lowmem is on node 0 */
@@ -411,7 +465,7 @@
 		if (map) {
 			list_del(&map->list);
 			update_and_free_page(map);
-			htlbpagemem--;
+			htlbpagemem[page_zone(map)->zone_pgdat->node_id]--;
 			map = NULL;
 			if (++count == 0)
 				break;
@@ -423,49 +477,61 @@
 	if (map) {
 		list_del(&map->list);
 		update_and_free_page(map);
-		htlbpagemem--;
+		htlbpagemem[page_zone(map)->zone_pgdat->node_id]--;
 		count++;
 	}
 	spin_unlock(&htlbpage_lock);
 	return count;
 }
 
+static long all_huge_pages(void)
+{ 
+	long pages = 0;
+	int i;
+	for (i = 0; i < numnodes; i++) 
+		pages += htlbzone_pages[i];
+	return pages;
+} 
+
 static int set_hugetlb_mem_size(int count)
 {
 	int lcount;
 	struct page *page;
-
 	if (count < 0)
 		lcount = count;
-	else
-		lcount = count - htlbzone_pages;
+	else { 
+		lcount = count - all_huge_pages();
+	}
 
 	if (lcount == 0)
-		return (int)htlbzone_pages;
+		return (int)all_huge_pages();
 	if (lcount > 0) {	/* Increase the mem size. */
 		while (lcount--) {
+			int node;
 			page = alloc_fresh_huge_page();
 			if (page == NULL)
 				break;
 			spin_lock(&htlbpage_lock);
 			enqueue_huge_page(page);
-			htlbpagemem++;
-			htlbzone_pages++;
+			node = page_zone(page)->zone_pgdat->node_id;
+			htlbpagemem[node]++;
+			htlbzone_pages[node]++;
 			spin_unlock(&htlbpage_lock);
 		}
-		return (int) htlbzone_pages;
+		goto out;
 	}
 	/* Shrink the memory size. */
 	lcount = try_to_free_low(lcount);
 	while (lcount++) {
-		page = alloc_hugetlb_page();
+		page = alloc_hugetlb_page(NULL, 0);
 		if (page == NULL)
 			break;
 		spin_lock(&htlbpage_lock);
 		update_and_free_page(page);
 		spin_unlock(&htlbpage_lock);
 	}
-	return (int) htlbzone_pages;
+ out:
+	return (int)all_huge_pages();
 }
 
 int hugetlb_sysctl_handler(ctl_table *table, int write,
@@ -498,33 +564,60 @@
 		INIT_LIST_HEAD(&hugepage_freelists[i]);
 
 	for (i = 0; i < htlbpage_max; ++i) {
+		int nid; 
 		page = alloc_fresh_huge_page();
 		if (!page)
 			break;
 		spin_lock(&htlbpage_lock);
 		enqueue_huge_page(page);
+		nid = page_zone(page)->zone_pgdat->node_id;
+		htlbpagemem[nid]++;
+		htlbzone_pages[nid]++;
 		spin_unlock(&htlbpage_lock);
 	}
-	htlbpage_max = htlbpagemem = htlbzone_pages = i;
-	printk("Total HugeTLB memory allocated, %ld\n", htlbpagemem);
+	htlbpage_max = i;
+	printk("Initial HugeTLB pages allocated: %d\n", i);
 	return 0;
 }
 module_init(hugetlb_init);
 
 int hugetlb_report_meminfo(char *buf)
 {
+	int i;
+	long pages = 0, mem = 0;
+	for (i = 0; i < numnodes; i++) {
+		pages += htlbzone_pages[i];
+		mem += htlbpagemem[i];
+	}
+
 	return sprintf(buf,
 			"HugePages_Total: %5lu\n"
 			"HugePages_Free:  %5lu\n"
 			"Hugepagesize:    %5lu kB\n",
-			htlbzone_pages,
-			htlbpagemem,
+			pages,
+			mem,
 			HPAGE_SIZE/1024);
 }
 
+int hugetlb_report_node_meminfo(int node, char *buf)
+{
+	return sprintf(buf,
+			"HugePages_Total: %5lu\n"
+			"HugePages_Free:  %5lu\n"
+			"Hugepagesize:    %5lu kB\n",
+			htlbzone_pages[node],
+			htlbpagemem[node],
+			HPAGE_SIZE/1024);
+}
+
+/* Not accurate with policy */
 int is_hugepage_mem_enough(size_t size)
 {
-	return (size + ~HPAGE_MASK)/HPAGE_SIZE <= htlbpagemem;
+	long pm = 0;
+	int i;
+	for (i = 0; i < numnodes; i++)
+		pm += htlbpagemem[i];
+	return (size + ~HPAGE_MASK)/HPAGE_SIZE <= pm;
 }
 
 /* Return the number pages of memory we physically have, in PAGE_SIZE units. */
diff -u linux-2.6.5-numa/include/linux/mm.h-o linux-2.6.5-numa/include/linux/mm.h
--- linux-2.6.5-numa/include/linux/mm.h-o	2004-04-06 13:12:23.000000000 +0200
+++ linux-2.6.5-numa/include/linux/mm.h	2004-04-06 13:36:12.000000000 +0200
@@ -643,6 +660,9 @@
 extern int remap_page_range(struct vm_area_struct *vma, unsigned long from,
 		unsigned long to, unsigned long size, pgprot_t prot);
 
+extern int arch_hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, 
+			      unsigned long address, int write_access);
+
 #ifndef CONFIG_DEBUG_PAGEALLOC
 static inline void
 kernel_map_pages(struct page *page, int numpages, int enable)
diff -u linux-2.6.5-numa/mm/memory.c-o linux-2.6.5-numa/mm/memory.c
--- linux-2.6.5-numa/mm/memory.c-o	2004-04-06 13:12:24.000000000 +0200
+++ linux-2.6.5-numa/mm/memory.c	2004-04-06 13:36:12.000000000 +0200
@@ -1604,6 +1633,15 @@
 	return VM_FAULT_MINOR;
 }
 
+
+/* Can be overridden by the architecture */
+int __attribute__((weak)) arch_hugetlb_fault(struct mm_struct *mm, 
+					     struct vm_area_struct *vma, 
+					     unsigned long address, int write_access)
+{
+	return VM_FAULT_SIGBUS;
+}
+
 /*
  * By the time we get here, we already hold the mm semaphore
  */
@@ -1619,7 +1657,7 @@
 	inc_page_state(pgfault);
 
 	if (is_vm_hugetlb_page(vma))
-		return VM_FAULT_SIGBUS;	/* mapping truncation does this. */
+		return arch_hugetlb_fault(mm, vma, address, write_access);
 
 	/*
 	 * We need the page table lock to synchronize with kswapd

* [PATCH] NUMA API for Linux 8/ Add policy support to anonymous memory
From: Andi Kleen @ 2004-04-06 13:39 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, akpm
In-Reply-To: <20040406153322.5d6e986e.ak@suse.de>


Change the core VM to use alloc_page_vma() instead of alloc_page().

Change the swap readahead to follow the policy of the VMA.
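The VMA tracking added to swapin_readahead() is easiest to follow on a toy address space. A userspace sketch, assuming 4 KiB pages and a minimal hypothetical `struct vma` in place of struct vm_area_struct. It mirrors the CONFIG_NUMA block in the diff with one deliberate change: when the cursor steps into a hole between VMAs, it keeps the upcoming VMA in `next_vma`, so the walk re-enters that VMA once `addr` reaches its start (as written in the patch, that VMA's successor is kept instead):

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_ 4096UL	/* assumed page size for the sketch */

/* Minimal stand-in for vm_area_struct: [start, end), sorted by address. */
struct vma {
	unsigned long start, end;
	struct vma *next;
};

/* State carried across readahead iterations. */
struct ra_cursor {
	unsigned long addr;
	struct vma *vma;	/* VMA covering addr, or NULL in a hole */
	struct vma *next_vma;	/* next VMA at or above addr */
};

static void ra_init(struct ra_cursor *c, struct vma *vma, unsigned long addr)
{
	c->addr = addr;
	c->vma = vma;
	c->next_vma = vma ? vma->next : NULL;
}

/* Advance one page and update which VMA's policy applies. */
static void ra_advance(struct ra_cursor *c)
{
	c->addr += PAGE_SIZE_;
	if (c->addr == 0)		/* wrapped around the address space */
		c->vma = NULL;
	if (c->vma) {
		if (c->addr >= c->vma->end) {
			c->vma = c->next_vma;
			c->next_vma = c->vma ? c->vma->next : NULL;
		}
		if (c->vma && c->addr < c->vma->start) {
			c->next_vma = c->vma;	/* hole: re-enter this VMA later */
			c->vma = NULL;
		}
	} else if (c->next_vma && c->addr >= c->next_vma->start) {
		c->vma = c->next_vma;
		c->next_vma = c->vma->next;
	}
}
```

With VMAs covering pages [0, 2), [4, 6) and [8, 9), starting at page 1, the cursor reports no VMA for pages 2 and 3, the second VMA for pages 4 and 5, and no VMA again at page 6.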


diff -u linux-2.6.5-numa/include/linux/swap.h-o linux-2.6.5-numa/include/linux/swap.h
--- linux-2.6.5-numa/include/linux/swap.h-o	2004-03-21 21:11:54.000000000 +0100
+++ linux-2.6.5-numa/include/linux/swap.h	2004-04-06 13:36:12.000000000 +0200
@@ -152,7 +152,7 @@
 extern void out_of_memory(void);
 
 /* linux/mm/memory.c */
-extern void swapin_readahead(swp_entry_t);
+extern void swapin_readahead(swp_entry_t, unsigned long, struct vm_area_struct *);
 
 /* linux/mm/page_alloc.c */
 extern unsigned long totalram_pages;
@@ -216,7 +216,8 @@
 extern void free_page_and_swap_cache(struct page *);
 extern void free_pages_and_swap_cache(struct page **, int);
 extern struct page * lookup_swap_cache(swp_entry_t);
-extern struct page * read_swap_cache_async(swp_entry_t);
+extern struct page * read_swap_cache_async(swp_entry_t, struct vm_area_struct *vma, 
+					   unsigned long addr);
 
 /* linux/mm/swapfile.c */
 extern int total_swap_pages;
@@ -257,7 +258,7 @@
 #define free_swap_and_cache(swp)		/*NOTHING*/
 #define swap_duplicate(swp)			/*NOTHING*/
 #define swap_free(swp)				/*NOTHING*/
-#define read_swap_cache_async(swp)		NULL
+#define read_swap_cache_async(swp,vma,addr)	NULL
 #define lookup_swap_cache(swp)			NULL
 #define valid_swaphandles(swp, off)		0
 #define can_share_swap_page(p)			0
diff -u linux-2.6.5-numa/mm/memory.c-o linux-2.6.5-numa/mm/memory.c
--- linux-2.6.5-numa/mm/memory.c-o	2004-04-06 13:12:24.000000000 +0200
+++ linux-2.6.5-numa/mm/memory.c	2004-04-06 13:36:12.000000000 +0200
@@ -1056,7 +1056,7 @@
 	pte_chain = pte_chain_alloc(GFP_KERNEL);
 	if (!pte_chain)
 		goto no_pte_chain;
-	new_page = alloc_page(GFP_HIGHUSER);
+	new_page = alloc_page_vma(GFP_HIGHUSER,vma,address);
 	if (!new_page)
 		goto no_new_page;
 	copy_cow_page(old_page,new_page,address);
@@ -1210,9 +1210,17 @@
  * (1 << page_cluster) entries in the swap area. This method is chosen
  * because it doesn't cost us any seek time.  We also make sure to queue
  * the 'original' request together with the readahead ones...  
+ * 
+ * This has been extended to use the NUMA policies from the mm triggering
+ * the readahead.
+ * 
+ * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
  */
-void swapin_readahead(swp_entry_t entry)
+void swapin_readahead(swp_entry_t entry, unsigned long addr,struct vm_area_struct *vma) 
 {
+#ifdef CONFIG_NUMA
+	struct vm_area_struct *next_vma = vma ? vma->vm_next : NULL;
+#endif
 	int i, num;
 	struct page *new_page;
 	unsigned long offset;
@@ -1224,10 +1232,31 @@
 	for (i = 0; i < num; offset++, i++) {
 		/* Ok, do the async read-ahead now */
 		new_page = read_swap_cache_async(swp_entry(swp_type(entry),
-						offset));
+							   offset), vma, addr); 
 		if (!new_page)
 			break;
 		page_cache_release(new_page);
+#ifdef CONFIG_NUMA
+		/* 
+		 * Find the next applicable VMA for the NUMA policy.
+		 */
+		addr += PAGE_SIZE;
+		if (addr == 0) 
+			vma = NULL;
+		if (vma) { 
+			if (addr >= vma->vm_end) { 
+				vma = next_vma;
+				next_vma = vma ? vma->vm_next : NULL;
+			}
+			if (vma && addr < vma->vm_start) 
+				vma = NULL; 
+		} else { 
+			if (next_vma && addr >= next_vma->vm_start) { 
+				vma = next_vma;
+				next_vma = vma->vm_next;
+			}
+		} 
+#endif
 	}
 	lru_add_drain();	/* Push any new pages onto the LRU now */
 }
@@ -1250,8 +1279,8 @@
 	spin_unlock(&mm->page_table_lock);
 	page = lookup_swap_cache(entry);
 	if (!page) {
-		swapin_readahead(entry);
-		page = read_swap_cache_async(entry);
+ 		swapin_readahead(entry, address, vma);
+ 		page = read_swap_cache_async(entry, vma, address);
 		if (!page) {
 			/*
 			 * Back out if somebody else faulted in this pte while
@@ -1356,7 +1385,7 @@
 		pte_unmap(page_table);
 		spin_unlock(&mm->page_table_lock);
 
-		page = alloc_page(GFP_HIGHUSER);
+		page = alloc_page_vma(GFP_HIGHUSER,vma,addr);
 		if (!page)
 			goto no_mem;
 		clear_user_highpage(page, addr);
@@ -1448,7 +1477,7 @@
 	 * Should we do an early C-O-W break?
 	 */
 	if (write_access && !(vma->vm_flags & VM_SHARED)) {
-		struct page * page = alloc_page(GFP_HIGHUSER);
+		struct page * page = alloc_page_vma(GFP_HIGHUSER,vma,address);
 		if (!page)
 			goto oom;
 		copy_user_highpage(page, new_page, address);
diff -u linux-2.6.5-numa/mm/swap_state.c-o linux-2.6.5-numa/mm/swap_state.c
--- linux-2.6.5-numa/mm/swap_state.c-o	2004-03-21 21:12:13.000000000 +0100
+++ linux-2.6.5-numa/mm/swap_state.c	2004-04-06 13:36:13.000000000 +0200
@@ -331,7 +331,8 @@
  * A failure return means that either the page allocation failed or that
  * the swap entry is no longer in use.
  */
-struct page * read_swap_cache_async(swp_entry_t entry)
+struct page * 
+read_swap_cache_async(swp_entry_t entry, struct vm_area_struct *vma, unsigned long addr)
 {
 	struct page *found_page, *new_page = NULL;
 	int err;
@@ -351,7 +352,7 @@
 		 * Get a new page to read into from swap.
 		 */
 		if (!new_page) {
-			new_page = alloc_page(GFP_HIGHUSER);
+			new_page = alloc_page_vma(GFP_HIGHUSER, vma, addr);
 			if (!new_page)
 				break;		/* Out of memory */
 		}
diff -u linux-2.6.5-numa/mm/swapfile.c-o linux-2.6.5-numa/mm/swapfile.c
--- linux-2.6.5-numa/mm/swapfile.c-o	2004-04-06 13:12:24.000000000 +0200
+++ linux-2.6.5-numa/mm/swapfile.c	2004-04-06 13:36:13.000000000 +0200
@@ -607,7 +607,7 @@
 		 */
 		swap_map = &si->swap_map[i];
 		entry = swp_entry(type, i);
-		page = read_swap_cache_async(entry);
+		page = read_swap_cache_async(entry, NULL, 0);
 		if (!page) {
 			/*
 			 * Either swap_duplicate() failed because entry

* [PATCH] NUMA API for Linux 7/ Add statistics
From: Andi Kleen @ 2004-04-06 13:38 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, akpm
In-Reply-To: <20040406153322.5d6e986e.ak@suse.de>

Add NUMA hit/miss statistics to page allocation and display them
in sysfs.
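The counters are easiest to read with one allocation worked through. A userspace sketch of the classification zone_statistics() performs, with the per-CPU and per-zone split left out and a hypothetical fixed node count: a page wanted on node 0 but served from node 1 counts as numa_miss on node 1 and numa_foreign on node 0, plus other_node on node 1 if the allocating CPU sits on node 0. interleave_hit is not updated by this path, so it is omitted:

```c
#include <assert.h>

#define NODES 4	/* hypothetical node count for the sketch */

/* Per-node counters mirroring the new per_cpu_pageset fields. */
struct numa_stats {
	unsigned long numa_hit, numa_miss, numa_foreign;
	unsigned long local_node, other_node;
};

/* `intended`: node of the first zone in the zonelist;
 * `got`: node the page actually came from;
 * `local`: node of the CPU doing the allocation. */
static void account_alloc(struct numa_stats st[NODES], int intended,
			  int got, int local)
{
	assert(intended >= 0 && intended < NODES && got >= 0 && got < NODES);
	if (got == intended) {
		st[got].numa_hit++;
	} else {
		st[got].numa_miss++;		/* on the node that served it */
		st[intended].numa_foreign++;	/* on the node that was wanted */
	}
	if (got == local)
		st[got].local_node++;
	else
		st[got].other_node++;
}
```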

This is not strictly required for the NUMA API, but without it, it is very
difficult to verify that the NUMA API works properly.

The overhead is low because all counters are per-CPU, and they are only
compiled in when CONFIG_NUMA is defined.
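The low overhead comes from the write side never sharing data between CPUs: each CPU bumps its own cache-line-aligned slot, and only the rare sysfs read sums across CPUs, as node_read_numastat() does. A userspace sketch of that pattern; the CPU count, the 64-byte cache line and the names are assumptions:

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS      4	/* hypothetical CPU count */
#define CACHELINE 64	/* assumed cache-line size */

/* One slot per CPU on its own cache line, in the spirit of
 * ____cacheline_aligned_in_smp, so writers never share a line. */
struct pcpu_counter {
	_Alignas(CACHELINE) unsigned long hits;
};

static struct pcpu_counter hit_count[NCPUS];

/* Hot path: touch only this CPU's slot, no lock. (The kernel also
 * disables local interrupts around its update.) */
static void count_hit(int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	hit_count[cpu].hits++;
}

/* Slow path, e.g. a sysfs read: sum every CPU's slot. The total can be
 * slightly stale while other CPUs are still counting. */
static unsigned long total_hits(void)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < NCPUS; cpu++)
		sum += hit_count[cpu].hits;
	return sum;
}
```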

diff -u linux-2.6.5-numa/include/linux/mmzone.h-o linux-2.6.5-numa/include/linux/mmzone.h
--- linux-2.6.5-numa/include/linux/mmzone.h-o	2004-04-06 13:12:23.000000000 +0200
+++ linux-2.6.5-numa/include/linux/mmzone.h	2004-04-06 13:36:12.000000000 +0200
@@ -52,6 +52,14 @@
 
 struct per_cpu_pageset {
 	struct per_cpu_pages pcp[2];	/* 0: hot.  1: cold */
+#ifdef CONFIG_NUMA
+	unsigned long numa_hit;		/* allocated in intended node */
+	unsigned long numa_miss;	/* allocated in non intended node */
+	unsigned long numa_foreign;	/* was intended here, hit elsewhere */
+	unsigned long interleave_hit;	/* interleaver preferred this zone */
+	unsigned long local_node;	/* allocation from local node */
+	unsigned long other_node;	/* allocation from other node */
+#endif
 } ____cacheline_aligned_in_smp;
 
 /*
diff -u linux-2.6.5-numa/mm/page_alloc.c-o linux-2.6.5-numa/mm/page_alloc.c
--- linux-2.6.5-numa/mm/page_alloc.c-o	2004-04-06 13:12:24.000000000 +0200
+++ linux-2.6.5-numa/mm/page_alloc.c	2004-04-06 13:49:54.000000000 +0200
@@ -447,6 +447,31 @@
 }
 #endif /* CONFIG_PM */
 
+static void zone_statistics(struct zonelist *zonelist, struct zone *z) 
+{ 
+#ifdef CONFIG_NUMA
+	unsigned long flags;
+	int cpu; 
+	pg_data_t *pg = z->zone_pgdat,
+		*orig = zonelist->zones[0]->zone_pgdat;
+	struct per_cpu_pageset *p;
+	local_irq_save(flags); 
+	cpu = smp_processor_id();
+	p = &z->pageset[cpu];
+	if (pg == orig) {
+		z->pageset[cpu].numa_hit++;
+	} else { 
+		p->numa_miss++;
+		zonelist->zones[0]->pageset[cpu].numa_foreign++;
+	}
+	if (pg == NODE_DATA(numa_node_id()))
+		p->local_node++;
+	else
+		p->other_node++;	
+	local_irq_restore(flags);
+#endif
+} 
+
 /*
  * Free a 0-order page
  */
@@ -582,8 +607,10 @@
 		if (z->free_pages >= min ||
 				(!wait && z->free_pages >= z->pages_high)) {
 			page = buffered_rmqueue(z, order, cold);
-			if (page)
+			if (page) { 
+					zone_statistics(zonelist, z); 
 		       		goto got_pg;
+			}
 		}
 		min += z->pages_low * sysctl_lower_zone_protection;
 	}
@@ -607,8 +634,10 @@
 		if (z->free_pages >= min ||
 				(!wait && z->free_pages >= z->pages_high)) {
 			page = buffered_rmqueue(z, order, cold);
-			if (page)
+			if (page) {
+				zone_statistics(zonelist, z); 
 				goto got_pg;
+			}
 		}
 		min += local_min * sysctl_lower_zone_protection;
 	}
@@ -622,8 +651,10 @@
 			struct zone *z = zones[i];
 
 			page = buffered_rmqueue(z, order, cold);
-			if (page)
+			if (page) {
+				zone_statistics(zonelist, z); 
 				goto got_pg;
+			}
 		}
 		goto nopage;
 	}
@@ -650,8 +681,10 @@
 		if (z->free_pages >= min ||
 				(!wait && z->free_pages >= z->pages_high)) {
 			page = buffered_rmqueue(z, order, cold);
-			if (page)
+			if (page) {
+				zone_statistics(zonelist, z); 
 				goto got_pg;
+			}
 		}
 		min += z->pages_low * sysctl_lower_zone_protection;
 	}
diff -u linux-2.6.5-numa/drivers/base/node.c-o linux-2.6.5-numa/drivers/base/node.c
--- linux-2.6.5-numa/drivers/base/node.c-o	2004-03-17 12:17:46.000000000 +0100
+++ linux-2.6.5-numa/drivers/base/node.c	2004-04-06 13:36:12.000000000 +0200
@@ -30,13 +30,20 @@
 
 static SYSDEV_ATTR(cpumap,S_IRUGO,node_read_cpumap,NULL);
 
+/* Can be overridden by architecture-specific code. */
+int __attribute__((weak)) hugetlb_report_node_meminfo(int node, char *buf)
+{
+	return 0;
+}
+
 #define K(x) ((x) << (PAGE_SHIFT - 10))
 static ssize_t node_read_meminfo(struct sys_device * dev, char * buf)
 {
+	int n;
 	int nid = dev->id;
 	struct sysinfo i;
 	si_meminfo_node(&i, nid);
-	return sprintf(buf, "\n"
+	n = sprintf(buf, "\n"
 		       "Node %d MemTotal:     %8lu kB\n"
 		       "Node %d MemFree:      %8lu kB\n"
 		       "Node %d MemUsed:      %8lu kB\n"
@@ -51,10 +58,52 @@
 		       nid, K(i.freehigh),
 		       nid, K(i.totalram-i.totalhigh),
 		       nid, K(i.freeram-i.freehigh));
+	n += hugetlb_report_node_meminfo(nid, buf + n);
+	return n;
 }
+
 #undef K 
 static SYSDEV_ATTR(meminfo,S_IRUGO,node_read_meminfo,NULL);
 
+static ssize_t node_read_numastat(struct sys_device * dev, char * buf)
+{ 
+	unsigned long numa_hit, numa_miss, interleave_hit, numa_foreign;
+	unsigned long local_node, other_node;
+	int i, cpu;
+	pg_data_t *pg = NODE_DATA(dev->id);
+	numa_hit = 0; 
+	numa_miss = 0; 
+	interleave_hit = 0; 
+	numa_foreign = 0; 
+	local_node = 0;
+	other_node = 0;
+	for (i = 0; i < MAX_NR_ZONES; i++) { 
+		struct zone *z = &pg->node_zones[i]; 
+		for (cpu = 0; cpu < NR_CPUS; cpu++) { 
+			struct per_cpu_pageset *ps = &z->pageset[cpu]; 
+			numa_hit += ps->numa_hit; 
+			numa_miss += ps->numa_miss;
+			numa_foreign += ps->numa_foreign;
+			interleave_hit += ps->interleave_hit;
+			local_node += ps->local_node;
+			other_node += ps->other_node;
+		} 
+	} 
+	return sprintf(buf, 
+		       "numa_hit %lu\n"
+		       "numa_miss %lu\n"
+		       "numa_foreign %lu\n"
+		       "interleave_hit %lu\n"
+		       "local_node %lu\n"
+		       "other_node %lu\n", 
+		       numa_hit,
+		       numa_miss,
+		       numa_foreign,
+		       interleave_hit,
+		       local_node, 
+		       other_node); 
+} 
+static SYSDEV_ATTR(numastat,S_IRUGO,node_read_numastat,NULL);
 
 /*
  * register_node - Setup a driverfs device for a node.
@@ -74,6 +123,7 @@
 	if (!error){
 		sysdev_create_file(&node->sysdev, &attr_cpumap);
 		sysdev_create_file(&node->sysdev, &attr_meminfo);
+		sysdev_create_file(&node->sysdev, &attr_numastat); 
 	}
 	return error;
 }


This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.