* Re: 2.6.3-rc2-mm1
2004-02-12 9:57 2.6.3-rc2-mm1 Andrew Morton
@ 2004-02-12 11:13 ` Andrew Morton
2004-02-12 11:57 ` 2.6.3-rc2-mm1 Anton Blanchard
2004-02-12 11:24 ` 2.6.3-rc2-mm1 Nick Piggin
` (7 subsequent siblings)
8 siblings, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2004-02-12 11:13 UTC (permalink / raw)
To: linux-kernel, linux-mm
Andrew Morton <akpm@osdl.org> wrote:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.3-rc2/2.6.3-rc2-mm1/
This kernel and also 2.6.3-rc1-mm1 have a nasty bug which causes
current->preempt_count to be decremented by one on each hard IRQ. It
manifests as a BUG() in the slab code early in boot.
Disabling CONFIG_DEBUG_SPINLOCK_SLEEP will fix this up. Do not use this
feature on ia32, for it is bust.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1
2004-02-12 11:13 ` 2.6.3-rc2-mm1 Andrew Morton
@ 2004-02-12 11:57 ` Anton Blanchard
2004-02-12 12:09 ` 2.6.3-rc2-mm1 Andrew Morton
0 siblings, 1 reply; 33+ messages in thread
From: Anton Blanchard @ 2004-02-12 11:57 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm
> This kernel and also 2.6.3-rc1-mm1 have a nasty bug which causes
> current->preempt_count to be decremented by one on each hard IRQ. It
> manifests as a BUG() in the slab code early in boot.
>
> Disabling CONFIG_DEBUG_SPINLOCK_SLEEP will fix this up. Do not use this
> feature on ia32, for it is bust.
A few questions spring to mind. Like who wrote that dodgy patch?
And any ideas how said person (who will remain nameless) broke ia32?
Anton
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1
2004-02-12 11:57 ` 2.6.3-rc2-mm1 Anton Blanchard
@ 2004-02-12 12:09 ` Andrew Morton
2004-02-12 14:40 ` 2.6.3-rc2-mm1 Zwane Mwaikambo
2004-02-12 14:47 ` 2.6.3-rc2-mm1 Anton Blanchard
0 siblings, 2 replies; 33+ messages in thread
From: Andrew Morton @ 2004-02-12 12:09 UTC (permalink / raw)
To: Anton Blanchard; +Cc: linux-kernel, linux-mm
Anton Blanchard <anton@samba.org> wrote:
>
>
> > This kernel and also 2.6.3-rc1-mm1 have a nasty bug which causes
> > current->preempt_count to be decremented by one on each hard IRQ. It
> > manifests as a BUG() in the slab code early in boot.
> >
> > Disabling CONFIG_DEBUG_SPINLOCK_SLEEP will fix this up. Do not use this
> > feature on ia32, for it is bust.
>
> A few questions spring to mind. Like who wrote that dodgy patch?
The dog wrote my homework?
> And any ideas how said person (who will remain nameless) broke ia32?
Not really. I spent a couple of hours debugging the darn thing, then gave
up and used binary search to find the offending patch.
<looks>
include/asm-i386/hardirq.h:IRQ_EXIT_OFFSET needs treatment, I bet.
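
A user-space sketch of how a one-per-IRQ drift like this could arise. It assumes, and the thread does not spell this out, that irq_exit() subtracts IRQ_EXIT_OFFSET and then calls preempt_enable_no_resched(), and that the debug option makes the latter decrement the count even when IRQ_EXIT_OFFSET has not been reduced by one to compensate. The struct, the function names and the bit layout below are illustrative models, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative layout: the hardirq count sits above the preempt bits. */
#define HARDIRQ_SHIFT  16
#define HARDIRQ_OFFSET (1L << HARDIRQ_SHIFT)

/* Hypothetical model of the per-task counter and the two config choices. */
struct irq_model {
	bool preempt_counting;  /* preempt_enable_no_resched() decrements */
	bool exit_offset_fixed; /* IRQ_EXIT_OFFSET is HARDIRQ_OFFSET - 1 */
	long preempt_count;
};

/* Model of irq_enter(): account for one hard IRQ. */
static void model_irq_enter(struct irq_model *m)
{
	m->preempt_count += HARDIRQ_OFFSET;
}

/* Model of irq_exit(): drop the IRQ, then re-enable preemption. */
static void model_irq_exit(struct irq_model *m)
{
	long exit_offset = m->exit_offset_fixed ? HARDIRQ_OFFSET - 1
						: HARDIRQ_OFFSET;

	m->preempt_count -= exit_offset;
	if (m->preempt_counting)
		m->preempt_count -= 1; /* preempt_enable_no_resched() */
}

/* Run nr_irqs interrupts and return the net drift of the counter. */
static long drift_after_irqs(bool preempt_counting, bool exit_offset_fixed,
			     int nr_irqs)
{
	struct irq_model m = { preempt_counting, exit_offset_fixed, 0 };

	assert(nr_irqs >= 0);
	for (int i = 0; i < nr_irqs; i++) {
		model_irq_enter(&m);
		model_irq_exit(&m);
	}
	return m.preempt_count;
}
```

In this model drift_after_irqs(true, false, 100) returns -100, while the two matched configurations (counting off with the plain offset, or counting on with the reduced offset) return 0.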
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1
2004-02-12 12:09 ` 2.6.3-rc2-mm1 Andrew Morton
@ 2004-02-12 14:40 ` Zwane Mwaikambo
2004-02-12 14:46 ` 2.6.3-rc2-mm1 Anton Blanchard
2004-02-12 14:47 ` 2.6.3-rc2-mm1 Anton Blanchard
1 sibling, 1 reply; 33+ messages in thread
From: Zwane Mwaikambo @ 2004-02-12 14:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: Anton Blanchard, linux-kernel, linux-mm
On Thu, 12 Feb 2004, Andrew Morton wrote:
> Anton Blanchard <anton@samba.org> wrote:
> >
> >
> > > This kernel and also 2.6.3-rc1-mm1 have a nasty bug which causes
> > > current->preempt_count to be decremented by one on each hard IRQ. It
> > > manifests as a BUG() in the slab code early in boot.
> > >
> > > Disabling CONFIG_DEBUG_SPINLOCK_SLEEP will fix this up. Do not use this
> > > feature on ia32, for it is bust.
> >
> > A few questions spring to mind. Like who wrote that dodgy patch?
>
> The dog wrote my homework?
I've not managed to trigger this one
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_SLAB is not set
CONFIG_DEBUG_IOVIRT=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_HIGHMEM=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
> > And any ideas how said person (who will remain nameless) broke ia32?
>
> Not really. I spent a couple of hours debugging the darn thing, then gave
> up and used binary search to find the offending patch.
>
> <looks>
>
> include/asm-i386/hardirq.h:IRQ_EXIT_OFFSET needs treatment, I bet.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1
2004-02-12 12:09 ` 2.6.3-rc2-mm1 Andrew Morton
2004-02-12 14:40 ` 2.6.3-rc2-mm1 Zwane Mwaikambo
@ 2004-02-12 14:47 ` Anton Blanchard
1 sibling, 0 replies; 33+ messages in thread
From: Anton Blanchard @ 2004-02-12 14:47 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm
> > A few questions spring to mind. Like who wrote that dodgy patch?
> The dog wrote my homework?
> > And any ideas how said person (who will remain nameless) broke ia32?
> Not really. I spent a couple of hours debugging the darn thing, then gave
> up and used binary search to find the offending patch.
Ouch, you'll never get those hours back and you have me to thank for it.
> <looks>
> include/asm-i386/hardirq.h:IRQ_EXIT_OFFSET needs treatment, I bet.
Yep. I wonder why DEBUG_SPINLOCK_SLEEP didn't depend on PREEMPT.
Anton
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1
2004-02-12 9:57 2.6.3-rc2-mm1 Andrew Morton
2004-02-12 11:13 ` 2.6.3-rc2-mm1 Andrew Morton
@ 2004-02-12 11:24 ` Nick Piggin
2004-02-12 14:46 ` 2.6.3-rc2-mm1 Zwane Mwaikambo
2004-02-12 15:40 ` 2.6.3-rc2-mm1 Mark Haverkamp
2004-02-12 17:06 ` 2.6.3-rc2-mm1 (compile stats) John Cherry
` (6 subsequent siblings)
8 siblings, 2 replies; 33+ messages in thread
From: Nick Piggin @ 2004-02-12 11:24 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, Christine Moore
Andrew Morton wrote:
>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.3-rc2/2.6.3-rc2-mm1/
>
>
Neither this nor the previous one boots on the NUMAQ at osdl.
Not sure which is the last -mm that did. 2.6.3-rc2 boots.
I turned early_printk on and nothing. It stops at
Loading linux..............
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1
2004-02-12 11:24 ` 2.6.3-rc2-mm1 Nick Piggin
@ 2004-02-12 14:46 ` Zwane Mwaikambo
2004-02-12 15:40 ` 2.6.3-rc2-mm1 Mark Haverkamp
1 sibling, 0 replies; 33+ messages in thread
From: Zwane Mwaikambo @ 2004-02-12 14:46 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-kernel, linux-mm, Christine Moore
On Thu, 12 Feb 2004, Nick Piggin wrote:
> Andrew Morton wrote:
>
> >ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.3-rc2/2.6.3-rc2-mm1/
> >
> >
>
> Neither this nor the previous one boots on the NUMAQ at osdl.
> Not sure which is the last -mm that did. 2.6.3-rc2 boots.
>
> I turned early_printk on and nothing. It stops at
> Loading linux..............
Ahh, thanks Nick. I tried this last night too and was going to start
working backwards. 2.6.3-rc2 doesn't look far to work back from.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1
2004-02-12 11:24 ` 2.6.3-rc2-mm1 Nick Piggin
2004-02-12 14:46 ` 2.6.3-rc2-mm1 Zwane Mwaikambo
@ 2004-02-12 15:40 ` Mark Haverkamp
2004-02-12 21:38 ` 2.6.3-rc2-mm1 Andrew Morton
2004-02-12 22:33 ` 2.6.3-rc2-mm1 Nick Piggin
1 sibling, 2 replies; 33+ messages in thread
From: Mark Haverkamp @ 2004-02-12 15:40 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-kernel, linux-mm, Christine Moore
On Thu, 2004-02-12 at 03:24, Nick Piggin wrote:
> Andrew Morton wrote:
>
> >ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.3-rc2/2.6.3-rc2-mm1/
> >
> >
>
> Neither this nor the previous one boots on the NUMAQ at osdl.
> Not sure which is the last -mm that did. 2.6.3-rc2 boots.
>
> I turned early_printk on and nothing. It stops at
> Loading linux..............
I saw this behavior with the last mm kernel on my 8-way with
CONFIG_HIGHMEM64G. The problem went away when I backed out the
highmem-equals-user-friendliness.patch
Mark.
--
Mark Haverkamp <markh@osdl.org>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1
2004-02-12 15:40 ` 2.6.3-rc2-mm1 Mark Haverkamp
@ 2004-02-12 21:38 ` Andrew Morton
2004-02-12 22:33 ` 2.6.3-rc2-mm1 Nick Piggin
1 sibling, 0 replies; 33+ messages in thread
From: Andrew Morton @ 2004-02-12 21:38 UTC (permalink / raw)
To: Mark Haverkamp; +Cc: piggin, linux-kernel, linux-mm, cem
Mark Haverkamp <markh@osdl.org> wrote:
>
> On Thu, 2004-02-12 at 03:24, Nick Piggin wrote:
> > Andrew Morton wrote:
> >
> > >ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.3-rc2/2.6.3-rc2-mm1/
> > >
> > >
> >
> > Neither this nor the previous one boots on the NUMAQ at osdl.
> > Not sure which is the last -mm that did. 2.6.3-rc2 boots.
> >
> > I turned early_printk on and nothing. It stops at
> > Loading linux..............
>
> I saw this behavior with the last mm kernel on my 8-way with
> CONFIG_HIGHMEM64G. The problem went away when I backed out the
> highmem-equals-user-friendliness.patch
Thanks for working that out. James Morris reports that the same patch
causes initrd-with-highmem failures.
Having enough bugs to care for, I guess I'll drop it.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1
2004-02-12 15:40 ` 2.6.3-rc2-mm1 Mark Haverkamp
2004-02-12 21:38 ` 2.6.3-rc2-mm1 Andrew Morton
@ 2004-02-12 22:33 ` Nick Piggin
1 sibling, 0 replies; 33+ messages in thread
From: Nick Piggin @ 2004-02-12 22:33 UTC (permalink / raw)
To: Mark Haverkamp
Cc: Andrew Morton, linux-kernel, linux-mm, Christine Moore,
Zwane Mwaikambo
Mark Haverkamp wrote:
>On Thu, 2004-02-12 at 03:24, Nick Piggin wrote:
>
>>Andrew Morton wrote:
>>
>>
>>>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.3-rc2/2.6.3-rc2-mm1/
>>>
>>>
>>>
>>Neither this nor the previous one boots on the NUMAQ at osdl.
>>Not sure which is the last -mm that did. 2.6.3-rc2 boots.
>>
>>I turned early_printk on and nothing. It stops at
>>Loading linux..............
>>
>
>I saw this behavior with the last mm kernel on my 8-way with
>CONFIG_HIGHMEM64G. The problem went away when I backed out the
>highmem-equals-user-friendliness.patch
>
>
It boots with this patch backed out. Thanks Mark.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1 (compile stats)
2004-02-12 9:57 2.6.3-rc2-mm1 Andrew Morton
2004-02-12 11:13 ` 2.6.3-rc2-mm1 Andrew Morton
2004-02-12 11:24 ` 2.6.3-rc2-mm1 Nick Piggin
@ 2004-02-12 17:06 ` John Cherry
2004-02-12 18:43 ` 2.6.3-rc2-mm1 Alistair John Strachan
` (5 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: John Cherry @ 2004-02-12 17:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel@vger.kernel.org, linux-mm
Linux 2.6 (mm tree) Compile Statistics (gcc 3.2.2)
Warnings/Errors Summary
Kernel           bzImage      bzImage   bzImage   modules   bzImage   modules
                 (defconfig)  (allno)   (allyes)  (allyes)  (allmod)  (allmod)
---------------  -----------  --------  --------  --------  --------  --------
2.6.3-rc2-mm1    1w/0e        0w/265e   144w/ 5e  7w/0e     3w/0e     145w/0e
2.6.3-rc1-mm1    1w/0e        0w/265e   141w/ 5e  7w/0e     3w/0e     143w/0e
2.6.2-mm1        2w/0e        0w/264e   147w/ 5e  7w/0e     3w/0e     173w/0e
2.6.2-rc3-mm1    2w/0e        0w/265e   146w/ 5e  7w/0e     3w/0e     172w/0e
2.6.2-rc2-mm2    0w/0e        0w/264e   145w/ 5e  7w/0e     3w/0e     171w/0e
2.6.2-rc2-mm1    0w/0e        0w/264e   146w/ 5e  7w/0e     3w/0e     172w/0e
2.6.2-rc1-mm3    0w/0e        0w/265e   144w/ 8e  7w/0e     3w/0e     169w/0e
2.6.2-rc1-mm2    0w/0e        0w/264e   144w/ 5e  10w/0e    3w/0e     171w/0e
2.6.2-rc1-mm1    0w/0e        0w/264e   144w/ 5e  10w/0e    3w/0e     171w/0e
2.6.1-mm5        2w/5e        0w/264e   153w/11e  10w/0e    3w/0e     180w/0e
2.6.1-mm4        0w/821e      0w/264e   154w/ 5e  8w/1e     5w/0e     179w/0e
2.6.1-mm3        0w/0e        0w/0e     151w/ 5e  10w/0e    3w/0e     177w/0e
2.6.1-mm2        0w/0e        0w/0e     143w/ 5e  12w/0e    3w/0e     171w/0e
2.6.1-mm1        0w/0e        0w/0e     146w/ 9e  12w/0e    6w/0e     171w/0e
2.6.1-rc2-mm1    0w/0e        0w/0e     149w/ 0e  12w/0e    6w/0e     171w/4e
2.6.1-rc1-mm2    0w/0e        0w/0e     157w/15e  12w/0e    3w/0e     185w/4e
2.6.1-rc1-mm1    0w/0e        0w/0e     156w/10e  12w/0e    3w/0e     184w/2e
2.6.0-mm2        0w/0e        0w/0e     161w/ 0e  12w/0e    3w/0e     189w/0e
2.6.0-mm1        0w/0e        0w/0e     173w/ 0e  12w/0e    3w/0e     212w/0e
Web page with links to complete details:
http://developer.osdl.org/cherry/compile/
Error Summary (individual module builds):
drivers/net: 0 warnings, 1 errors
Warning Summary (individual module builds):
drivers/block: 1 warnings, 0 errors
drivers/cdrom: 3 warnings, 0 errors
drivers/char: 4 warnings, 0 errors
drivers/ide: 5 warnings, 0 errors
drivers/isdn: 2 warnings, 0 errors
drivers/message: 1 warnings, 0 errors
drivers/mtd: 23 warnings, 0 errors
drivers/net: 7 warnings, 0 errors
drivers/pcmcia: 3 warnings, 0 errors
drivers/scsi/pcmcia: 1 warnings, 0 errors
drivers/scsi: 32 warnings, 0 errors
drivers/serial: 1 warnings, 0 errors
drivers/telephony: 5 warnings, 0 errors
drivers/usb: 2 warnings, 0 errors
drivers/video/aty: 3 warnings, 0 errors
drivers/video/console: 2 warnings, 0 errors
drivers/video/matrox: 5 warnings, 0 errors
drivers/video: 8 warnings, 0 errors
sound/isa: 6 warnings, 0 errors
sound/oss: 33 warnings, 0 errors
John
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1
2004-02-12 9:57 2.6.3-rc2-mm1 Andrew Morton
` (2 preceding siblings ...)
2004-02-12 17:06 ` 2.6.3-rc2-mm1 (compile stats) John Cherry
@ 2004-02-12 18:43 ` Alistair John Strachan
2004-02-12 20:33 ` 2.6.3-rc2-mm1 (dm) Miquel van Smoorenburg
` (4 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Alistair John Strachan @ 2004-02-12 18:43 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm
On Thursday 12 February 2004 09:57, you wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.3-rc2/2.6.3-rc2-mm1/
>
>
> - Added the big ISDN update
>
> - Device Mapper update
>
[snip]
I don't know if it's still worth reporting these, but at the risk of sounding
like a broken record..
Badness in interruptible_sleep_on at kernel/sched.c:2235
Call Trace:
[<c011b289>] interruptible_sleep_on+0xe9/0x120
[<c011ae80>] default_wake_function+0x0/0x20
[<c0365c17>] copy_block+0xa7/0xe0
[<c036167c>] emu10k1_audio_write+0x1ac/0x320
[<c03614d0>] emu10k1_audio_write+0x0/0x320
[<c015370a>] vfs_write+0x10a/0x150
[<c010f26a>] do_gettimeofday+0x1a/0xb0
[<c0153802>] sys_write+0x42/0x70
[<c03f0266>] sysenter_past_esp+0x43/0x65
Haven't noticed it before.
Other than that, the whole ACPI on nForce2 thing seems to have been fixed.
Back to -mm for me.
--
Cheers,
Alistair.
personal: alistair()devzero!co!uk
university: s0348365()sms!ed!ac!uk
student: CS/AI Undergraduate
contact: 7/10 Darroch Court,
University of Edinburgh.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1 (dm)
2004-02-12 9:57 2.6.3-rc2-mm1 Andrew Morton
` (3 preceding siblings ...)
2004-02-12 18:43 ` 2.6.3-rc2-mm1 Alistair John Strachan
@ 2004-02-12 20:33 ` Miquel van Smoorenburg
2004-02-12 21:28 ` Nathan Scott
2004-02-13 0:04 ` 2.6.3-rc2-mm1 Torrey Hoffman
` (3 subsequent siblings)
8 siblings, 1 reply; 33+ messages in thread
From: Miquel van Smoorenburg @ 2004-02-12 20:33 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-lvm
According to Andrew Morton:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.3-rc2/2.6.3-rc2-mm1/
> +dm-01-export-dm_vcalloc.patch
> +dm-02-move-to_bytes-to_sectors.patch
> +dm-03-remove-dm_deferred_io.patch
> +dm-04-maintain-bio-ordering.patch
> +dm-05-alloc_dev-error-cleanup.patch
> +dm-07-dm_table_create-GFP-fix.patch
> +dm-08-zero-size-target-fix.patch
> +dm-09-dec_pending-locking-cleanup.patch
> +dm-10-drop-BIO_SEG_VALID.patch
The maintain-bio-ordering patch mostly fixes the performance problem
I was seeing on XFS-over-LVM-on-3ware-raid5. (See my earlier message
at http://www.ussg.iu.edu/hypermail/linux/kernel/0312.3/0684.html )
Excellent!
However there's still one issue:
I created a LVM volume on /dev/sda2, called /dev/vg0/test. Then
I created and mounted an XFS partition on /dev/vg0/test. XFS uses
a 512 byte blocksize by default, but the underlying /dev/sda2
device had a soft blocksize of 4096 (default after boot is 1024,
but I had been mucking around with it so it got set to 4096).
As a result, I couldn't get more than 35 MB/sec write speed out
of XFS mounted on the LVM device.
I added this little patch:
--- drivers/md/dm-table.c.ORIG 2004-02-12 20:49:47.000000000 +0100
+++ drivers/md/dm-table.c 2004-02-12 20:56:59.000000000 +0100
@@ -361,7 +361,7 @@
blkdev_put(bdev, BDEV_RAW);
else {
d->bdev = bdev;
- set_blocksize(bdev, d->bdev->bd_block_size);
+ set_blocksize(bdev, 512);
}
return r;
}
This forces the underlying device(s) to a soft blocksize of 512. And
I had my 80 MB/sec write speed back !
I'm not sure if setting the blocksize of the underlying device
always to 512 is the right solution. I think that set_blocksize
for dm devices should also set the size for the underlying devices,
but that probably means adding an extra hook so that
set_blocksize can call bdev->bd_disk->fops->set_blocksize(bdev, size).
Which, in the case of dm, would basically call set_blocksize for the
underlying devices again.
Correct ?
Mike.
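
The hook idea in the message above can be sketched in user space as follows. Everything here is hypothetical: struct sketch_dev, the set_blocksize_hook member and sketch_dm_hook() are stand-ins invented for illustration, and the thread does not confirm that any such fops member exists. The sketch only shows the shape of the proposal, where setting the block size on a stacked device is forwarded to each device underneath it.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_UNDERLYING 4

struct sketch_dev;

/* Hypothetical per-driver hook, called after the generic code runs. */
typedef int (*sketch_set_blocksize_fn)(struct sketch_dev *dev, int size);

/* Hypothetical stand-in for a block device, stacked or plain. */
struct sketch_dev {
	int block_size;                             /* soft block size */
	sketch_set_blocksize_fn set_blocksize_hook; /* NULL for plain disks */
	struct sketch_dev *underlying[SKETCH_MAX_UNDERLYING];
	size_t nr_underlying;
};

/* Generic entry point: validate, record the size, then call the hook. */
static int sketch_set_blocksize(struct sketch_dev *dev, int size)
{
	assert(dev != NULL);
	/* Reject sizes below 512 or not a power of two. */
	if (size < 512 || (size & (size - 1)))
		return -1;
	dev->block_size = size;
	if (dev->set_blocksize_hook)
		return dev->set_blocksize_hook(dev, size);
	return 0;
}

/* Hook a stacking driver might install: forward to every lower device. */
static int sketch_dm_hook(struct sketch_dev *dev, int size)
{
	for (size_t i = 0; i < dev->nr_underlying; i++) {
		int r = sketch_set_blocksize(dev->underlying[i], size);

		if (r)
			return r;
	}
	return 0;
}
```

With a plain device starting at 4096 and a stacked device on top of it using sketch_dm_hook, calling sketch_set_blocksize() on the stacked device with 512 leaves both at 512, which is the effect the hard-coded patch gets by other means.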
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1 (dm)
2004-02-12 20:33 ` 2.6.3-rc2-mm1 (dm) Miquel van Smoorenburg
@ 2004-02-12 21:28 ` Nathan Scott
2004-02-12 22:08 ` Miquel van Smoorenburg
2004-02-12 22:34 ` Andrew Morton
0 siblings, 2 replies; 33+ messages in thread
From: Nathan Scott @ 2004-02-12 21:28 UTC (permalink / raw)
To: Miquel van Smoorenburg; +Cc: Andrew Morton, linux-kernel, linux-lvm
On Thu, Feb 12, 2004 at 09:33:07PM +0100, Miquel van Smoorenburg wrote:
> ...
> However there's still one issue:
>
> I created a LVM volume on /dev/sda2, called /dev/vg0/test. Then
> I created and mounted an XFS partition on /dev/vg0/test. XFS uses
> a 512 byte blocksize by default, but the underlying /dev/sda2
(that's a 512 byte "sector size" in XFS-speak)
> device had a soft blocksize of 4096 (default after boot is 1024,
> but I had been mucking around with it so it got set to 4096).
>
> As a result, I couldn't get more than 35 MB/sec write speed out
> of XFS mounted on the LVM device.
>
> I added this little patch:
>
> --- drivers/md/dm-table.c.ORIG 2004-02-12 20:49:47.000000000 +0100
> +++ drivers/md/dm-table.c 2004-02-12 20:56:59.000000000 +0100
> @@ -361,7 +361,7 @@
> blkdev_put(bdev, BDEV_RAW);
> else {
> d->bdev = bdev;
> - set_blocksize(bdev, d->bdev->bd_block_size);
> + set_blocksize(bdev, 512);
> }
> return r;
> }
>
> This forces the underlying device(s) to a soft blocksize of 512. And
> I had my 80 MB/sec write speed back !
>
> I'm not sure if setting the blocksize of the underlying device
> always to 512 is the right solution. I think that set_blocksize
Hmm... that set_blocksize there must be new in -mm, I don't see
that in mainline yet. I would guess that bdev_hardsect_size()
would be more appropriate here than hard-coding 512 bytes. I
don't know the details of the problem being solved by adding
set_blocksize() in there though, so I might be completely wrong.
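
A minimal user-space sketch of that suggestion: take the size from the device's hardware sector size instead of a literal 512. The struct and the model_* functions are hypothetical stand-ins, and the validity rules (power of two, between 512 and the page size, not below the hardware sector size) are an assumption about how set_blocksize() behaves rather than something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical stand-in for a block device; not the kernel's struct. */
struct model_bdev {
	int hardsect_size; /* hardware sector size in bytes */
	int block_size;    /* current soft block size in bytes */
};

/* Returns 0 on success, -1 if the requested size is rejected. */
static int model_set_blocksize(struct model_bdev *bdev, int size)
{
	/* Assumed rule: power of two between 512 and the page size. */
	if (size > MODEL_PAGE_SIZE || size < 512 || (size & (size - 1)))
		return -1;
	/* Assumed rule: never smaller than the hardware sector. */
	if (size < bdev->hardsect_size)
		return -1;
	bdev->block_size = size;
	return 0;
}

/* Claim-time policy: use the smallest size the device supports. */
static int model_claim(struct model_bdev *bdev)
{
	assert(bdev != NULL);
	return model_set_blocksize(bdev, bdev->hardsect_size);
}
```

On a device with 512-byte sectors this behaves like the hard-coded patch. On a device with 2048-byte sectors the literal 512 would be rejected in this model, while model_claim() settles on 2048.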
cheers.
--
Nathan
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1 (dm)
2004-02-12 21:28 ` Nathan Scott
@ 2004-02-12 22:08 ` Miquel van Smoorenburg
2004-02-12 22:34 ` Andrew Morton
1 sibling, 0 replies; 33+ messages in thread
From: Miquel van Smoorenburg @ 2004-02-12 22:08 UTC (permalink / raw)
To: Nathan Scott
Cc: Miquel van Smoorenburg, Andrew Morton, linux-kernel, linux-lvm
On Thu, 12 Feb 2004 22:28:11, Nathan Scott wrote:
> On Thu, Feb 12, 2004 at 09:33:07PM +0100, Miquel van Smoorenburg wrote:
> > ...
> > However there's still one issue:
> >
> > I created a LVM volume on /dev/sda2, called /dev/vg0/test. Then
> > I created and mounted an XFS partition on /dev/vg0/test. XFS uses
> > a 512 byte blocksize by default, but the underlying /dev/sda2
>
> (that's a 512 byte "sector size" in XFS-speak)
Ah yes. mkfs -txfs -s size=4096 fixes it as well, but that's a
workaround.
> > device had a soft blocksize of 4096 (default after boot is 1024,
> > but I had been mucking around with it so it got set to 4096).
> >
> > As a result, I couldn't get more than 35 MB/sec write speed out
> > of XFS mounted on the LVM device.
> >
> > I added this little patch:
> >
> > --- drivers/md/dm-table.c.ORIG 2004-02-12 20:49:47.000000000 +0100
> > +++ drivers/md/dm-table.c 2004-02-12 20:56:59.000000000 +0100
> > @@ -361,7 +361,7 @@
> > blkdev_put(bdev, BDEV_RAW);
> > else {
> > d->bdev = bdev;
> > - set_blocksize(bdev, d->bdev->bd_block_size);
> > + set_blocksize(bdev, 512);
> > }
> > return r;
> > }
> >
> > This forces the underlying device(s) to a soft blocksize of 512. And
> > I had my 80 MB/sec write speed back !
Gah, wrong patch. That's because vim saved dm-table.c when I suspended it
to copy dm-table.c to dm-table.c.ORIG, so the patch makes no sense.
--- drivers/md/dm-table.c.ORIG 2004-02-12 23:05:15.000000000 +0100
+++ drivers/md/dm-table.c 2004-02-12 20:56:59.000000000 +0100
@@ -359,8 +359,10 @@
r = bd_claim(bdev, _claim_ptr);
if (r)
blkdev_put(bdev, BDEV_RAW);
- else
+ else {
d->bdev = bdev;
+ set_blocksize(bdev, 512);
+ }
return r;
}
That's more like it.
> > I'm not sure if setting the blocksize of the underlying device
> > always to 512 is the right solution. I think that set_blocksize
>
> I would guess that bdev_hardsect_size()
> would be more appropriate here than hard-coding 512 bytes. I
> don't know the details of the problem being solved by adding
> set_blocksize() in there though, so I might be completely wrong.
That does make sense, I guess.
Mike.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1 (dm)
2004-02-12 21:28 ` Nathan Scott
2004-02-12 22:08 ` Miquel van Smoorenburg
@ 2004-02-12 22:34 ` Andrew Morton
2004-02-13 16:29 ` Miquel van Smoorenburg
1 sibling, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2004-02-12 22:34 UTC (permalink / raw)
To: Nathan Scott; +Cc: miquels, linux-kernel, linux-lvm
Nathan Scott <nathans@sgi.com> wrote:
>
> > This forces the underlying device(s) to a soft blocksize of 512. And
> > I had my 80 MB/sec write speed back !
> >
> > I'm not sure if setting the blocksize of the underlying device
> > always to 512 is the right solution. I think that set_blocksize
>
> Hmm... that set_blocksize there must be new in -mm, I don't see
> that in mainline yet. I would guess that bdev_hardsect_size()
> would be more appropriate here than hard-coding 512 bytes. I
> don't know the details of the problem being solved by adding
> set_blocksize() in there though, so I might be completely wrong.
Yes, 2.6.3-rc2-mm1 has a new device-mapper update.
Miquel, thanks for picking this up. I shall wait for the LVM team to
suggest the preferred fix.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1 (dm)
2004-02-12 22:34 ` Andrew Morton
@ 2004-02-13 16:29 ` Miquel van Smoorenburg
0 siblings, 0 replies; 33+ messages in thread
From: Miquel van Smoorenburg @ 2004-02-13 16:29 UTC (permalink / raw)
To: Andrew Morton; +Cc: Nathan Scott, miquels, linux-kernel, linux-lvm
On 2004.02.12 23:34, Andrew Morton wrote:
> Nathan Scott <nathans@sgi.com> wrote:
> >
> > > This forces the underlying device(s) to a soft blocksize of 512. And
> > > I had my 80 MB/sec write speed back !
> > >
> > > I'm not sure if setting the blocksize of the underlying device
> > > always to 512 is the right solution. I think that set_blocksize
> >
> > Hmm... that set_blocksize there must be new in -mm, I don't see
> > that in mainline yet. I would guess that bdev_hardsect_size()
> > would be more appropriate here than hard-coding 512 bytes. I
> > don't know the details of the problem being solved by adding
> > set_blocksize() in there though, so I might be completely wrong.
>
> Yes, 2.6.3-rc2-mm1 has a new device-mapper update.
>
> Miquel, thanks for picking this up. I shall wait for the LVM team to
> suggest the preferred fix.
Nah, forget about it. I don't think this actually matters, it's still something
with the 3ware driver in RAID5 mode that fsckes this up. It needs more
investigation (it seems there's some heisenberg effect in there, or
I need more sleep).
Mike.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1
2004-02-12 9:57 2.6.3-rc2-mm1 Andrew Morton
` (4 preceding siblings ...)
2004-02-12 20:33 ` 2.6.3-rc2-mm1 (dm) Miquel van Smoorenburg
@ 2004-02-13 0:04 ` Torrey Hoffman
2004-02-14 10:36 ` 2.6.3-rc2-mm1 Terje Kvernes
2004-02-13 21:04 ` [PATCH 2.6.3-rc2-mm1] Daniel McNeil
` (2 subsequent siblings)
8 siblings, 1 reply; 33+ messages in thread
From: Torrey Hoffman @ 2004-02-13 0:04 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux-Kernel List, linux-mm
On Thu, 2004-02-12 at 01:57, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.3-rc2/2.6.3-rc2-mm1/
[... list of many patches]
> bk-ieee1394.patch
I reported a bug in 2.6.2-rc3-mm1 and was asked to retest... result is
it's still broken. The result is the same - even a little worse now, it
won't get as far as running init so I have no log to post. This machine
has no serial port and I haven't tried the network logging stuff yet...
But the oops looked very similar. At least the function names and the
references to ieee1394 are the same. The 2.6.2-rc3-mm1 oops was:
> ieee1394: Host added: ID:BUS[0-00:1023] GUID[00508d0000f42af5]
> Badness in kobject_get at lib/kobject.c:431
> Call Trace:
> [<c02078dc>] kobject_get+0x3c/0x50
> [<c0272fd1>] get_device+0x11/0x20
> [<c0273c68>] bus_for_each_dev+0x78/0xd0
> [<fc876185>] nodemgr_node_probe+0x45/0x100 [ieee1394]
> [<fc876030>] nodemgr_probe_ne_cb+0x0/0x90 [ieee1394]
> [<fc87654b>] nodemgr_host_thread+0x14b/0x180 [ieee1394]
> [<fc876400>] nodemgr_host_thread+0x0/0x180 [ieee1394]
> [<c010b285>] kernel_thread_helper+0x5/0x10
And you (Andrew) said:
> "Ben and Greg are currently arguing over whose fault this is ;)"
...
> "There will be a big 1394 update in 2.6.2-mm2. Could you please
> retest and let us know?"
--
Torrey Hoffman <thoffman@arnor.net>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1
2004-02-13 0:04 ` 2.6.3-rc2-mm1 Torrey Hoffman
@ 2004-02-14 10:36 ` Terje Kvernes
0 siblings, 0 replies; 33+ messages in thread
From: Terje Kvernes @ 2004-02-14 10:36 UTC (permalink / raw)
To: Torrey Hoffman; +Cc: Andrew Morton, Linux-Kernel List, linux-mm
Torrey Hoffman <thoffman@arnor.net> writes:
> On Thu, 2004-02-12 at 01:57, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.3-rc2/2.6.3-rc2-mm1/
>
> [... list of many patches]
>
> > bk-ieee1394.patch
>
> I reported a bug in 2.6.2-rc3-mm1 and was asked to retest... result
> is it's still broken. The result is the same - even a little worse
> now, it won't get as far as running init so I have no log to post.
I'm seeing the same bug, and I have ieee1394 as a module. I can
help to debug this if need be.
> This machine has no serial port and I haven't tried the network
> logging stuff yet...
>
> But the oops looked very similar. At least the function names and the
> references to ieee1394 are the same. The 2.6.2-rc3-mm1 oops was:
>
> > ieee1394: Host added: ID:BUS[0-00:1023] GUID[00508d0000f42af5]
> > Badness in kobject_get at lib/kobject.c:431
> > Call Trace:
> > [<c02078dc>] kobject_get+0x3c/0x50
> > [<c0272fd1>] get_device+0x11/0x20
> > [<c0273c68>] bus_for_each_dev+0x78/0xd0
> > [<fc876185>] nodemgr_node_probe+0x45/0x100 [ieee1394]
> > [<fc876030>] nodemgr_probe_ne_cb+0x0/0x90 [ieee1394]
> > [<fc87654b>] nodemgr_host_thread+0x14b/0x180 [ieee1394]
> > [<fc876400>] nodemgr_host_thread+0x0/0x180 [ieee1394]
> > [<c010b285>] kernel_thread_helper+0x5/0x10
yup. this is _very_ familiar. I have the same problem with
2.6.3-rc2-mm1. I also have some devfs problems that I can't quite
pinpoint so I've rolled back to 2.6.2-mm1.
[ ... ]
--
Terje
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 2.6.3-rc2-mm1]
2004-02-12 9:57 2.6.3-rc2-mm1 Andrew Morton
` (5 preceding siblings ...)
2004-02-13 0:04 ` 2.6.3-rc2-mm1 Torrey Hoffman
@ 2004-02-13 21:04 ` Daniel McNeil
2004-02-13 21:30 ` [PATCH 2.6.3-rc2-mm1] __block_write_full patch Daniel McNeil
2004-02-14 5:27 ` 2.6.3-rc2-mm1 Glenn Johnson
8 siblings, 0 replies; 33+ messages in thread
From: Daniel McNeil @ 2004-02-13 21:04 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux Kernel Mailing List, linux-mm, linux-aio@kvack.org
[-- Attachment #1: Type: text/plain, Size: 177 bytes --]
This patch samples i_size before dropping the i_sem.
The i_size could change by a racing write and we could
return uninitialized data.
re-diffed against 2.6.3-rc2-mm1.
Daniel
[-- Attachment #2: dio_size.2.6.3-rc2-mm1.patch --]
[-- Type: text/plain, Size: 1046 bytes --]
--- linux-2.6.3-rc2-mm1.orig/fs/direct-io.c 2004-02-12 11:35:41.613567579 -0800
+++ linux-2.6.3-rc2-mm1/fs/direct-io.c 2004-02-12 11:35:52.135706887 -0800
@@ -909,6 +909,7 @@ direct_io_worker(int rw, struct kiocb *i
int ret = 0;
int ret2;
size_t bytes;
+ loff_t i_size;
dio->bio = NULL;
dio->inode = inode;
@@ -1017,7 +1018,12 @@ direct_io_worker(int rw, struct kiocb *i
* All block lookups have been performed. For READ requests
* we can let i_sem go now that its achieved its purpose
* of protecting us from looking up uninitialized blocks.
+ *
+ * We also need sample i_size before we release i_sem to prevent
+ * a racing write from changing i_size causing us to return
+ * uninitialized data.
*/
+ i_size = i_size_read(inode);
if ((rw == READ) && dio->needs_locking)
up(&dio->inode->i_sem);
@@ -1064,7 +1070,6 @@ direct_io_worker(int rw, struct kiocb *i
if (ret == 0)
ret = dio->page_errors;
if (ret == 0 && dio->result) {
- loff_t i_size = i_size_read(inode);
ret = dio->result;
/*
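
The effect of sampling i_size early can be illustrated with a small user-space model. It assumes that the sampled value is later used to trim the byte count of a read that crosses EOF; the hunk above is cut off before that code, so this is a guess at its role. dio_read_result() is a hypothetical helper, not a function in fs/direct-io.c.

```c
#include <assert.h>

/*
 * Hypothetical model of the end-of-read adjustment: a read may not
 * report more bytes than lie between offset and the file size used
 * for the check.
 */
static long long dio_read_result(long long offset, long long transferred,
				 long long i_size)
{
	assert(offset >= 0 && transferred >= 0);
	if (offset + transferred > i_size)
		return i_size > offset ? i_size - offset : 0;
	return transferred;
}
```

Take a file that is 1000 bytes long when the block lookups are done, and a 4096-byte direct read at offset 0. With the size sampled under i_sem, dio_read_result(0, 4096, 1000) reports 1000 bytes. If a racing write grows the file to 3000 bytes and the size is read only afterwards, dio_read_result(0, 4096, 3000) reports 3000 bytes, 2000 of which were never covered by the lookups.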
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 2.6.3-rc2-mm1] __block_write_full patch
2004-02-12 9:57 2.6.3-rc2-mm1 Andrew Morton
` (6 preceding siblings ...)
2004-02-13 21:04 ` [PATCH 2.6.3-rc2-mm1] Daniel McNeil
@ 2004-02-13 21:30 ` Daniel McNeil
2004-02-13 21:49 ` [PATCH 2.6.3-rc2-mm1] filemap_fdatawait patch Daniel McNeil
2004-02-13 22:38 ` [PATCH 2.6.3-rc2-mm1] __block_write_full patch Andrew Morton
2004-02-14 5:27 ` 2.6.3-rc2-mm1 Glenn Johnson
8 siblings, 2 replies; 33+ messages in thread
From: Daniel McNeil @ 2004-02-13 21:30 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-aio@kvack.org, Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 1078 bytes --]
Andrew,
Here is my original __block_write_full_page patch which adds
a wait_on_buffer() to catch the case where i/o might be in flight
from ll_rw_block().
I know that it waits for i/o even for the non-blocking writebacks, but
it does make sure filemap_write_and_wait() and filemap_fdatawrite() /
filemap_fdatawait() do wait for all i/o that has been submitted to
complete. (This is similar to 2.4, which always does a lock_buffer().)
The direct_read_under test does not see uninitialized data with
this patch. Without this patch 2.6.3-rc2-mm1 still sees uninitialized
data.
I am worried that the current behavior where PageWriteback can be
cleared with i/o still in flight for a buffer on that page could cause
other problems.
I am still trying to figure out how to fix this some other way,
but, in the mean time, this makes sure the code is correct.
Thoughts?
re-diff against 2.6.3-rc2-mm1.
Daniel
On Thu, 2004-02-12 at 01:57, Andrew Morton wrote:
>
> O_DIRECT-ll_rw_block-vs-block_write_full_page-fix.patch
> Fix race between ll_rw_block() and block_write_full_page()
>
[-- Attachment #2: __block_write_full.2.6.3-rc2-mm1.patch --]
[-- Type: text/plain, Size: 370 bytes --]
--- linux-2.6.3-rc2-mm1.orig/fs/buffer.c 2004-02-12 11:43:39.000000000 -0800
+++ linux-2.6.3-rc2-mm1/fs/buffer.c 2004-02-12 11:42:56.000000000 -0800
@@ -1810,6 +1810,7 @@ static int __block_write_full_page(struc
do {
get_bh(bh);
+ wait_on_buffer(bh); /* i/o might be in flight */
if (!buffer_mapped(bh))
continue;
if (wbc->sync_mode != WB_SYNC_NONE) {
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 2.6.3-rc2-mm1] filemap_fdatawait patch
2004-02-13 21:30 ` [PATCH 2.6.3-rc2-mm1] __block_write_full patch Daniel McNeil
@ 2004-02-13 21:49 ` Daniel McNeil
2004-02-13 22:38 ` [PATCH 2.6.3-rc2-mm1] __block_write_full patch Andrew Morton
1 sibling, 0 replies; 33+ messages in thread
From: Daniel McNeil @ 2004-02-13 21:49 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-aio@kvack.org, Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 635 bytes --]
Andrew,
This is re-diffed against 2.6.3-rc2-mm1.
This adds the lock_page() and check for PageDirty() in
filemap_fdatawait().
I did additional direct_read_under testing with a forked() process
doing "sync()" and without this patch I hit uninitialized data
even with the __block_write_full_page() patch.
Thoughts?
Daniel
PS
I still think there is a potential problem if we ever allow multiple
filemap_fdatawait() calls to run in parallel, since filemap_fdatawait()
waits for a page with the page unlinked from the locked list.
The 2nd filemap_fdatawait() would never see that page, so it would
not wait on it. I might take care of that in a future patch.
[-- Attachment #2: filemap_fdatawait.2.6.3-rc2-mm1.patch --]
[-- Type: text/plain, Size: 609 bytes --]
--- linux-2.6.3-rc2-mm1.orig/mm/filemap.c 2004-02-12 15:04:41.000000000 -0800
+++ linux-2.6.3-rc2-mm1/mm/filemap.c 2004-02-13 12:57:08.777799758 -0800
@@ -209,10 +209,15 @@ restart:
page_cache_get(page);
spin_unlock(&mapping->page_lock);
- wait_on_page_writeback(page);
- if (PageError(page))
- ret = -EIO;
-
+ lock_page(page);
+ if (PageDirty(page) && mapping->a_ops->writepage) {
+ write_one_page(page, 1);
+ } else {
+ wait_on_page_writeback(page);
+ unlock_page(page);
+ }
+ if (PageError(page))
+ ret = -EIO;
page_cache_release(page);
spin_lock(&mapping->page_lock);
}
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2.6.3-rc2-mm1] __block_write_full patch
2004-02-13 21:30 ` [PATCH 2.6.3-rc2-mm1] __block_write_full patch Daniel McNeil
2004-02-13 21:49 ` [PATCH 2.6.3-rc2-mm1] filemap_fdatawait patch Daniel McNeil
@ 2004-02-13 22:38 ` Andrew Morton
2004-02-13 23:30 ` Daniel McNeil
1 sibling, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2004-02-13 22:38 UTC (permalink / raw)
To: Daniel McNeil; +Cc: linux-aio, linux-kernel
Daniel McNeil <daniel@osdl.org> wrote:
>
> Here is my original __block_write_full_page patch which adds
> a wait_on_buffer() to catch the case where i/o might be in flight
> from ll_rw_block().
We don't want to be doing this.
Also, I don't buy the original rationale for the patch. Sure,
__block_write_full_page() clears PG_writeback. But that's OK because that
function was also responsible for setting it, and everything is under the
page lock anyway.
My suspicion is that the real problem is that mpage_writepages() moved the
page onto mapping->locked_pages while there is buffer-level I/O in flight
(that's OK). But PG_writeback is not set because writepage never started
any I/O. So filemap_fdatawait() never waits for the ext3-initiated
buffer-level I/O.
If so, there are several ways to fix this:
a) Change ext3 so that it appropriately sets and clears page_writeback
when any of the page's buffers are under writeout (messy). Or
b) Change filemap_fdatawait() so that it also waits on buffer-level I/O.
This is tricky because filemap_fdatawait() isn't allowed to assume
that page->private points at buffer_heads. Only the address_space
implementation knows what is at page->private. So it will need to be
something like:
	lock_page(page);
	wait_on_page_writeback(page);
	mapping = page->mapping;
	if (mapping) {
		if (mapping->a_ops->wait_on_private_writeback)
			mapping->a_ops->wait_on_private_writeback(page);
	}
	unlock_page(page);

	ext3_wait_on_private_writeback(struct page *page)
	{
		for (the buffers)
			wait_on_buffer()
	}
or
c) Change __block_write_full_page() to move the page back onto
mapping->dirty_pages if it was WB_SYNC_NONE and we discovered that the
page had a locked buffer. This way, a subsequent WB_SYNC_ALL will
correctly wait on that buffer.
Try c), please?
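Option c) can be sketched as a small user-space model. This is only an illustration of the rule "a non-blocking pass that meets a locked buffer puts the page back on the dirty list"; the types and names below (buffer_model, page_model, model_write_full_page, MODEL_SYNC_*) are made up for the example and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for WB_SYNC_NONE / WB_SYNC_ALL. */
enum model_sync_mode { MODEL_SYNC_NONE, MODEL_SYNC_ALL };

struct buffer_model {
	bool locked;	/* I/O in flight, possibly started by someone else */
	bool dirty;
};

struct page_model {
	struct buffer_model *bufs;
	size_t nbufs;
	bool on_dirty_list;	/* true: a later pass will visit the page again */
};

/* Returns the number of buffers this pass submitted for I/O. */
size_t model_write_full_page(struct page_model *page, enum model_sync_mode mode)
{
	size_t submitted = 0;

	assert(page != NULL);
	page->on_dirty_list = false;	/* the caller took the page off the dirty list */

	for (size_t i = 0; i < page->nbufs; i++) {
		struct buffer_model *bh = &page->bufs[i];

		if (bh->locked) {
			if (mode == MODEL_SYNC_NONE) {
				/* Do not block; keep the page findable instead. */
				page->on_dirty_list = true;
				continue;
			}
			/* A sync pass waits; the model treats the wait as complete. */
			bh->locked = false;
		}
		if (bh->dirty) {
			bh->dirty = false;
			bh->locked = true;	/* our own I/O is now in flight */
			submitted++;
		}
	}
	return submitted;
}
```

In this model a later MODEL_SYNC_ALL pass finds the page on the dirty list again and waits on the buffer that the earlier pass had to skip.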
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2.6.3-rc2-mm1] __block_write_full patch
2004-02-13 22:38 ` [PATCH 2.6.3-rc2-mm1] __block_write_full patch Andrew Morton
@ 2004-02-13 23:30 ` Daniel McNeil
2004-02-13 23:48 ` Andrew Morton
0 siblings, 1 reply; 33+ messages in thread
From: Daniel McNeil @ 2004-02-13 23:30 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-aio@kvack.org, Linux Kernel Mailing List
On Fri, 2004-02-13 at 14:38, Andrew Morton wrote:
> My suspicion is that the real problem is that mpage_writepages() moved the
> page onto mapping->locked_pages while there is buffer-level I/O in flight
> (that's OK). But PG_writeback is not set because writepage never started
> any I/O. So filemap_fdatawait() never waits for the ext3-initiated
> buffer-level I/O.
kjournald was calling ll_rw_block() with a bunch of bh's and
PG_writeback was not set.
A subsequent __block_write_full_page(), with the page locked,
sees that none of the bh(s) for that page are dirty. It would
set PG_writeback, unlock_page(), then see no buffers to submit
and so clear PG_writeback (but the buffer i/o is still in flight).
filemap_fdatawait() has nothing to wait for.
I think we agree.
>
> If so, there are several ways to fix this:
> c) Change __block_write_full_page() to move the page back onto
> mapping->dirty_pages if it was WB_SYNC_NONE and we discovered that the
> page had a locked buffer. This way, a subsequent WB_SYNC_ALL will
> correctly wait on that buffer.
>
> Try c), please?
No problem. I will code this up and give it a try.
My only concern is a mpage_writepages(WB_SYNC_NONE) racing
with a mpage_writepages(WB_SYNC_ALL) from a filemap_write_and_wait().
Both could be processing the io_pages list. If the
mpage_writepages(WB_SYNC_NONE) moves a page that has locked buffers
back to the dirty_pages list, then when the filemap_write_and_wait()
calls filemap_fdatawait(), it will not wait for the page moved back
to the dirty list.
I'll code up the change and run my tests and let you know what happens.
Thanks,
Daniel
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2.6.3-rc2-mm1] __block_write_full patch
2004-02-13 23:30 ` Daniel McNeil
@ 2004-02-13 23:48 ` Andrew Morton
2004-02-14 0:02 ` Daniel McNeil
2004-02-18 1:02 ` [PATCH 2.6.3-rc2-mm1] address_space_serialize_writeback patch Daniel McNeil
0 siblings, 2 replies; 33+ messages in thread
From: Andrew Morton @ 2004-02-13 23:48 UTC (permalink / raw)
To: Daniel McNeil; +Cc: linux-aio, linux-kernel
Daniel McNeil <daniel@osdl.org> wrote:
>
> My only concern is a mpage_writepages(WB_SYNC_NONE) racing
> with a mpage_writepages(WB_SYNC_ALL) from a filemap_write_and_wait().
> Both could be processing the io_pages list. If the
> mpage_writepages(WB_SYNC_NONE) moves a page that has locked buffers
> back to the dirty_pages list, then when the filemap_write_and_wait()
> calls filemap_fdatawait(), it will not wait for the page moved back
> to the dirty list.
Yes. I suspect we simply cannot get this right without insane locking.
We're trying to do something here which the writeback code simply does not
and cannot generally do, namely write and wait upon IO and dirtyings which
are initiated by other processes.
The best way to handle *all* this crap is to remove the address_space page
lists completely and replace all these things with radix tree walks, but I
never got onto that. Sad.
Maybe we could implement some form of per-address_space serialisation which
permits multiple WB_SYNC_NONE writers, but exclusive WB_SYNC_ALL writers.
That's basically an rwsem, but we don't want to block WB_SYNC_NONE
processes if there's a sync in progress.
So WB_SYNC_NONE callers would use down_read_trylock() and WB_SYNC_ALL
callers would use down_write(). That just fixes all this stuff up.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2.6.3-rc2-mm1] __block_write_full patch
2004-02-13 23:48 ` Andrew Morton
@ 2004-02-14 0:02 ` Daniel McNeil
2004-02-18 1:02 ` [PATCH 2.6.3-rc2-mm1] address_space_serialize_writeback patch Daniel McNeil
1 sibling, 0 replies; 33+ messages in thread
From: Daniel McNeil @ 2004-02-14 0:02 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-aio@kvack.org, Linux Kernel Mailing List
On Fri, 2004-02-13 at 15:48, Andrew Morton wrote:
> Daniel McNeil <daniel@osdl.org> wrote:
> >
> > My only concern is a mpage_writepages(WB_SYNC_NONE) racing
> > with a mpage_writepages(WB_SYNC_ALL) from a filemap_write_and_wait().
> > Both could be processing the io_pages list. If the
> > mpage_writepages(WB_SYNC_NONE) moves a page that has locked buffers
> > back to the dirty_pages list, then when the filemap_write_and_wait()
> > calls filemap_fdatawait(), it will not wait for the page moved back
> > to the dirty list.
>
> Yes. I suspect we simply cannot get this right without insane locking.
> We're trying to do something here which the writeback code simply does not
> and cannot generally do, namely write and wait upon IO and dirtyings which
> are initiated by other processes.
>
> The best way to handle *all* this crap is to remove the address_space page
> lists completely and replace all these things with radix tree walks, but I
> never got onto that. Sad.
>
> Maybe we could implement some form of per-address_space serialisation which
> permits multiple WB_SYNC_NONE writers, but exclusive WB_SYNC_ALL writers.
> That's basically an rwsem, but we don't want to block WB_SYNC_NONE
> processes if there's a sync in progress.
>
> So WB_SYNC_NONE callers would use down_read_trylock() and WB_SYNC_ALL
> callers would use down_write(). That just fixes all this stuff up.
>
This sounds like an interesting idea. I'll take a look and see if
I can give it a try.
BTW, the 2.6.3-rc2-mm1 __block_write_full_page() almost already did
your option c) except for the "if (buffer_dirty())". If the
buffer is in flight buffer_dirty would already be cleared, so
it would not call __set_page_dirty_nobuffers().
	if (wbc->sync_mode != WB_SYNC_NONE) {
		lock_buffer(bh);
	} else {
		if (test_set_buffer_locked(bh)) {
			if (buffer_dirty(bh))
				__set_page_dirty_nobuffers(page);
			continue;
		}
	}
Thanks,
Daniel
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 2.6.3-rc2-mm1] address_space_serialize_writeback patch
2004-02-13 23:48 ` Andrew Morton
2004-02-14 0:02 ` Daniel McNeil
@ 2004-02-18 1:02 ` Daniel McNeil
2004-02-18 1:43 ` Andrew Morton
2004-02-18 1:47 ` Andrew Morton
1 sibling, 2 replies; 33+ messages in thread
From: Daniel McNeil @ 2004-02-18 1:02 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-aio@kvack.org, Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 2101 bytes --]
Andrew,
Here is the patch that does what you suggested. It adds a rwsema to
the address_space and do_writepages() uses it to serialize writebacks.
It allows multiple WB_SYNC_NONE and only 1 WB_SYNC_ALL writebacks.
It also changes __block_write_full_page() to not check buffer_dirty()
before calling __set_page_dirty_nobuffers(page), since a locked buffer
might already have its dirty bit cleared.
This patch also applies to 2.6.3-rc3-mm1.
I've tested this patch on an 8-proc machine running 6 copies of the
direct_read_under tests and have not hit any uninitialized data.
It has been running almost 2 hours -- I usually hit a problem within
30 minutes. I'll leave the test running overnight.
Daniel
On Fri, 2004-02-13 at 15:48, Andrew Morton wrote:
> Daniel McNeil <daniel@osdl.org> wrote:
> >
> > My only concern is a mpage_writepages(WB_SYNC_NONE) racing
> > with a mpage_writepages(WB_SYNC_ALL) from a filemap_write_and_wait().
> > Both could be processing the io_pages list. If the
> > mpage_writepages(WB_SYNC_NONE) moves a page that has locked buffers
> > back to the dirty_pages list, then when the filemap_write_and_wait()
> > calls filemap_fdatawait(), it will not wait for the page moved back
> > to the dirty list.
>
> Yes. I suspect we simply cannot get this right without insane locking.
> We're trying to do something here which the writeback code simply does not
> and cannot generally do, namely write and wait upon IO and dirtyings which
> are initiated by other processes.
>
> The best way to handle *all* this crap is to remove the address_space page
> lists completely and replace all these things with radix tree walks, but I
> never got onto that. Sad.
>
> Maybe we could implement some form of per-address_space serialisation which
> permits multiple WB_SYNC_NONE writers, but exclusive WB_SYNC_ALL writers.
> That's basically an rwsem, but we don't want to block WB_SYNC_NONE
> processes if there's a sync in progress.
>
> So WB_SYNC_NONE callers would use down_read_trylock() and WB_SYNC_ALL
> callers would use down_write(). That just fixes all this stuff up.
[-- Attachment #2: address_space_serialize_writeback.2.6.3-rc2-mm1.patch --]
[-- Type: text/x-patch, Size: 3242 bytes --]
diff -rup linux-2.6.3-rc2-mm1.orig/fs/buffer.c linux-2.6.3-rc2-mm1/fs/buffer.c
--- linux-2.6.3-rc2-mm1.orig/fs/buffer.c 2004-02-12 11:43:39.000000000 -0800
+++ linux-2.6.3-rc2-mm1/fs/buffer.c 2004-02-17 08:57:18.396196425 -0800
@@ -1816,8 +1816,7 @@ static int __block_write_full_page(struc
lock_buffer(bh);
} else {
if (test_set_buffer_locked(bh)) {
- if (buffer_dirty(bh))
- __set_page_dirty_nobuffers(page);
+ __set_page_dirty_nobuffers(page);
continue;
}
}
diff -rup linux-2.6.3-rc2-mm1.orig/fs/inode.c linux-2.6.3-rc2-mm1/fs/inode.c
--- linux-2.6.3-rc2-mm1.orig/fs/inode.c 2004-02-16 17:32:18.000000000 -0800
+++ linux-2.6.3-rc2-mm1/fs/inode.c 2004-02-16 17:39:29.000000000 -0800
@@ -195,6 +195,7 @@ void inode_init_once(struct inode *inode
INIT_LIST_HEAD(&inode->i_data.i_mmap_shared);
spin_lock_init(&inode->i_lock);
i_size_ordered_init(inode);
+ init_rwsem(&inode->i_data.wb_rwsema);
}
EXPORT_SYMBOL(inode_init_once);
diff -rup linux-2.6.3-rc2-mm1.orig/include/linux/fs.h linux-2.6.3-rc2-mm1/include/linux/fs.h
--- linux-2.6.3-rc2-mm1.orig/include/linux/fs.h 2004-02-16 17:01:17.000000000 -0800
+++ linux-2.6.3-rc2-mm1/include/linux/fs.h 2004-02-16 17:08:13.000000000 -0800
@@ -338,6 +338,7 @@ struct address_space {
spinlock_t private_lock; /* for use by the address_space */
struct list_head private_list; /* ditto */
struct address_space *assoc_mapping; /* ditto */
+ struct rw_semaphore wb_rwsema; /* serialize SYNC writebacks */
};
struct block_device {
diff -rup linux-2.6.3-rc2-mm1.orig/mm/page-writeback.c linux-2.6.3-rc2-mm1/mm/page-writeback.c
--- linux-2.6.3-rc2-mm1.orig/mm/page-writeback.c 2004-02-16 17:03:26.000000000 -0800
+++ linux-2.6.3-rc2-mm1/mm/page-writeback.c 2004-02-17 15:02:11.004475189 -0800
@@ -497,9 +497,32 @@ void __init page_writeback_init(void)
int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
{
+ int ret;
+ if (wbc->sync_mode == WB_SYNC_NONE) {
+ if (!down_read_trylock(&mapping->wb_rwsema))
+ /*
+ * SYNC writeback in progress
+ */
+ return 0;
+ } else {
+ /*
+ * Only allow 1 SYNC writeback at a time, to be able
+ * to wait for all i/o without worrying about racing
+ * WB_SYNC_NONE writers.
+ */
+ down_write(&mapping->wb_rwsema);
+ }
+
if (mapping->a_ops->writepages)
- return mapping->a_ops->writepages(mapping, wbc);
- return generic_writepages(mapping, wbc);
+ ret = mapping->a_ops->writepages(mapping, wbc);
+ else
+ ret = generic_writepages(mapping, wbc);
+ if (wbc->sync_mode == WB_SYNC_NONE) {
+ up_read(&mapping->wb_rwsema);
+ } else {
+ up_write(&mapping->wb_rwsema);
+ }
+ return ret;
}
/**
diff -rup linux-2.6.3-rc2-mm1.orig/mm/swap_state.c linux-2.6.3-rc2-mm1/mm/swap_state.c
--- linux-2.6.3-rc2-mm1.orig/mm/swap_state.c 2004-02-16 17:31:57.000000000 -0800
+++ linux-2.6.3-rc2-mm1/mm/swap_state.c 2004-02-17 09:00:54.881899941 -0800
@@ -38,6 +38,7 @@ struct address_space swapper_space = {
.truncate_count = ATOMIC_INIT(0),
.private_lock = SPIN_LOCK_UNLOCKED,
.private_list = LIST_HEAD_INIT(swapper_space.private_list),
+ .wb_rwsema = __RWSEM_INITIALIZER(swapper_space.wb_rwsema)
};
#define INC_CACHE_INFO(x) do { swap_cache_info.x++; } while (0)
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2.6.3-rc2-mm1] address_space_serialize_writeback patch
2004-02-18 1:02 ` [PATCH 2.6.3-rc2-mm1] address_space_serialize_writeback patch Daniel McNeil
@ 2004-02-18 1:43 ` Andrew Morton
2004-02-18 1:47 ` Andrew Morton
1 sibling, 0 replies; 33+ messages in thread
From: Andrew Morton @ 2004-02-18 1:43 UTC (permalink / raw)
To: Daniel McNeil; +Cc: linux-aio, linux-kernel
Daniel McNeil <daniel@osdl.org> wrote:
>
> Here is the patch that does what you suggested. It adds a rwsema to
> the address_space and do_writepages() uses it to serialize writebacks.
OK, but we're only holding the rwsem across filemap_fdatawrite(). What
happens after we've dropped the rwsem and we are running
filemap_fdatawait()? Cannot kupdate come in and start moving pages onto
the wrong address_space lists while filemap_fdatawait() is trying to wait
on them?
I think so. Possibly your test just doesn't cover this case.
If so then we need to hold the rwsem for writing across the entire
write-and-wait. And that is going to rather suck if and when we bring back
the sync_page_range() patch, which permitted concurrent fsync() against
different fd's which cover different parts of the file.
We need to check that we're bypassing all this stuff for access to
blockdevs too - we have no security issues to worry about there.
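The difference between holding the lock across the write phase only and holding it across the whole write-and-wait can be shown with a single-threaded model. This is a sketch, not the kernel rwsem: wb_lock_model and the model_* helpers are invented names, and the "lock" is just enough state to ask whether a kupdate-style non-blocking pass would be admitted at a given point.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded model of an rwsem: only tracks who is inside. */
struct wb_lock_model {
	int readers;
	bool writer;
};

bool model_read_trylock(struct wb_lock_model *l)
{
	if (l->writer)
		return false;
	l->readers++;
	return true;
}

void model_read_unlock(struct wb_lock_model *l)
{
	assert(l->readers > 0);
	l->readers--;
}

void model_write_lock(struct wb_lock_model *l)
{
	/* A real down_write() would sleep; the model requires a free lock. */
	assert(!l->writer && l->readers == 0);
	l->writer = true;
}

void model_write_unlock(struct wb_lock_model *l)
{
	assert(l->writer);
	l->writer = false;
}

/* Would a non-blocking (kupdate-style) pass be admitted right now? */
static bool kupdate_can_enter(struct wb_lock_model *l)
{
	if (!model_read_trylock(l))
		return false;
	model_read_unlock(l);
	return true;
}

/* Lock covers the write phase only; probe during the wait phase. */
bool model_sync_lock_write_only(struct wb_lock_model *l)
{
	model_write_lock(l);
	/* write phase */
	model_write_unlock(l);
	/* wait phase runs unlocked */
	return kupdate_can_enter(l);
}

/* Lock covers both phases; probe during the wait phase. */
bool model_sync_lock_write_and_wait(struct wb_lock_model *l)
{
	bool entered;

	model_write_lock(l);
	/* write phase, then wait phase, both under the lock */
	entered = kupdate_can_enter(l);
	model_write_unlock(l);
	return entered;
}
```

In the first variant the probe succeeds during the wait phase, which is the window being asked about; in the second it fails, at the cost of serialising every sync against the mapping.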
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2.6.3-rc2-mm1] address_space_serialize_writeback patch
2004-02-18 1:02 ` [PATCH 2.6.3-rc2-mm1] address_space_serialize_writeback patch Daniel McNeil
2004-02-18 1:43 ` Andrew Morton
@ 2004-02-18 1:47 ` Andrew Morton
2004-02-18 19:36 ` Daniel McNeil
1 sibling, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2004-02-18 1:47 UTC (permalink / raw)
To: Daniel McNeil; +Cc: linux-aio, linux-kernel
Daniel McNeil <daniel@osdl.org> wrote:
>
> Here is the patch that does what you suggested. It adds a rwsema to
> the address_space and do_writepages() uses it to serialize writebacks.
Did you verify that we actually _need_ the semaphore? I seem to recall that
it was a "try this, otherwise add the semaphore" thing. Where "this" was "always
re-mark the page dirty".
Probably we do need the semaphore, but I'd just like to check that you checked ;)
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2.6.3-rc2-mm1] address_space_serialize_writeback patch
2004-02-18 1:47 ` Andrew Morton
@ 2004-02-18 19:36 ` Daniel McNeil
0 siblings, 0 replies; 33+ messages in thread
From: Daniel McNeil @ 2004-02-18 19:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-aio@kvack.org, Linux Kernel Mailing List
Andrew,
I did run the multiple copies of the direct_read_under test on:
KERNEL                                        RESULT
======                                        ======
2.6.3-rc2-mm1                                 Sees uninitialized data
2.6.3-rc2-mm1 + wait_on_buffer() in
  __block_write_full_page                     no uninitialized data seen
2.6.3-rc2-mm1 + __set_page_dirty_nobuffers
  if cannot lock_buffer in
  __block_write_full_page                     Sees uninitialized data
2.6.3-rc2-mm1 + wb_rwsema patch               no uninitialized data seen
It looks like there are potential race conditions with
the rwsema in do_writepages(), since a page can be moved back
to the dirty list while waiting on the down_write(), and also
with multiple filemap_fdatawait()s. I have pointed this out before,
but I do not have a test that can reproduce the problem.
The rwsema patch does make the direct_read_under test pass.
It ran overnight without errors.
I want to close all the potential race conditions, but since
this patch is a step in the process and the test ran correctly
with it, I submitted the patch.
I'll send out updated patches as I do more testing.
Daniel
On Tue, 2004-02-17 at 17:47, Andrew Morton wrote:
> Daniel McNeil <daniel@osdl.org> wrote:
> >
> > Here is the patch that does what you suggested. It adds a rwsema to
> > the address_space and do_writepages() uses it to serialize writebacks.
>
> Did you verify that we actually _need_ the semaphore? I seem to recall that
> it was a "try this, otherwise add the semaphore" thing. Where "this" was "always
> re-mark the page dirty".
>
> Probably we do need the semaphore, but I'd just like to check that you checked ;)
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 2.6.3-rc2-mm1
2004-02-12 9:57 2.6.3-rc2-mm1 Andrew Morton
` (7 preceding siblings ...)
2004-02-13 21:30 ` [PATCH 2.6.3-rc2-mm1] __block_write_full patch Daniel McNeil
@ 2004-02-14 5:27 ` Glenn Johnson
8 siblings, 0 replies; 33+ messages in thread
From: Glenn Johnson @ 2004-02-14 5:27 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm
On Thu, Feb 12, 2004 at 01:57:10AM -0800, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.3-rc2/2.6.3-rc2-mm1/
> +sysfs-class-10-vc.patch
>
> Bring back this patch, see if it triggers the tty race again.
It does on one of my machines, a P4c with HT enabled. This is the same
machine that had the problem before. Backing out the patch "fixes" it.
--
Glenn Johnson
glennpj@charter.net
^ permalink raw reply [flat|nested] 33+ messages in thread