* Debugging new HW XOR engine driver
@ 2008-07-14 18:26 tirumalareddy marri
2008-07-15 16:35 ` thomas62186218
0 siblings, 1 reply; 9+ messages in thread
From: tirumalareddy marri @ 2008-07-14 18:26 UTC (permalink / raw)
To: linux-raid
I am new to this mailing list. I am working on adding a driver for a HW-accelerated XOR engine. I have the XOR engine working and have created the ADMA interface. I am able to create a RAID-5 array using mdadm. After creating a filesystem, I tried to mount /dev/md0. It failed to mount with "ext3_check_descriptors: Block bitmap for group 384 not in group". Something is clearly wrong at the driver level.
I used the "dd" command to read from and write to /dev/md0. Reads and writes do not seem to be a problem. With small files of 10K, everything is fine. Writing a file of 1MB or more seems to trigger the issue. Partial writes appear to be fine too.
I am looking for ways to debug the problem. I am also looking for suggestions from people who have developed HW-accelerated drivers about the issues they ran into. Are there any tools to identify the corruption?
Thanks in Advance.
--Tirumala
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Debugging new HW XOR engine driver
2008-07-14 18:26 tirumalareddy marri
@ 2008-07-15 16:35 ` thomas62186218
0 siblings, 0 replies; 9+ messages in thread
From: thomas62186218 @ 2008-07-15 16:35 UTC (permalink / raw)
To: tirumalareddymarri, linux-raid
Regarding your HW XOR driver, is this for Intel x86 or some other
platform? I am assuming x86 since you did not mention another
processor platform, but in that case, what hardware are you using for
XOR?
-Thomas
* Re: Debugging new HW XOR engine driver
@ 2008-07-15 16:59 tirumalareddy marri
2008-07-15 18:34 ` Dan Williams
0 siblings, 1 reply; 9+ messages in thread
From: tirumalareddy marri @ 2008-07-15 16:59 UTC (permalink / raw)
To: thomas62186218, linux-raid
I am working on an AMCC PowerPC-based SoC. It has an XOR engine built into the HW.
Thanks,
Marri
* Re: Debugging new HW XOR engine driver
2008-07-15 16:59 tirumalareddy marri
@ 2008-07-15 18:34 ` Dan Williams
0 siblings, 0 replies; 9+ messages in thread
From: Dan Williams @ 2008-07-15 18:34 UTC (permalink / raw)
To: tirumalareddy marri; +Cc: thomas62186218, linux-raid
On Tue, Jul 15, 2008 at 9:59 AM, tirumalareddy marri
<tirumalareddymarri@yahoo.com> wrote:
> I am working on an AMCC PowerPC-based SoC. It has an XOR engine built into the HW.
Is this related to the driver started here?
[PATCH] [PPC32] ADMA support for PPC 440SPe processors.
http://marc.info/?l=linux-raid&m=117400143317440&w=2
If so, that may be a better starting point.
--
Dan
* Re: Debugging new HW XOR engine driver
@ 2008-07-15 20:19 tirumalareddy marri
0 siblings, 0 replies; 9+ messages in thread
From: tirumalareddy marri @ 2008-07-15 20:19 UTC (permalink / raw)
To: Dan Williams; +Cc: thomas62186218, linux-raid
Hi Dan,
The 440SPe driver was my reference. I am working on a new SoC with some modifications to the XOR engines. I have most things working, including ADMA and XOR (tested on one block of data). When I use ADMA with SW XOR (without HW XOR), RAID-5 works fine. With ADMA and HW XOR combined, I have issues: when I create a file system and try to mount it, the mount fails. This could be because of some state problem in the XOR calculations.
Is there any document describing how full-stripe writes, partial-stripe writes, and recovery work in the MD driver?
Thanks,
Marri
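As background for the question above, the three RAID-5 parity cases can be written down as plain XOR arithmetic. The following is a minimal userspace sketch, not MD driver code: the function names (xor_into, parity_full, parity_rmw, recover_block) are hypothetical, and it only models the math that an XOR engine has to reproduce, not how raid5.c schedules the operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* dest ^= src over len bytes: the basic operation an XOR engine offloads. */
static void xor_into(uint8_t *dest, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dest[i] ^= src[i];
}

/* Full-stripe write: parity is the XOR of every data block in the stripe. */
static void parity_full(uint8_t *parity, uint8_t *const data[], int ndata,
                        size_t len)
{
    assert(ndata > 0);
    memset(parity, 0, len);            /* the destination must start at zero */
    for (int d = 0; d < ndata; d++)
        xor_into(parity, data[d], len);
}

/* Partial-stripe write (read-modify-write): P' = P ^ old_data ^ new_data. */
static void parity_rmw(uint8_t *parity, const uint8_t *old_data,
                       const uint8_t *new_data, size_t len)
{
    xor_into(parity, old_data, len);   /* remove the old block's contribution */
    xor_into(parity, new_data, len);   /* add the new block's contribution */
}

/* Recovery: a missing data block is parity XOR all surviving data blocks. */
static void recover_block(uint8_t *out, const uint8_t *parity,
                          uint8_t *const data[], int ndata, int missing,
                          size_t len)
{
    memcpy(out, parity, len);
    for (int d = 0; d < ndata; d++)
        if (d != missing)
            xor_into(out, data[d], len);
}
```

A hardware engine that gets the single-block case right but mishandles the source count, the zeroing of the destination, or the chaining of operations would produce exactly the kind of parity that passes small tests and fails on larger writes, so each of the three cases is worth testing separately.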
* Re: Debugging new HW XOR engine driver
@ 2008-07-15 22:52 tirumalareddy marri
2008-07-16 6:48 ` Dan Williams
0 siblings, 1 reply; 9+ messages in thread
From: tirumalareddy marri @ 2008-07-15 22:52 UTC (permalink / raw)
To: tirumalareddy marri, Dan Williams; +Cc: thomas62186218, linux-raid
I am able to create a 40MB file system and mount it (mkfs.ext3 -b 4096 /dev/md0 10000). I was able to copy files to the mounted file system and read them back. If I increase the size beyond 40MB, the file system fails to mount.
Is it possible that the data I read and wrote was in the page cache and never actually written to the hard disks? Is it safe to say RAID-5 is partially working?
Thanks,
Marri
* Re: Debugging new HW XOR engine driver
2008-07-15 22:52 tirumalareddy marri
@ 2008-07-16 6:48 ` Dan Williams
0 siblings, 0 replies; 9+ messages in thread
From: Dan Williams @ 2008-07-16 6:48 UTC (permalink / raw)
To: tirumalareddy marri; +Cc: thomas62186218, linux-raid
On Tue, Jul 15, 2008 at 3:52 PM, tirumalareddy marri
<tirumalareddymarri@yahoo.com> wrote:
> I am able to create a 40MB file system and mount it (mkfs.ext3 -b 4096 /dev/md0 10000). I was able to copy files to the mounted file system and read them back. If I increase the size beyond 40MB, the file system fails to mount.
> Is it possible that the data I read and wrote was in the page cache and never actually written to the hard disks?
What does the corruption look like? Does it seem to be wrong data or
stale data?
> Is it safe to say RAID-5 is partially working ?
Without more information this sounds like the hw-xor driver is broken.
What kernel version are you developing against? You may want to take
a look at the dmatest client in async_tx/next [1]. It currently only
supports copy tests, but should exercise your driver's descriptor
processing routines. When I tracked down bugs in iop-adma I used
raid5 as the test client and modified the kernel to do data
verification after each calculation in the ops_complete_* routines.
This requires userspace to use a predictable data pattern when writing
to the array.
--
Dan
[1] http://git.kernel.org/?p=linux/kernel/git/djbw/async_tx.git;a=shortlog;h=next
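The "predictable data pattern" mentioned above could look something like the following userspace sketch. It is only an illustration under my own assumptions: the helper names (pattern_word, fill_pattern, check_pattern) and the word layout are hypothetical, not something taken from the iop-adma debugging work. Each 64-bit word records the byte offset it belongs at plus a generation number, so a bad word read back from the array shows whether it is misplaced data (wrong offset), stale data (old generation), or garbage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One pattern word: high 16 bits = write pass, low 48 bits = byte offset. */
static uint64_t pattern_word(uint16_t gen, uint64_t byte_off)
{
    return ((uint64_t)gen << 48) | (byte_off & 0xFFFFFFFFFFFFULL);
}

/* Fill buf with the pattern for the region starting at dev_off. */
static void fill_pattern(void *buf, size_t len, uint64_t dev_off, uint16_t gen)
{
    uint8_t *p = buf;

    assert(len % 8 == 0 && dev_off % 8 == 0);
    for (size_t i = 0; i < len; i += 8) {
        uint64_t w = pattern_word(gen, dev_off + i);
        memcpy(p + i, &w, sizeof(w));  /* stored in host byte order */
    }
}

/*
 * Verify a buffer read back from dev_off.  Returns the byte index (within
 * buf) of the first word that does not match, or -1 if everything matches.
 */
static int64_t check_pattern(const void *buf, size_t len, uint64_t dev_off,
                             uint16_t gen)
{
    const uint8_t *p = buf;

    assert(len % 8 == 0 && dev_off % 8 == 0);
    for (size_t i = 0; i < len; i += 8) {
        uint64_t got, want = pattern_word(gen, dev_off + i);

        memcpy(&got, p + i, sizeof(got));
        if (got != want)
            return (int64_t)i;
    }
    return -1;
}
```

To make sure the check exercises the disks rather than the page cache, the writer would need to fsync() or open the device with O_DIRECT, and the reader should run after the caches have been dropped. The same pattern_word() logic can then be reused inside the kernel-side verification, since the expected contents of every page are known from its offset alone.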
* Re: Debugging new HW XOR engine driver
@ 2008-07-18 1:37 tirumalareddy marri
0 siblings, 0 replies; 9+ messages in thread
From: tirumalareddy marri @ 2008-07-18 1:37 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-raid
Dan,
The corruption I see is an ext3_check_descriptors error, with a message suggesting that I run fsck. When I run fsck, it complains that "there is no valid file system". The corruption seems to happen only when large data files are written to /dev/md0.
I tried writing one stripe (12kB, with 4 disks) and half a stripe (6kB). I don't see corruption with small files; every time I write data and read it back, it works fine up to 40MB.
I know it is a lot to ask, but do you happen to have sample code that you used to debug your driver? I am using the xor_blocks() function to compute the XOR in async_xor() so that I can compare the SW and HW XOR calculations. I am not sure if that is the right way to do it. So far I have not seen a data mismatch.
Thanks and Regards,
Marri
* Re: Debugging new HW XOR engine driver
@ 2008-07-18 6:55 tirumalareddy marri
0 siblings, 0 replies; 9+ messages in thread
From: tirumalareddy marri @ 2008-07-18 6:55 UTC (permalink / raw)
To: linux-raid
I modified ops_complete_write() to compare the HW XOR calculation with a SW XOR. I am not sure whether xor_dest still holds the XOR result at this point, but the compare fails every time. This doesn't seem to be the right way of checking XOR correctness.
Do you have any input on how I should check the XOR result?
static void ops_complete_write(void *stripe_head_ref)
{
    struct stripe_head *sh = stripe_head_ref;
    struct stripe_queue *sq = sh->sq;
    int disks = sq->disks, i;
    int pd_idx = sq->pd_idx;
    int qd_idx = (sq->raid_conf->level != 6) ? -1 :
        raid6_next_disk(pd_idx, disks);

    pr_debug("%s: stripe %llu\n", __FUNCTION__,
        (unsigned long long)sh->sector);

#if 1 /*marri test start*/
    struct page *pg;
    char *a,*b;
    //int disks = sh->sq->disks;
    struct page *xor_srcs[disks];
    int target = sh->ops.target;
    struct r5dev *tgt = &sh->dev[target];
    struct page *xor_dest = tgt->page;
    int count = 0;
    int dcnt = 0;
    int j = 0;

    for (j = disks; j--; )
        if (j != target)
            xor_srcs[count++] = sh->dev[j].page;

    pg = alloc_page(GFP_KERNEL);
    if(!pg)
        goto no_cmp;
    a = page_address(pg);
    xor_blocks(disks,STRIPE_SIZE,a,(void **)xor_srcs);
    b = page_address(xor_dest);
    if((memcmp(b,a,PAGE_SIZE) != 0x0))
        printk(KERN_ERR"Mem compare fialed at \n");
    else {
        if(mfdcr(0x61) == 0xfee7) {
            for(dcnt = 0; dcnt < PAGE_SIZE; dcnt+=0x4)
                printk("HW = 0x%x SW = 0x%x",*(u32 *)(b + dcnt),*(u32 *)(a + dcnt));
        }
    }
    if(pg)
        __free_page(pg);
no_cmp:
    if(0)
        *a = NULL;
#endif /* marri test end */

    for (i = disks; i--; ) {
        struct r5dev *dev = &sh->dev[i];
        struct r5_queue_dev *dev_q = &sq->dev[i];

        if (dev_q->written || i == pd_idx || i == qd_idx)
            set_bit(R5_UPTODATE, &dev->flags);
    }

    set_bit(STRIPE_OP_BIODRAIN, &sh->ops.complete);
    set_bit(STRIPE_OP_POSTXOR, &sh->ops.complete);
    set_bit(STRIPE_HANDLE, &sh->state);
    release_stripe(sh);
}
Thanks,
Marri
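One way to reason about why a compare like the one above can fail even when the hardware is correct is to model it in userspace. The sketch below rests on assumptions that should be checked against the kernel tree in use: that xor_blocks() XORs its sources into the destination rather than overwriting it (so an uninitialized scratch page from alloc_page() would poison the result), and that the source count passed in must match the number of pointers actually filled in (count, not disks). It is also worth verifying that the parity of a write really lands in the page selected by sh->ops.target rather than the one at pd_idx. The names model_xor_blocks and xor_result_ok are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for xor_blocks(): dest ^= every source block. */
static void model_xor_blocks(unsigned int src_count, size_t bytes,
                             unsigned char *dest,
                             unsigned char *const srcs[])
{
    for (unsigned int s = 0; s < src_count; s++)
        for (size_t i = 0; i < bytes; i++)
            dest[i] ^= srcs[s][i];
}

/*
 * Compare a hardware XOR result against a software reference.
 * scratch is caller-provided and may contain anything on entry.
 * Returns 1 if the two results are identical, 0 otherwise.
 */
static int xor_result_ok(const unsigned char *hw_result,
                         unsigned char *const srcs[],
                         unsigned int src_count, size_t bytes,
                         unsigned char *scratch)
{
    assert(src_count > 0);
    memset(scratch, 0, bytes);         /* start the reference from zero */
    model_xor_blocks(src_count, bytes, scratch, srcs);
    return memcmp(hw_result, scratch, bytes) == 0;
}
```

If the in-kernel check is restructured along these lines and still reports mismatches, dumping the first differing word together with the stripe sector would help tell wrong data from stale data.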