* Re: MD Remnants After --stop
From: NeilBrown @ 2016-12-01 22:35 UTC
To: Marc Smith; +Cc: linux-raid
In-Reply-To: <CAHkw+LfmEyY8CSumeeftOrcYKzbkumHwtfeTuT0DhHZbaxdUSw@mail.gmail.com>
On Fri, Dec 02 2016, Marc Smith wrote:
> On Wed, Nov 30, 2016 at 9:52 PM, NeilBrown <neilb@suse.com> wrote:
>> On Mon, Nov 28 2016, Marc Smith wrote:
>>
>>>
>>> # find /sys/block/md127/md
>>> /sys/block/md127/md
>>> /sys/block/md127/md/reshape_position
>>> /sys/block/md127/md/layout
>>> /sys/block/md127/md/raid_disks
>>> /sys/block/md127/md/bitmap
>>> /sys/block/md127/md/bitmap/chunksize
>>
>> This tells me that:
>> sysfs_remove_group(&mddev->kobj, &md_bitmap_group);
>> hasn't been run, so mddev_delayed_delete() hasn't run.
>> That suggests the final mddev_put() hasn't run, i.e. mddev->active is > 0.
>>
>> Everything else suggests that array has been stopped and cleaned and
>> should be gone...
>>
>> This seems to suggest that there is an unbalanced mddev_get() without a
>> matching mddev_put(). I cannot find it though.
>>
>> If I could reproduce it, I would try to see what is happening by:
>>
>> - putting
>> printk("mddev->active = %d\n", atomic_read(&mddev->active));
>> in the top of mddev_put(). That shouldn't be *too* noisy.
>>
>> - putting
>> printk("rd=%d empty=%d ctime=%d hold=%d\n", mddev->raid_disks,
>> list_empty(&mddev->disks), mddev->ctime, mddev->hold_active);
>>
>> in mddev_put() just before those values are tested.
>>
>> - putting
>> printk("queue_work\n");
>> just before the 'queue_work()' call in mddev_put.
>>
>> - putting
>> printk("mddev_delayed_delete\n");
>> in mddev_delayed_delete()
>>
>> Then see what gets printed when you stop the array.
>
> I made those modifications to md.c and here is the kernel log when stopping:
>
> --snip--
> [ 3937.233487] mddev->active = 2
> [ 3937.233503] mddev->active = 2
> [ 3937.233509] mddev->active = 2
> [ 3937.233516] mddev->active = 1
> [ 3937.233516] rd=2 empty=0 ctime=1480617270 hold=0

At this point, mdadm has opened the /dev/md127 device, accessed a few
attributes via sysfs just to check on the status, and then closed it
again.
The array is still active, but we know that no other process has it
open.

> [ 3937.233679] udevd[492]: inotify event: 8 for /dev/md127
> [ 3937.241489] md127: detected capacity change from 73340747776 to 0
> [ 3937.241493] md: md127 stopped.

Now mdadm has opened the array again and issued the STOP_ARRAY ioctl.
Still nothing else has the array open.

> [ 3937.241665] udevd[492]: device /dev/md127 closed, synthesising 'change'
> [ 3937.241726] udevd[492]: seq 3631 queued, 'change' 'block'
> [ 3937.241829] udevd[492]: seq 3631 forked new worker [4991]
> [ 3937.241989] udevd[4991]: seq 3631 running
> [ 3937.242002] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: leaving the
> lockspace group...
> [ 3937.242039] udevd[4991]: removing watch on '/dev/md127'
> [ 3937.242068] mddev->active = 3

But somehow the ->active count got up to 3.
mdadm probably still has it open, but two other things do too.
If you have "mdadm --monitor" running in the background (which is good)
it will temporarily increase, then decrease the count.
udevd opens the device temporarily too.
So this isn't necessarily a problem.

> [ 3937.242069] udevd[492]: seq 3632 queued, 'offline' 'dlm'
> [ 3937.242080] mddev->active = 3
> [ 3937.242104] udevd[4991]: IMPORT 'probe-bcache -o udev /dev/md127'
> /usr/lib/udev/rules.d/69-bcache.rules:16
> [ 3937.242161] udevd[492]: seq 3632 forked new worker [4992]
> [ 3937.242259] udevd[4993]: starting 'probe-bcache -o udev /dev/md127'
> [ 3937.242753] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: group event done 0 0
> [ 3937.242847] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19:
> release_lockspace final free
> [ 3937.242861] md: unbind<dm-1>
> [ 3937.256606] md: export_rdev(dm-1)
> [ 3937.256612] md: unbind<dm-0>
> [ 3937.263601] md: export_rdev(dm-0)
> [ 3937.263688] mddev->active = 4
> [ 3937.263751] mddev->active = 3

But here, the active count only drops down to 2 (it is decremented
after it is printed). Assuming there really were no more messages like
this, there are two active references to the md device, and we don't
know what they are.

>
> I didn't use my modified mdadm which stops the synthesized CHANGE from
> occurring, but if needed, I can re-run the test using that.

It would be good to use the modified mdadm, if only to reduce the
noise. It won't change the end result, but might make it easier to see
what is happening.

Also please add
   WARN_ON(1);
at the start of mddev_get() and mddev_put().
That will provide a stack trace whenever either of these is called, so
we can see who takes a reference, and who doesn't release it.

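For anyone following along, the get/put accounting being discussed can be
modelled in user space. This is only a sketch: struct fake_mddev,
fake_mddev_get() and fake_mddev_put() are hypothetical names, not kernel
code, and the caller label stands in for the stack trace that WARN_ON(1)
would give inside the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for struct mddev; only the counter matters. */
struct fake_mddev {
	atomic_int active;
};

/* Take a reference and log who asked for it. Returns the new count. */
static int fake_mddev_get(struct fake_mddev *m, const char *who)
{
	int now = atomic_fetch_add(&m->active, 1) + 1;

	fprintf(stderr, "get by %s: active = %d\n", who, now);
	return now;
}

/*
 * Drop a reference and log the count before and after the decrement.
 * Returns 1 when the last reference has gone, 0 otherwise.
 */
static int fake_mddev_put(struct fake_mddev *m, const char *who)
{
	int before = atomic_fetch_sub(&m->active, 1);

	assert(before > 0);	/* a put without a matching get is a bug */
	fprintf(stderr, "put by %s: active = %d -> %d\n",
		who, before, before - 1);
	return before == 1;
}
```

With a log like this, any caller that shows up in a "get" line but never in
a matching "put" line is the holder of the leaked reference.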
Thanks,
NeilBrown
* Re: [PATCH 1/2] md/raid1: Refactor raid1_make_request
From: NeilBrown @ 2016-12-02 0:36 UTC
To: Robert LeBlanc, linux-raid
In-Reply-To: <20161130212020.15762-2-robert@leblancnet.us>
On Thu, Dec 01 2016, Robert LeBlanc wrote:
> Refactor raid1_make_request to make read and write code in their own
> functions to clean up the code.

There is a change in here that is more than just a refactoring.
You have moved the wait_barrier() call from after the checks on
->suspend_{lo,hi} to before.

If a write was sent to a suspended region, this will block any pending
resync until the suspended region is moved.
The previous code won't block resync in this case.

I cannot see that this would actually have any practical significance,
as I don't think the suspend_hi/lo settings are used at all for raid1.
I'm less certain about the ->area_resyncing() test, though that probably
isn't a problem.

But it should at least be noted in the change-log that there is
potentially a functional change here.

Thanks,
NeilBrown
* [PATCH v2 0/2] Reorganize raid*_make_request to clean up code
From: Robert LeBlanc @ 2016-12-02 3:30 UTC
To: linux-raid; +Cc: Robert LeBlanc
In-Reply-To: <20161130212020.15762-1-robert@leblancnet.us>

In response to Christoph, I've broken the reads and writes into their own
functions to make the code even cleaner. Since it is such a big change, I broke
the commits up into this series instead of creating a v2 of the previous patch.

Changes since v1:
John Stoffel
* Changed to if/then instead of return in raid1_make_request
Neil Brown
* Moved wait_barrier into raid1_{read,write}_request so that it could be after
  ->suspend_{hi,lo}. This prevents a write from blocking a resync until the
  suspended region is moved.

Robert LeBlanc (2):
md/raid1: Refactor raid1_make_request
md/raid10: Refactor raid10_make_request
drivers/md/raid1.c | 241 ++++++++++++++++++++++++----------------------
drivers/md/raid10.c | 271 +++++++++++++++++++++++++++-------------------------
2 files changed, 264 insertions(+), 248 deletions(-)
--
2.10.2
* [PATCH v2 1/2] md/raid1: Refactor raid1_make_request
From: Robert LeBlanc @ 2016-12-02 3:30 UTC
To: linux-raid; +Cc: Robert LeBlanc
In-Reply-To: <20161202033008.30314-1-robert@leblancnet.us>
Refactor raid1_make_request by moving the read and write code into their
own functions to clean up the code.
Signed-off-by: Robert LeBlanc <robert@leblancnet.us>
---
drivers/md/raid1.c | 241 +++++++++++++++++++++++++++--------------------------
1 file changed, 125 insertions(+), 116 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 21dc00e..0ea6541 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1032,17 +1032,97 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
kfree(plug);
}
-static void raid1_make_request(struct mddev *mddev, struct bio * bio)
+static void raid1_read_request(struct mddev *mddev, struct bio *bio,
+ struct r1bio *r1_bio)
{
struct r1conf *conf = mddev->private;
struct raid1_info *mirror;
- struct r1bio *r1_bio;
struct bio *read_bio;
+ struct bitmap *bitmap = mddev->bitmap;
+ const int op = bio_op(bio);
+ const unsigned long do_sync = (bio->bi_opf & REQ_SYNC);
+ int sectors_handled;
+ int max_sectors;
+ int rdisk;
+
+ wait_barrier(conf, bio);
+
+read_again:
+ rdisk = read_balance(conf, r1_bio, &max_sectors);
+
+ if (rdisk < 0) {
+ /* couldn't find anywhere to read from */
+ raid_end_bio_io(r1_bio);
+ return;
+ }
+ mirror = conf->mirrors + rdisk;
+
+ if (test_bit(WriteMostly, &mirror->rdev->flags) &&
+ bitmap) {
+ /* Reading from a write-mostly device must
+ * take care not to over-take any writes
+ * that are 'behind'
+ */
+ wait_event(bitmap->behind_wait,
+ atomic_read(&bitmap->behind_writes) == 0);
+ }
+ r1_bio->read_disk = rdisk;
+ r1_bio->start_next_window = 0;
+
+ read_bio = bio_clone_mddev(bio, GFP_NOIO, mddev);
+ bio_trim(read_bio, r1_bio->sector - bio->bi_iter.bi_sector,
+ max_sectors);
+
+ r1_bio->bios[rdisk] = read_bio;
+
+ read_bio->bi_iter.bi_sector = r1_bio->sector +
+ mirror->rdev->data_offset;
+ read_bio->bi_bdev = mirror->rdev->bdev;
+ read_bio->bi_end_io = raid1_end_read_request;
+ bio_set_op_attrs(read_bio, op, do_sync);
+ read_bio->bi_private = r1_bio;
+
+ if (max_sectors < r1_bio->sectors) {
+ /* could not read all from this device, so we will
+ * need another r1_bio.
+ */
+
+ sectors_handled = (r1_bio->sector + max_sectors
+ - bio->bi_iter.bi_sector);
+ r1_bio->sectors = max_sectors;
+ spin_lock_irq(&conf->device_lock);
+ if (bio->bi_phys_segments == 0)
+ bio->bi_phys_segments = 2;
+ else
+ bio->bi_phys_segments++;
+ spin_unlock_irq(&conf->device_lock);
+ /* Cannot call generic_make_request directly
+ * as that will be queued in __make_request
+ * and subsequent mempool_alloc might block waiting
+ * for it. So hand bio over to raid1d.
+ */
+ reschedule_retry(r1_bio);
+
+ r1_bio = mempool_alloc(conf->r1bio_pool, GFP_NOIO);
+
+ r1_bio->master_bio = bio;
+ r1_bio->sectors = bio_sectors(bio) - sectors_handled;
+ r1_bio->state = 0;
+ r1_bio->mddev = mddev;
+ r1_bio->sector = bio->bi_iter.bi_sector + sectors_handled;
+ goto read_again;
+ } else
+ generic_make_request(read_bio);
+}
+
+static void raid1_write_request(struct mddev *mddev, struct bio *bio,
+ struct r1bio *r1_bio)
+{
+ struct r1conf *conf = mddev->private;
int i, disks;
- struct bitmap *bitmap;
+ struct bitmap *bitmap = mddev->bitmap;
unsigned long flags;
const int op = bio_op(bio);
- const int rw = bio_data_dir(bio);
const unsigned long do_sync = (bio->bi_opf & REQ_SYNC);
const unsigned long do_flush_fua = (bio->bi_opf &
(REQ_PREFLUSH | REQ_FUA));
@@ -1062,12 +1142,11 @@ static void raid1_make_request(struct mddev *mddev, struct bio * bio)
md_write_start(mddev, bio); /* wait on superblock update early */
- if (bio_data_dir(bio) == WRITE &&
- ((bio_end_sector(bio) > mddev->suspend_lo &&
+ if ((bio_end_sector(bio) > mddev->suspend_lo &&
bio->bi_iter.bi_sector < mddev->suspend_hi) ||
(mddev_is_clustered(mddev) &&
md_cluster_ops->area_resyncing(mddev, WRITE,
- bio->bi_iter.bi_sector, bio_end_sector(bio))))) {
+ bio->bi_iter.bi_sector, bio_end_sector(bio)))) {
/* As the suspend_* range is controlled by
* userspace, we want an interruptible
* wait.
@@ -1081,119 +1160,15 @@ static void raid1_make_request(struct mddev *mddev, struct bio * bio)
bio->bi_iter.bi_sector >= mddev->suspend_hi ||
(mddev_is_clustered(mddev) &&
!md_cluster_ops->area_resyncing(mddev, WRITE,
- bio->bi_iter.bi_sector, bio_end_sector(bio))))
+ bio->bi_iter.bi_sector,
+ bio_end_sector(bio))))
break;
schedule();
}
finish_wait(&conf->wait_barrier, &w);
}
-
start_next_window = wait_barrier(conf, bio);
- bitmap = mddev->bitmap;
-
- /*
- * make_request() can abort the operation when read-ahead is being
- * used and no empty request is available.
- *
- */
- r1_bio = mempool_alloc(conf->r1bio_pool, GFP_NOIO);
-
- r1_bio->master_bio = bio;
- r1_bio->sectors = bio_sectors(bio);
- r1_bio->state = 0;
- r1_bio->mddev = mddev;
- r1_bio->sector = bio->bi_iter.bi_sector;
-
- /* We might need to issue multiple reads to different
- * devices if there are bad blocks around, so we keep
- * track of the number of reads in bio->bi_phys_segments.
- * If this is 0, there is only one r1_bio and no locking
- * will be needed when requests complete. If it is
- * non-zero, then it is the number of not-completed requests.
- */
- bio->bi_phys_segments = 0;
- bio_clear_flag(bio, BIO_SEG_VALID);
-
- if (rw == READ) {
- /*
- * read balancing logic:
- */
- int rdisk;
-
-read_again:
- rdisk = read_balance(conf, r1_bio, &max_sectors);
-
- if (rdisk < 0) {
- /* couldn't find anywhere to read from */
- raid_end_bio_io(r1_bio);
- return;
- }
- mirror = conf->mirrors + rdisk;
-
- if (test_bit(WriteMostly, &mirror->rdev->flags) &&
- bitmap) {
- /* Reading from a write-mostly device must
- * take care not to over-take any writes
- * that are 'behind'
- */
- wait_event(bitmap->behind_wait,
- atomic_read(&bitmap->behind_writes) == 0);
- }
- r1_bio->read_disk = rdisk;
- r1_bio->start_next_window = 0;
-
- read_bio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- bio_trim(read_bio, r1_bio->sector - bio->bi_iter.bi_sector,
- max_sectors);
-
- r1_bio->bios[rdisk] = read_bio;
-
- read_bio->bi_iter.bi_sector = r1_bio->sector +
- mirror->rdev->data_offset;
- read_bio->bi_bdev = mirror->rdev->bdev;
- read_bio->bi_end_io = raid1_end_read_request;
- bio_set_op_attrs(read_bio, op, do_sync);
- read_bio->bi_private = r1_bio;
-
- if (max_sectors < r1_bio->sectors) {
- /* could not read all from this device, so we will
- * need another r1_bio.
- */
-
- sectors_handled = (r1_bio->sector + max_sectors
- - bio->bi_iter.bi_sector);
- r1_bio->sectors = max_sectors;
- spin_lock_irq(&conf->device_lock);
- if (bio->bi_phys_segments == 0)
- bio->bi_phys_segments = 2;
- else
- bio->bi_phys_segments++;
- spin_unlock_irq(&conf->device_lock);
- /* Cannot call generic_make_request directly
- * as that will be queued in __make_request
- * and subsequent mempool_alloc might block waiting
- * for it. So hand bio over to raid1d.
- */
- reschedule_retry(r1_bio);
-
- r1_bio = mempool_alloc(conf->r1bio_pool, GFP_NOIO);
-
- r1_bio->master_bio = bio;
- r1_bio->sectors = bio_sectors(bio) - sectors_handled;
- r1_bio->state = 0;
- r1_bio->mddev = mddev;
- r1_bio->sector = bio->bi_iter.bi_sector +
- sectors_handled;
- goto read_again;
- } else
- generic_make_request(read_bio);
- return;
- }
-
- /*
- * WRITE:
- */
if (conf->pending_count >= max_queued_requests) {
md_wakeup_thread(mddev->thread);
wait_event(conf->wait_barrier,
@@ -1236,8 +1211,7 @@ read_again:
int bad_sectors;
int is_bad;
- is_bad = is_badblock(rdev, r1_bio->sector,
- max_sectors,
+ is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
&first_bad, &bad_sectors);
if (is_bad < 0) {
/* mustn't write here until the bad block is
@@ -1325,7 +1299,8 @@ read_again:
continue;
mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- bio_trim(mbio, r1_bio->sector - bio->bi_iter.bi_sector, max_sectors);
+ bio_trim(mbio, r1_bio->sector - bio->bi_iter.bi_sector,
+ max_sectors);
if (first_clone) {
/* do behind I/O ?
@@ -1408,6 +1383,40 @@ read_again:
wake_up(&conf->wait_barrier);
}
+static void raid1_make_request(struct mddev *mddev, struct bio *bio)
+{
+ struct r1conf *conf = mddev->private;
+ struct r1bio *r1_bio;
+
+ /*
+ * make_request() can abort the operation when read-ahead is being
+ * used and no empty request is available.
+ *
+ */
+ r1_bio = mempool_alloc(conf->r1bio_pool, GFP_NOIO);
+
+ r1_bio->master_bio = bio;
+ r1_bio->sectors = bio_sectors(bio);
+ r1_bio->state = 0;
+ r1_bio->mddev = mddev;
+ r1_bio->sector = bio->bi_iter.bi_sector;
+
+ /* We might need to issue multiple reads to different
+ * devices if there are bad blocks around, so we keep
+ * track of the number of reads in bio->bi_phys_segments.
+ * If this is 0, there is only one r1_bio and no locking
+ * will be needed when requests complete. If it is
+ * non-zero, then it is the number of not-completed requests.
+ */
+ bio->bi_phys_segments = 0;
+ bio_clear_flag(bio, BIO_SEG_VALID);
+
+ if (bio_data_dir(bio) == READ)
+ raid1_read_request(mddev, bio, r1_bio);
+ else
+ raid1_write_request(mddev, bio, r1_bio);
+}
+
static void raid1_status(struct seq_file *seq, struct mddev *mddev)
{
struct r1conf *conf = mddev->private;
--
2.10.2
* [PATCH v2 2/2] md/raid10: Refactor raid10_make_request
From: Robert LeBlanc @ 2016-12-02 3:30 UTC
To: linux-raid; +Cc: Robert LeBlanc
In-Reply-To: <20161202033008.30314-1-robert@leblancnet.us>
Refactor raid10_make_request into separate read and write functions to
clean up the code.
Signed-off-by: Robert LeBlanc <robert@leblancnet.us>
---
drivers/md/raid10.c | 271 +++++++++++++++++++++++++++-------------------------
1 file changed, 139 insertions(+), 132 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index be1a9fc..d3bd756 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1046,150 +1046,89 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
kfree(plug);
}
-static void __make_request(struct mddev *mddev, struct bio *bio)
+static void raid10_read_request(struct mddev *mddev, struct bio *bio,
+ struct r10bio *r10_bio)
{
struct r10conf *conf = mddev->private;
- struct r10bio *r10_bio;
struct bio *read_bio;
- int i;
const int op = bio_op(bio);
- const int rw = bio_data_dir(bio);
const unsigned long do_sync = (bio->bi_opf & REQ_SYNC);
- const unsigned long do_fua = (bio->bi_opf & REQ_FUA);
- unsigned long flags;
- struct md_rdev *blocked_rdev;
- struct blk_plug_cb *cb;
- struct raid10_plug_cb *plug = NULL;
int sectors_handled;
int max_sectors;
- int sectors;
-
- md_write_start(mddev, bio);
-
- /*
- * Register the new request and wait if the reconstruction
- * thread has put up a bar for new requests.
- * Continue immediately if no resync is active currently.
- */
- wait_barrier(conf);
-
- sectors = bio_sectors(bio);
- while (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
- bio->bi_iter.bi_sector < conf->reshape_progress &&
- bio->bi_iter.bi_sector + sectors > conf->reshape_progress) {
- /* IO spans the reshape position. Need to wait for
- * reshape to pass
- */
- allow_barrier(conf);
- wait_event(conf->wait_barrier,
- conf->reshape_progress <= bio->bi_iter.bi_sector ||
- conf->reshape_progress >= bio->bi_iter.bi_sector +
- sectors);
- wait_barrier(conf);
- }
- if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
- bio_data_dir(bio) == WRITE &&
- (mddev->reshape_backwards
- ? (bio->bi_iter.bi_sector < conf->reshape_safe &&
- bio->bi_iter.bi_sector + sectors > conf->reshape_progress)
- : (bio->bi_iter.bi_sector + sectors > conf->reshape_safe &&
- bio->bi_iter.bi_sector < conf->reshape_progress))) {
- /* Need to update reshape_position in metadata */
- mddev->reshape_position = conf->reshape_progress;
- set_mask_bits(&mddev->flags, 0,
- BIT(MD_CHANGE_DEVS) | BIT(MD_CHANGE_PENDING));
- md_wakeup_thread(mddev->thread);
- wait_event(mddev->sb_wait,
- !test_bit(MD_CHANGE_PENDING, &mddev->flags));
+ struct md_rdev *rdev;
+ int slot;
- conf->reshape_safe = mddev->reshape_position;
+read_again:
+ rdev = read_balance(conf, r10_bio, &max_sectors);
+ if (!rdev) {
+ raid_end_bio_io(r10_bio);
+ return;
}
+ slot = r10_bio->read_slot;
- r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
-
- r10_bio->master_bio = bio;
- r10_bio->sectors = sectors;
+ read_bio = bio_clone_mddev(bio, GFP_NOIO, mddev);
+ bio_trim(read_bio, r10_bio->sector - bio->bi_iter.bi_sector,
+ max_sectors);
- r10_bio->mddev = mddev;
- r10_bio->sector = bio->bi_iter.bi_sector;
- r10_bio->state = 0;
+ r10_bio->devs[slot].bio = read_bio;
+ r10_bio->devs[slot].rdev = rdev;
- /* We might need to issue multiple reads to different
- * devices if there are bad blocks around, so we keep
- * track of the number of reads in bio->bi_phys_segments.
- * If this is 0, there is only one r10_bio and no locking
- * will be needed when the request completes. If it is
- * non-zero, then it is the number of not-completed requests.
- */
- bio->bi_phys_segments = 0;
- bio_clear_flag(bio, BIO_SEG_VALID);
+ read_bio->bi_iter.bi_sector = r10_bio->devs[slot].addr +
+ choose_data_offset(r10_bio, rdev);
+ read_bio->bi_bdev = rdev->bdev;
+ read_bio->bi_end_io = raid10_end_read_request;
+ bio_set_op_attrs(read_bio, op, do_sync);
+ read_bio->bi_private = r10_bio;
- if (rw == READ) {
- /*
- * read balancing logic:
+ if (max_sectors < r10_bio->sectors) {
+ /* Could not read all from this device, so we will
+ * need another r10_bio.
*/
- struct md_rdev *rdev;
- int slot;
-
-read_again:
- rdev = read_balance(conf, r10_bio, &max_sectors);
- if (!rdev) {
- raid_end_bio_io(r10_bio);
- return;
- }
- slot = r10_bio->read_slot;
-
- read_bio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- bio_trim(read_bio, r10_bio->sector - bio->bi_iter.bi_sector,
- max_sectors);
-
- r10_bio->devs[slot].bio = read_bio;
- r10_bio->devs[slot].rdev = rdev;
-
- read_bio->bi_iter.bi_sector = r10_bio->devs[slot].addr +
- choose_data_offset(r10_bio, rdev);
- read_bio->bi_bdev = rdev->bdev;
- read_bio->bi_end_io = raid10_end_read_request;
- bio_set_op_attrs(read_bio, op, do_sync);
- read_bio->bi_private = r10_bio;
+ sectors_handled = (r10_bio->sector + max_sectors
+ - bio->bi_iter.bi_sector);
+ r10_bio->sectors = max_sectors;
+ spin_lock_irq(&conf->device_lock);
+ if (bio->bi_phys_segments == 0)
+ bio->bi_phys_segments = 2;
+ else
+ bio->bi_phys_segments++;
+ spin_unlock_irq(&conf->device_lock);
+ /* Cannot call generic_make_request directly
+ * as that will be queued in __generic_make_request
+ * and subsequent mempool_alloc might block
+ * waiting for it. so hand bio over to raid10d.
+ */
+ reschedule_retry(r10_bio);
- if (max_sectors < r10_bio->sectors) {
- /* Could not read all from this device, so we will
- * need another r10_bio.
- */
- sectors_handled = (r10_bio->sector + max_sectors
- - bio->bi_iter.bi_sector);
- r10_bio->sectors = max_sectors;
- spin_lock_irq(&conf->device_lock);
- if (bio->bi_phys_segments == 0)
- bio->bi_phys_segments = 2;
- else
- bio->bi_phys_segments++;
- spin_unlock_irq(&conf->device_lock);
- /* Cannot call generic_make_request directly
- * as that will be queued in __generic_make_request
- * and subsequent mempool_alloc might block
- * waiting for it. so hand bio over to raid10d.
- */
- reschedule_retry(r10_bio);
+ r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
- r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
+ r10_bio->master_bio = bio;
+ r10_bio->sectors = bio_sectors(bio) - sectors_handled;
+ r10_bio->state = 0;
+ r10_bio->mddev = mddev;
+ r10_bio->sector = bio->bi_iter.bi_sector + sectors_handled;
+ goto read_again;
+ } else
+ generic_make_request(read_bio);
+ return;
+}
- r10_bio->master_bio = bio;
- r10_bio->sectors = bio_sectors(bio) - sectors_handled;
- r10_bio->state = 0;
- r10_bio->mddev = mddev;
- r10_bio->sector = bio->bi_iter.bi_sector +
- sectors_handled;
- goto read_again;
- } else
- generic_make_request(read_bio);
- return;
- }
+static void raid10_write_request(struct mddev *mddev, struct bio *bio,
+ struct r10bio *r10_bio)
+{
+ struct r10conf *conf = mddev->private;
+ int i;
+ const int op = bio_op(bio);
+ const unsigned long do_sync = (bio->bi_opf & REQ_SYNC);
+ const unsigned long do_fua = (bio->bi_opf & REQ_FUA);
+ unsigned long flags;
+ struct md_rdev *blocked_rdev;
+ struct blk_plug_cb *cb;
+ struct raid10_plug_cb *plug = NULL;
+ int sectors_handled;
+ int max_sectors;
+ md_write_start(mddev, bio);
- /*
- * WRITE:
- */
if (conf->pending_count >= max_queued_requests) {
md_wakeup_thread(mddev->thread);
wait_event(conf->wait_barrier,
@@ -1249,8 +1188,7 @@ retry_write:
int bad_sectors;
int is_bad;
- is_bad = is_badblock(rdev, dev_sector,
- max_sectors,
+ is_bad = is_badblock(rdev, dev_sector, max_sectors,
&first_bad, &bad_sectors);
if (is_bad < 0) {
/* Mustn't write here until the bad block
@@ -1353,8 +1291,7 @@ retry_write:
r10_bio->devs[i].bio = mbio;
mbio->bi_iter.bi_sector = (r10_bio->devs[i].addr+
- choose_data_offset(r10_bio,
- rdev));
+ choose_data_offset(r10_bio, rdev));
mbio->bi_bdev = rdev->bdev;
mbio->bi_end_io = raid10_end_write_request;
bio_set_op_attrs(mbio, op, do_sync | do_fua);
@@ -1395,8 +1332,7 @@ retry_write:
r10_bio->devs[i].repl_bio = mbio;
mbio->bi_iter.bi_sector = (r10_bio->devs[i].addr +
- choose_data_offset(
- r10_bio, rdev));
+ choose_data_offset(r10_bio, rdev));
mbio->bi_bdev = rdev->bdev;
mbio->bi_end_io = raid10_end_write_request;
bio_set_op_attrs(mbio, op, do_sync | do_fua);
@@ -1434,6 +1370,77 @@ retry_write:
one_write_done(r10_bio);
}
+static void __make_request(struct mddev *mddev, struct bio *bio)
+{
+ struct r10conf *conf = mddev->private;
+ struct r10bio *r10_bio;
+ int sectors;
+
+ /*
+ * Register the new request and wait if the reconstruction
+ * thread has put up a bar for new requests.
+ * Continue immediately if no resync is active currently.
+ */
+ wait_barrier(conf);
+
+ sectors = bio_sectors(bio);
+ while (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
+ bio->bi_iter.bi_sector < conf->reshape_progress &&
+ bio->bi_iter.bi_sector + sectors > conf->reshape_progress) {
+ /* IO spans the reshape position. Need to wait for
+ * reshape to pass
+ */
+ allow_barrier(conf);
+ wait_event(conf->wait_barrier,
+ conf->reshape_progress <= bio->bi_iter.bi_sector ||
+ conf->reshape_progress >= bio->bi_iter.bi_sector +
+ sectors);
+ wait_barrier(conf);
+ }
+ if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
+ bio_data_dir(bio) == WRITE &&
+ (mddev->reshape_backwards
+ ? (bio->bi_iter.bi_sector < conf->reshape_safe &&
+ bio->bi_iter.bi_sector + sectors > conf->reshape_progress)
+ : (bio->bi_iter.bi_sector + sectors > conf->reshape_safe &&
+ bio->bi_iter.bi_sector < conf->reshape_progress))) {
+ /* Need to update reshape_position in metadata */
+ mddev->reshape_position = conf->reshape_progress;
+ set_mask_bits(&mddev->flags, 0,
+ BIT(MD_CHANGE_DEVS) | BIT(MD_CHANGE_PENDING));
+ md_wakeup_thread(mddev->thread);
+ wait_event(mddev->sb_wait,
+ !test_bit(MD_CHANGE_PENDING, &mddev->flags));
+
+ conf->reshape_safe = mddev->reshape_position;
+ }
+
+ r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
+
+ r10_bio->master_bio = bio;
+ r10_bio->sectors = sectors;
+
+ r10_bio->mddev = mddev;
+ r10_bio->sector = bio->bi_iter.bi_sector;
+ r10_bio->state = 0;
+
+ /* We might need to issue multiple reads to different
+ * devices if there are bad blocks around, so we keep
+ * track of the number of reads in bio->bi_phys_segments.
+ * If this is 0, there is only one r10_bio and no locking
+ * will be needed when the request completes. If it is
+ * non-zero, then it is the number of not-completed requests.
+ */
+ bio->bi_phys_segments = 0;
+ bio_clear_flag(bio, BIO_SEG_VALID);
+
+ if (bio_data_dir(bio) == READ) {
+ raid10_read_request(mddev, bio, r10_bio);
+ return;
+ }
+ raid10_write_request(mddev, bio, r10_bio);
+}
+
static void raid10_make_request(struct mddev *mddev, struct bio *bio)
{
struct r10conf *conf = mddev->private;
--
2.10.2
* Re: Feature request, resumable raid check action
From: NeilBrown @ 2016-12-02 4:54 UTC
To: Patrick Dung, linux-raid
In-Reply-To: <CAEtPA0Aosn6q0pJxsh69EpHQehiSsyeyVfJf98cDhu6kORkE1g@mail.gmail.com>
On Wed, Nov 30 2016, Patrick Dung wrote:
> Hello,
>
> As far as I know, if MD RAID is using the newer metadata version, it
> supports resumable RAID rebuild/sync (that is, if a server is rebooted
> during a rebuild, it resumes from the last position after reboot instead
> of starting from the beginning).
>
> In my recent testing (a few months ago),
> I sometimes used the mdadm 'check' action to scrub the disks
> of an MD RAID.
> After I rebooted the server, the 'check' operation was forgotten and
> was not resumable.
>
> I think a resumable 'check' operation is useful, as array sizes will
> grow in the future.

"check" is resumable. md doesn't record where it is up to, though; you
need to do that yourself.
The "misc/mdcheck" script in the mdadm package makes use of this to
support time-limited checking, and to resume from where it left off.
You could use the script, or read it and see how it works.

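As a rough illustration of what "doing that yourself" involves: the
position can be read from the array's sync_completed attribute in sysfs and
later written to sync_min before the next "check" is started. The sketch
below assumes sync_completed reads either "none" or "<done> / <total>" in
sectors. The function names are hypothetical, and this is not how mdcheck
itself is written (mdcheck is a shell script).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse the text of the sync_completed attribute, assumed to be either
 * "none" or "<done> / <total>" (both in sectors).
 * Returns 1 and fills *done and *total on success, 0 if no position
 * could be read.
 */
static int parse_sync_completed(const char *text,
				unsigned long long *done,
				unsigned long long *total)
{
	assert(text && done && total);
	if (strncmp(text, "none", 4) == 0)
		return 0;
	return sscanf(text, "%llu / %llu", done, total) == 2;
}

/*
 * Build the line to write to sync_min so that the next "check" starts
 * at sector 'pos'. Returns 1 if it fitted in the buffer, 0 otherwise.
 */
static int format_sync_min(char *buf, size_t len, unsigned long long pos)
{
	int n = snprintf(buf, len, "%llu\n", pos);

	return n > 0 && (size_t)n < len;
}
```

The saved position has to be kept somewhere that survives a reboot, and the
kernel may require sync_min to be suitably aligned, so a real tool would
need to round the value down before writing it.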
NeilBrown
* Re: Feature request, resumable raid check action
From: Patrick Dung @ 2016-12-02 5:57 UTC
To: NeilBrown; +Cc: linux-raid
In-Reply-To: <8737i76idm.fsf@notabene.neil.brown.name>

Thanks for the reply, NeilBrown.

In my testing:
I ran the mdcheck script, then rebooted.
After the reboot, I ran the mdcheck script again, but it did not
resume from where it stopped before the reboot.

Is there something that I may have missed?

Best regards,
Patrick

On Fri, Dec 2, 2016 at 12:54 PM, NeilBrown <neilb@suse.com> wrote:
> On Wed, Nov 30 2016, Patrick Dung wrote:
>
>> Hello,
>>
>> As far as I know, if MD RAID is using the newer metadata version, it
>> supports resumable RAID rebuild/sync (that is, if a server is rebooted
>> during a rebuild, it resumes from the last position after reboot instead
>> of starting from the beginning).
>>
>> In my recent testing (a few months ago),
>> I sometimes used the mdadm 'check' action to scrub the disks
>> of an MD RAID.
>> After I rebooted the server, the 'check' operation was forgotten and
>> was not resumable.
>>
>> I think a resumable 'check' operation is useful, as array sizes will
>> grow in the future.
>
> "check" is resumable. md doesn't record where it is up to, though; you
> need to do that yourself.
> The "misc/mdcheck" script in the mdadm package makes use of this to
> support time-limited checking, and to resume from where it left off.
> You could use the script, or read it and see how it works.
>
> NeilBrown
* Re: Feature request, resumable raid check action
From: NeilBrown @ 2016-12-02 6:09 UTC
To: Patrick Dung; +Cc: linux-raid
In-Reply-To: <CAEtPA0C=xH0gCaigs-THNZRspmwD6d4WCrecjL-30M3PP0vxHw@mail.gmail.com>
On Fri, Dec 02 2016, Patrick Dung wrote:
> Thanks for the reply, NeilBrown.
>
> In my testing:
> I ran the mdcheck script, then rebooted.
> After the reboot, I ran the mdcheck script again, but it did not
> resume from where it stopped before the reboot.
>
> Is there something that I may have missed?

Did you reboot while it was running, or after it had finished its
allotted time?
You need to let it finish and record where it got up to.

NeilBrown
* [PATCH v3 7/9] imsm: provide list of bad blocks for an array
From: Tomasz Majchrzak @ 2016-12-02 12:54 UTC
To: linux-raid; +Cc: Jes.Sorensen, Tomasz Majchrzak
In-Reply-To: <wrfjoa0vz3bi.fsf@redhat.com>
Provide the list of bad blocks using memory allocated in advance, so
that it is safe to call from the monitor.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
---
super-intel.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/super-intel.c b/super-intel.c
index 0562a55..f4e243f 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -393,6 +393,7 @@ struct intel_super {
struct intel_hba *hba; /* device path of the raid controller for this metadata */
const struct imsm_orom *orom; /* platform firmware support */
struct intel_super *next; /* (temp) list for disambiguating family_num */
+ struct md_bb bb; /* memory for get_bad_blocks call */
};
struct intel_disk {
@@ -4283,6 +4284,7 @@ static void __free_imsm(struct intel_super *super, int free_disks)
static void free_imsm(struct intel_super *super)
{
__free_imsm(super, 1);
+ free(super->bb.entries);
free(super);
}
@@ -4303,6 +4305,14 @@ static struct intel_super *alloc_super(void)
super->current_vol = -1;
super->create_offset = ~((unsigned long long) 0);
+
+ super->bb.entries = xmalloc(BBM_LOG_MAX_ENTRIES *
+ sizeof(struct md_bb_entry));
+ if (!super->bb.entries) {
+ free(super);
+ return NULL;
+ }
+
return super;
}
@@ -9882,6 +9892,34 @@ static int imsm_clear_badblock(struct active_array *a, int slot,
return ret;
}
/*******************************************************************************
+* Function: imsm_get_badblocks
+* Description: This routine gets the list of bad blocks for an array
+*
+* Parameters:
+* a : array
+* slot : disk number
+* Returns:
+* bb : structure containing bad blocks
+* NULL : error
+******************************************************************************/
+static struct md_bb *imsm_get_badblocks(struct active_array *a, int slot)
+{
+ int inst = a->info.container_member;
+ struct intel_super *super = a->container->sb;
+ struct imsm_dev *dev = get_imsm_dev(super, inst);
+ struct imsm_map *map = get_imsm_map(dev, MAP_0);
+ int ord;
+
+ ord = imsm_disk_slot_to_ord(a, slot);
+ if (ord < 0)
+ return NULL;
+
+ get_volume_badblocks(super->bbm_log, ord_to_idx(ord), pba_of_lba0(map),
+ blocks_per_member(map), &super->bb);
+
+ return &super->bb;
+}
+/*******************************************************************************
* Function: init_migr_record_imsm
* Description: Function inits imsm migration record
* Parameters:
@@ -11436,5 +11474,6 @@ struct superswitch super_imsm = {
.prepare_update = imsm_prepare_update,
.record_bad_block = imsm_record_badblock,
.clear_bad_block = imsm_clear_badblock,
+ .get_bad_blocks = imsm_get_badblocks,
#endif /* MDASSEMBLE */
};
--
1.8.3.1
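For readers following the patch description, here is a minimal userspace sketch of the "memory allocated in advance" idea: the result buffer is allocated once at setup, and the query path only fills it in. All names below (sketch_bb, sketch_bb_fill, SKETCH_MAX_ENTRIES) are hypothetical and are not mdadm's API; the filtering rule is an assumption for illustration and is not taken from get_volume_badblocks().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cap for this sketch; the patch sizes its buffer with
 * BBM_LOG_MAX_ENTRIES. */
#define SKETCH_MAX_ENTRIES 8

struct sketch_bb_entry {
	unsigned long long sector;
	int length;
};

struct sketch_bb {
	int count;
	struct sketch_bb_entry *entries;
};

/* Setup path: allocate the result buffer once, while allocating is safe. */
static int sketch_bb_init(struct sketch_bb *bb)
{
	bb->count = 0;
	bb->entries = calloc(SKETCH_MAX_ENTRIES, sizeof(*bb->entries));
	return bb->entries ? 0 : -1;
}

/* Query path: copy source entries that start inside [start, start + size)
 * into the preallocated buffer. No allocation happens here. */
static struct sketch_bb *sketch_bb_fill(struct sketch_bb *bb,
					const struct sketch_bb_entry *src,
					int src_count,
					unsigned long long start,
					unsigned long long size)
{
	int i;

	bb->count = 0;
	for (i = 0; i < src_count && bb->count < SKETCH_MAX_ENTRIES; i++) {
		if (src[i].sector >= start && src[i].sector - start < size)
			bb->entries[bb->count++] = src[i];
	}
	return bb;
}

/* Teardown path: release the buffer together with its owner. */
static void sketch_bb_free(struct sketch_bb *bb)
{
	free(bb->entries);
	bb->entries = NULL;
	bb->count = 0;
}
```

The caller gets back a pointer to storage it already owns, so a later call overwrites the earlier result; that trade-off is what keeps the query path allocation-free.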
* Re: [PATCH v2 0/9] Bad block support for IMSM metadata
From: Jes Sorensen @ 2016-12-02 16:04 UTC (permalink / raw)
To: Tomasz Majchrzak; +Cc: linux-raid
In-Reply-To: <1480424555-31509-1-git-send-email-tomasz.majchrzak@intel.com>
Tomasz Majchrzak <tomasz.majchrzak@intel.com> writes:
> This series of patches implements bad block support for IMSM metadata.
>
> Requested changes have been applied to the first 2 patches. The whole set is
> resent because the previous patches don't apply cleanly on the latest upstream
> branch. An extra patch has been added which adds 4kn support for bad block log.
>
> Regards,
>
> Tomek
Applied using v3 for 1, 3, and 7.
Thanks,
Jes
* clustered MD - beyond RAID1
From: Robert Woodworth @ 2016-12-02 18:12 UTC (permalink / raw)
To: linux-raid
Excuse me for being late to the party on this subject, but is the idea of
clustered RAID5/6 alive or dead?
I have a need for such a feature. I'm in development on SAS JBODs with
large drive counts, 60 and 90 drives per JBOD. We would like to support
multi-host connectivity in an active/active fashion with MD RAID60. This
clustered MD RAID can and should be a nice alternative to HW RAID solutions
like LSI/Avago "Syncro" MegaRAID.
I currently have the hardware and time to help develop and test the
clustered RAID5/6.
I just finished up building a test cluster of 2 nodes with the cluster-md
RAID1. Worked fine with gfs2 on top.
My current real job is firmware on these SAS JBODs. I have many years of
Linux experience and have developed (years ago) some kernel modules for
custom FPGA-based PCIe cards.
* Re: MD Remnants After --stop
From: Stephane Thiell @ 2016-12-02 18:18 UTC (permalink / raw)
To: NeilBrown; +Cc: Marc Smith, linux-raid@vger.kernel.org
In-Reply-To: <87k2bj6zwf.fsf@notabene.neil.brown.name>
Hey,
Just wanted to jump in as I reported a very similar problem recently on:
https://github.com/neilbrown/mdadm/issues/29
Although it’s much better with mdadm 3.4 in my case, I still occasionally get a remnant md device after --stop.
Please let me know what you think.
Best,
Stephan
> On Dec 1, 2016, at 2:35 PM, NeilBrown <neilb@suse.com> wrote:
>
> On Fri, Dec 02 2016, Marc Smith wrote:
>
>> On Wed, Nov 30, 2016 at 9:52 PM, NeilBrown <neilb@suse.com> wrote:
>>> On Mon, Nov 28 2016, Marc Smith wrote:
>>>
>>>>
>>>> # find /sys/block/md127/md
>>>> /sys/block/md127/md
>>>> /sys/block/md127/md/reshape_position
>>>> /sys/block/md127/md/layout
>>>> /sys/block/md127/md/raid_disks
>>>> /sys/block/md127/md/bitmap
>>>> /sys/block/md127/md/bitmap/chunksize
>>>
>>> This tells me that:
>>> sysfs_remove_group(&mddev->kobj, &md_bitmap_group);
>>> hasn't been run, so mddev_delayed_delete() hasn't run.
>>> That suggests the final mddev_put() hasn't run, i.e. mddev->active is > 0
>>>
>>> Everything else suggests that array has been stopped and cleaned and
>>> should be gone...
>>>
>>> This seems to suggest that there is an unbalanced mddev_get() without a
>>> matching mddev_put(). I cannot find it though.
>>>
>>> If I could reproduce it, I would try to see what is happening by:
>>>
>>> - putting
>>> printk("mddev->active = %d\n", atomic_read(&mddev->active));
>>> in the top of mddev_put(). That shouldn't be *too* noisy.
>>>
>>> - putting
>>> printk("rd=%d empty=%d ctime=%d hold=%d\n", mddev->raid_disks,
>>> list_empty(&mddev->disks), mddev->ctime, mddev->hold_active);
>>>
>>> in mddev_put() just before those values are tested.
>>>
>>> - putting
>>> printk("queue_work\n");
>>> just before the 'queue_work()' call in mddev_put.
>>>
>>> - putting
>>> printk("mddev_delayed_delete\n");
>>> in mddev_delayed_delete()
>>>
>>> Then see what gets printed when you stop the array.
>>
>> I made those modifications to md.c and here is the kernel log when stopping:
>>
>> --snip--
>> [ 3937.233487] mddev->active = 2
>> [ 3937.233503] mddev->active = 2
>> [ 3937.233509] mddev->active = 2
>> [ 3937.233516] mddev->active = 1
>> [ 3937.233516] rd=2 empty=0 ctime=1480617270 hold=0
>
> At this point, mdadm has opened the /dev/md127 device, accessed a few
> attributes via sysfs just to check on the status, and then closed it
> again.
> The array is still active, but we know that no other process has it
> open.
>
>
>> [ 3937.233679] udevd[492]: inotify event: 8 for /dev/md127
>> [ 3937.241489] md127: detected capacity change from 73340747776 to 0
>> [ 3937.241493] md: md127 stopped.
>
> Now mdadm has opened the array again and issued the STOP_ARRAY ioctl.
> Still nothing else has the array open.
>
>> [ 3937.241665] udevd[492]: device /dev/md127 closed, synthesising 'change'
>> [ 3937.241726] udevd[492]: seq 3631 queued, 'change' 'block'
>> [ 3937.241829] udevd[492]: seq 3631 forked new worker [4991]
>> [ 3937.241989] udevd[4991]: seq 3631 running
>> [ 3937.242002] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: leaving the
>> lockspace group...
>> [ 3937.242039] udevd[4991]: removing watch on '/dev/md127'
>> [ 3937.242068] mddev->active = 3
>
> But somehow the ->active count got up to 3.
> mdadm probably still has it open, but two other things do too.
> If you have "mdadm --monitor" running in the background (which is good)
> it will temporarily increase, then decrease the count.
> udevd opens the device temporarily too.
> So this isn't necessarily a problem.
>
>> [ 3937.242069] udevd[492]: seq 3632 queued, 'offline' 'dlm'
>> [ 3937.242080] mddev->active = 3
>> [ 3937.242104] udevd[4991]: IMPORT 'probe-bcache -o udev /dev/md127'
>> /usr/lib/udev/rules.d/69-bcache.rules:16
>> [ 3937.242161] udevd[492]: seq 3632 forked new worker [4992]
>> [ 3937.242259] udevd[4993]: starting 'probe-bcache -o udev /dev/md127'
>> [ 3937.242753] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: group event done 0 0
>> [ 3937.242847] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19:
>> release_lockspace final free
>> [ 3937.242861] md: unbind<dm-1>
>> [ 3937.256606] md: export_rdev(dm-1)
>> [ 3937.256612] md: unbind<dm-0>
>> [ 3937.263601] md: export_rdev(dm-0)
>> [ 3937.263688] mddev->active = 4
>> [ 3937.263751] mddev->active = 3
>
> But here, the active count only drops down to 2. (it is decremented
> after it is printed). Assuming there really were no more messages like
> this, there are two active references to the md device, and we don't
> know what they are.
>
>>
>> I didn't use my modified mdadm which stops the synthesized CHANGE from
>> occurring, but if needed, I can re-run the test using that.
>
> It would be good to use the modified mdadm, if only to reduce the
> noise. It won't change the end result, but might make it easier to see
> what is happening.
> Also please add
> WARN_ON(1);
>
> in the start of mddev_get() and mddev_put().
> That will provide a stack trace whenever either of these is called, so
> we can see who takes a reference, and who doesn't release it.
>
> Thanks,
> NeilBrown
* Re: MD Remnants After --stop
From: Marc Smith @ 2016-12-02 19:12 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
In-Reply-To: <87k2bj6zwf.fsf@notabene.neil.brown.name>
On Thu, Dec 1, 2016 at 5:35 PM, NeilBrown <neilb@suse.com> wrote:
> On Fri, Dec 02 2016, Marc Smith wrote:
>
>> On Wed, Nov 30, 2016 at 9:52 PM, NeilBrown <neilb@suse.com> wrote:
>>> On Mon, Nov 28 2016, Marc Smith wrote:
>>>
>>>>
>>>> # find /sys/block/md127/md
>>>> /sys/block/md127/md
>>>> /sys/block/md127/md/reshape_position
>>>> /sys/block/md127/md/layout
>>>> /sys/block/md127/md/raid_disks
>>>> /sys/block/md127/md/bitmap
>>>> /sys/block/md127/md/bitmap/chunksize
>>>
>>> This tells me that:
>>> sysfs_remove_group(&mddev->kobj, &md_bitmap_group);
>>> hasn't been run, so mddev_delayed_delete() hasn't run.
>>> That suggests the final mddev_put() hasn't run, i.e. mddev->active is > 0
>>>
>>> Everything else suggests that array has been stopped and cleaned and
>>> should be gone...
>>>
>>> This seems to suggest that there is an unbalanced mddev_get() without a
>>> matching mddev_put(). I cannot find it though.
>>>
>>> If I could reproduce it, I would try to see what is happening by:
>>>
>>> - putting
>>> printk("mddev->active = %d\n", atomic_read(&mddev->active));
>>> in the top of mddev_put(). That shouldn't be *too* noisy.
>>>
>>> - putting
>>> printk("rd=%d empty=%d ctime=%d hold=%d\n", mddev->raid_disks,
>>> list_empty(&mddev->disks), mddev->ctime, mddev->hold_active);
>>>
>>> in mddev_put() just before those values are tested.
>>>
>>> - putting
>>> printk("queue_work\n");
>>> just before the 'queue_work()' call in mddev_put.
>>>
>>> - putting
>>> printk("mddev_delayed_delete\n");
>>> in mddev_delayed_delete()
>>>
>>> Then see what gets printed when you stop the array.
>>
>> I made those modifications to md.c and here is the kernel log when stopping:
>>
>> --snip--
>> [ 3937.233487] mddev->active = 2
>> [ 3937.233503] mddev->active = 2
>> [ 3937.233509] mddev->active = 2
>> [ 3937.233516] mddev->active = 1
>> [ 3937.233516] rd=2 empty=0 ctime=1480617270 hold=0
>
> At this point, mdadm has opened the /dev/md127 device, accessed a few
> attributes via sysfs just to check on the status, and then closed it
> again.
> The array is still active, but we know that no other process has it
> open.
>
>
>> [ 3937.233679] udevd[492]: inotify event: 8 for /dev/md127
>> [ 3937.241489] md127: detected capacity change from 73340747776 to 0
>> [ 3937.241493] md: md127 stopped.
>
> Now mdadm has opened the array again and issued the STOP_ARRAY ioctl.
> Still nothing else has the array open.
>
>> [ 3937.241665] udevd[492]: device /dev/md127 closed, synthesising 'change'
>> [ 3937.241726] udevd[492]: seq 3631 queued, 'change' 'block'
>> [ 3937.241829] udevd[492]: seq 3631 forked new worker [4991]
>> [ 3937.241989] udevd[4991]: seq 3631 running
>> [ 3937.242002] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: leaving the
>> lockspace group...
>> [ 3937.242039] udevd[4991]: removing watch on '/dev/md127'
>> [ 3937.242068] mddev->active = 3
>
> But somehow the ->active count got up to 3.
> mdadm probably still has it open, but two other things do too.
> If you have "mdadm --monitor" running in the background (which is good)
> it will temporarily increase, then decrease the count.
> udevd opens the device temporarily too.
> So this isn't necessarily a problem.
>
>> [ 3937.242069] udevd[492]: seq 3632 queued, 'offline' 'dlm'
>> [ 3937.242080] mddev->active = 3
>> [ 3937.242104] udevd[4991]: IMPORT 'probe-bcache -o udev /dev/md127'
>> /usr/lib/udev/rules.d/69-bcache.rules:16
>> [ 3937.242161] udevd[492]: seq 3632 forked new worker [4992]
>> [ 3937.242259] udevd[4993]: starting 'probe-bcache -o udev /dev/md127'
>> [ 3937.242753] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: group event done 0 0
>> [ 3937.242847] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19:
>> release_lockspace final free
>> [ 3937.242861] md: unbind<dm-1>
>> [ 3937.256606] md: export_rdev(dm-1)
>> [ 3937.256612] md: unbind<dm-0>
>> [ 3937.263601] md: export_rdev(dm-0)
>> [ 3937.263688] mddev->active = 4
>> [ 3937.263751] mddev->active = 3
>
> But here, the active count only drops down to 2. (it is decremented
> after it is printed). Assuming there really were no more messages like
> this, there are two active references to the md device, and we don't
> know what they are.
>
>>
>> I didn't use my modified mdadm which stops the synthesized CHANGE from
>> occurring, but if needed, I can re-run the test using that.
>
> It would be good to use the modified mdadm, if only to reduce the
> noise. It won't change the end result, but might make it easier to see
> what is happening.
> Also please add
> WARN_ON(1);
>
> in the start of mddev_get() and mddev_put().
> That will provide a stack trace whenever either of these is called, so
> we can see who takes a reference, and who doesn't release it.
Okay, I added that to both functions, and now I can't get stopping the
array to misbehave (e.g., fail to generate the REMOVE event). I've been
trying all morning! I literally just added the WARN_ON(1) to those two
functions, and that's all I changed. I compiled and reinstalled the image,
with no other changes. I've tried quite a few times now to reproduce this,
and I'm failing to do so -- every time the REMOVE event is generated
and everything is removed correctly.
I'm going to switch back to the previous image and confirm it's
reproducible with that.
--Marc
>
> Thanks,
> NeilBrown
>
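As an illustration of the reference counting discussed above, here is a minimal userspace sketch of a get/put pair that logs each caller. It only models the idea behind mddev->active; the struct, the function names, and the "deleted" flag are hypothetical and are not the md.c implementation. As in the printk placement described earlier in the thread, the put path prints the count before decrementing it.

```c
#include <stdio.h>

/* Hypothetical stand-in for a refcounted device object. */
struct sketch_dev {
	int active;   /* number of outstanding references */
	int deleted;  /* set once the last reference is dropped */
};

/* Take a reference and log who took it. */
static struct sketch_dev *sketch_get(struct sketch_dev *d, const char *who)
{
	d->active++;
	fprintf(stderr, "get by %s: active = %d\n", who, d->active);
	return d;
}

/* Drop a reference; the count is printed before it is decremented.
 * Cleanup happens only when the count reaches zero, so one unmatched
 * get leaves the object behind forever. */
static void sketch_put(struct sketch_dev *d, const char *who)
{
	fprintf(stderr, "put by %s: active = %d\n", who, d->active);
	if (--d->active == 0)
		d->deleted = 1;
}
```

With two gets and only one put, "deleted" stays 0, which is the userspace analogue of the sysfs remnants seen after --stop.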
* Re: clustered MD - beyond RAID1
From: Shaohua Li @ 2016-12-02 20:02 UTC (permalink / raw)
To: Robert Woodworth; +Cc: linux-raid
In-Reply-To: <CAB9NSeV2A+BZzUbzj516e24CL05k6zF6tgZ+vpNptSO30ipyug@mail.gmail.com>
On Fri, Dec 02, 2016 at 11:12:52AM -0700, Robert Woodworth wrote:
> Excuse me for being late to the party on this subject, but is the idea of
> clustered RAID5/6 alive or dead?
>
> I have a need for such a feature. I'm in development on SAS JBODs with
> large drive counts, 60 and 90 drives per JBOD. We would like to support
> multi-host connectivity in an active/active fashion with MD RAID60. This
> clustered MD RAID can and should be a nice alternative to HW RAID solutions
> like LSI/Avago "Syncro" MegaRAID.
>
> I currently have the hardware and time to help develop and test the
> clustered RAID5/6.
> I just finished up building a test cluster of 2 nodes with the cluster-md
> RAID1. Worked fine with gfs2 on top.
>
>
> My current real job is firmware on these SAS JBODs. I have many years of
> Linux experience and have developed (years ago) some kernel modules for
> custom FPGA-based PCIe cards.
It makes a lot of sense to me, and there is no reason not to support it,
especially since you have a real use for it. If anybody wants to implement it,
I'd be very glad to help and review patches.
Thanks,
Shaohua
* Re: [PATCH v2] md/r5cache: run_no_space_stripes() when R5C_LOG_CRITICAL == 0
From: Shaohua Li @ 2016-12-02 20:03 UTC (permalink / raw)
To: Song Liu; +Cc: linux-raid, neilb, shli, kernel-team, dan.j.williams, hch
In-Reply-To: <20161201005754.3149044-1-songliubraving@fb.com>
On Wed, Nov 30, 2016 at 04:57:54PM -0800, Song Liu wrote:
> With writeback cache, we define log space critical as
>
> free_space < 2 * reclaim_required_space
>
> So the deassert of R5C_LOG_CRITICAL could happen when
> 1. free_space increases
> 2. reclaim_required_space decreases
>
> Currently, run_no_space_stripes() is called when 1 happens, but
> not (always) when 2 happens.
>
> With this patch, run_no_space_stripes() is called when
> R5C_LOG_CRITICAL is cleared.
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
applied, thanks!
> ---
> drivers/md/raid5-cache.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
> index e786d4e..c36f86b 100644
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -370,6 +370,7 @@ static inline void r5c_update_log_state(struct r5l_log *log)
> struct r5conf *conf = log->rdev->mddev->private;
> sector_t free_space;
> sector_t reclaim_space;
> + bool wake_reclaim = false;
>
> if (!r5c_is_writeback(log))
> return;
> @@ -379,12 +380,18 @@ static inline void r5c_update_log_state(struct r5l_log *log)
> reclaim_space = r5c_log_required_to_flush_cache(conf);
> if (free_space < 2 * reclaim_space)
> set_bit(R5C_LOG_CRITICAL, &conf->cache_state);
> - else
> + else {
> + if (test_bit(R5C_LOG_CRITICAL, &conf->cache_state))
> + wake_reclaim = true;
> clear_bit(R5C_LOG_CRITICAL, &conf->cache_state);
> + }
> if (free_space < 3 * reclaim_space)
> set_bit(R5C_LOG_TIGHT, &conf->cache_state);
> else
> clear_bit(R5C_LOG_TIGHT, &conf->cache_state);
> +
> + if (wake_reclaim)
> + r5l_wake_reclaim(log, 0);
> }
>
> /*
> @@ -1345,6 +1352,10 @@ static void r5c_do_reclaim(struct r5conf *conf)
> spin_unlock(&conf->device_lock);
> spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
> }
> +
> + if (!test_bit(R5C_LOG_CRITICAL, &conf->cache_state))
> + r5l_run_no_space_stripes(log);
> +
> md_wakeup_thread(conf->mddev->thread);
> }
>
> @@ -2401,6 +2412,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
> spin_unlock_irq(&conf->log->stripe_in_journal_lock);
> sh->log_start = MaxSector;
> atomic_dec(&conf->log->stripe_in_journal_count);
> + r5c_update_log_state(conf->log);
> }
>
> int
> --
> 2.9.3
>
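The threshold logic in the quoted description can be sketched in userspace as follows. This is only an illustration under stated assumptions: the struct and function names are hypothetical, plain booleans stand in for the R5C_LOG_CRITICAL and R5C_LOG_TIGHT bits, and the return value stands in for the decision to wake reclaim. The point it shows is that the wake-up is requested only on the transition from critical to not critical, regardless of whether free space grew or the required reclaim space shrank.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two cache_state bits. */
struct sketch_log_state {
	bool critical;  /* free_space < 2 * reclaim_space */
	bool tight;     /* free_space < 3 * reclaim_space */
};

/* Recompute both flags. Returns true only when "critical" goes from
 * set to clear, i.e. when the caller should wake the reclaim path. */
static bool sketch_update_log_state(struct sketch_log_state *s,
				    unsigned long long free_space,
				    unsigned long long reclaim_space)
{
	bool wake_reclaim = false;

	if (free_space < 2 * reclaim_space) {
		s->critical = true;
	} else {
		if (s->critical)
			wake_reclaim = true;
		s->critical = false;
	}

	s->tight = free_space < 3 * reclaim_space;

	return wake_reclaim;
}
```

Calling the update twice with the same non-critical values returns true at most once, so the wake-up is edge-triggered rather than repeated on every update.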
* Re: [PATCH 1/2] md: disable WRITE SAME if it fails for linear/raid0
From: Shaohua Li @ 2016-12-02 20:07 UTC (permalink / raw)
To: sitsofe; +Cc: linux-raid, sitsofe, neilb
In-Reply-To: <35d7516cdfdcaa734e5b8cc90a8dbac8e3d201e0.1480552575.git.shli@fb.com>
On Wed, Nov 30, 2016 at 04:39:11PM -0800, Shaohua Li wrote:
> This makes md do the same thing as dm for write same IO failure. Please
> see 7eee4ae(dm: disable WRITE SAME if it fails) for details why we need
> this.
>
> Also reported here: https://bugzilla.kernel.org/show_bug.cgi?id=118581
Sitsofe,
can you give the patch a shot, please? It works well here, but I would
appreciate it if you could test it.
Thanks,
Shaohua
> Signed-off-by: Shaohua Li <shli@fb.com>
> ---
> drivers/md/linear.c | 2 ++
> drivers/md/md.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> drivers/md/md.h | 2 ++
> drivers/md/raid0.c | 2 ++
> 4 files changed, 48 insertions(+)
>
> diff --git a/drivers/md/linear.c b/drivers/md/linear.c
> index 5975c99..d3c7b4d 100644
> --- a/drivers/md/linear.c
> +++ b/drivers/md/linear.c
> @@ -262,6 +262,8 @@ static void linear_make_request(struct mddev *mddev, struct bio *bio)
> trace_block_bio_remap(bdev_get_queue(split->bi_bdev),
> split, disk_devt(mddev->gendisk),
> bio_sector);
> + if (bio_op(split) == REQ_OP_WRITE_SAME)
> + md_writesame_setup(mddev, split);
> generic_make_request(split);
> }
> } while (split != bio);
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index c7894fb..5e6efcd 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -312,6 +312,48 @@ static blk_qc_t md_make_request(struct request_queue *q, struct bio *bio)
> return BLK_QC_T_NONE;
> }
>
> +struct md_writesame_data {
> + bio_end_io_t *orig_endio;
> + void *orig_private;
> + struct mddev *mddev;
> +};
> +
> +static void md_writesame_endio(struct bio *bio)
> +{
> + struct md_writesame_data *data = bio->bi_private;
> +
> + if (bio->bi_error == -EREMOTEIO &&
> + !bdev_get_queue(bio->bi_bdev)->limits.max_write_same_sectors)
> + data->mddev->queue->limits.max_write_same_sectors = 0;
> +
> + bio->bi_private = data->orig_private;
> + bio->bi_end_io = data->orig_endio;
> + bio_endio(bio);
> +
> + kfree(data);
> +}
> +
> +void md_writesame_setup(struct mddev *mddev, struct bio *bio)
> +{
> + struct md_writesame_data *data;
> +
> + /*
> + * this failure means we ignore a chance to handle writesame failure,
> + * which isn't critical, we can handle the failure if new writesame IO
> + * comes
> + */
> + data = kmalloc(sizeof(*data), GFP_NOIO | __GFP_NORETRY);
> + if (!data)
> + return;
> + data->orig_endio = bio->bi_end_io;
> + data->orig_private = bio->bi_private;
> + data->mddev = mddev;
> +
> + bio->bi_private = data;
> + bio->bi_end_io = md_writesame_endio;
> +}
> +EXPORT_SYMBOL_GPL(md_writesame_setup);
> +
> /* mddev_suspend makes sure no new requests are submitted
> * to the device, and that any requests that have been submitted
> * are completely handled.
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 5c08f84..2d1556b 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -700,4 +700,6 @@ static inline int mddev_is_clustered(struct mddev *mddev)
> {
> return mddev->cluster_info && mddev->bitmap_info.nodes > 1;
> }
> +
> +extern void md_writesame_setup(struct mddev *mddev, struct bio *bio);
> #endif /* _MD_MD_H */
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index e628f18..4811116 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -498,6 +498,8 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
> trace_block_bio_remap(bdev_get_queue(split->bi_bdev),
> split, disk_devt(mddev->gendisk),
> bio_sector);
> + if (bio_op(split) == REQ_OP_WRITE_SAME)
> + md_writesame_setup(mddev, split);
> generic_make_request(split);
> }
> } while (split != bio);
> --
> 2.9.3
>
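The completion-hook pattern used by the quoted patch (save the original callback and private pointer, install a wrapper, restore both in the wrapper before completing) can be sketched in userspace like this. Every name here is hypothetical: sketch_io stands in for a bio, SKETCH_ERR_UNSUPPORTED for the error the patch tests, and the integer flag for the queue limit that the patch zeroes. The patch's extra check on the underlying device's limit is left out of this sketch.

```c
#include <stdlib.h>

#define SKETCH_ERR_UNSUPPORTED (-1)  /* hypothetical error code */

/* Hypothetical stand-in for an I/O request with a completion callback. */
struct sketch_io {
	int error;
	void (*end_io)(struct sketch_io *io);
	void *private_data;
};

/* Saved state needed to undo the interposition at completion time. */
struct sketch_wrap {
	void (*orig_end_io)(struct sketch_io *io);
	void *orig_private;
	int *feature_disabled;
};

/* Wrapper: record the failure, restore the original callback and
 * private pointer, then complete the request as the owner expects. */
static void sketch_wrapped_endio(struct sketch_io *io)
{
	struct sketch_wrap *w = io->private_data;

	if (io->error == SKETCH_ERR_UNSUPPORTED)
		*w->feature_disabled = 1;

	io->private_data = w->orig_private;
	io->end_io = w->orig_end_io;
	free(w);
	io->end_io(io);
}

/* Best effort: if allocation fails, the request is left unhooked and
 * this failure is simply not observed. */
static void sketch_setup(struct sketch_io *io, int *feature_disabled)
{
	struct sketch_wrap *w = malloc(sizeof(*w));

	if (!w)
		return;
	w->orig_end_io = io->end_io;
	w->orig_private = io->private_data;
	w->feature_disabled = feature_disabled;
	io->private_data = w;
	io->end_io = sketch_wrapped_endio;
}

/* Example original handler: counts completions through private_data. */
static void sketch_count_completion(struct sketch_io *io)
{
	int *counter = io->private_data;

	(*counter)++;
}
```

Because the wrapper restores both fields before calling the original handler, the owner of the request never sees the interposed state.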
* Re: [PATCH 4/4] md/raid5-cache: adjust the write position of the empty block and mark it as a checkpoint
From: Shaohua Li @ 2016-12-02 20:10 UTC (permalink / raw)
To: JackieLiu; +Cc: songliubraving, 刘正元, linux-raid
In-Reply-To: <256217AC-28C5-4C52-ABDB-9C36221E6A1B@kylinos.cn>
On Wed, Nov 30, 2016 at 12:03:14PM +0800, JackieLiu wrote:
>
> > On Nov 30, 2016, at 06:31, Shaohua Li <shli@kernel.org> wrote:
> >
> > On Mon, Nov 28, 2016 at 04:19:21PM +0800, JackieLiu wrote:
> > >> When recovery is complete, we first write an empty block and record
> > >> its position, then rewrite the data-only stripes, and finally write
> > >> the location of the empty block into the super block as the last
> > >> checkpoint position. We should also update last_checkpoint
> > >> to this empty block position.
> >> ...
> >> + pos = ctx.pos;
> >> + r5l_log_write_empty_meta_block(log, ctx.pos, (ctx.seq += 10));
> >
> > hmm, move the ctx.seq += 10 out please
>
> yep, if this patch is OK, I will resend an appropriate patch.
>
> >> + ctx.pos = r5l_ring_add(log, ctx.pos, BLOCK_SECTORS);
> >> +
> >> if ((ctx.data_only_stripes == 0) && (ctx.data_parity_stripes == 0))
> >> pr_debug("md/raid:%s: starting from clean shutdown\n",
> >> mdname(mddev));
> >> @@ -2167,9 +2171,9 @@ static int r5l_recovery_log(struct r5l_log *log)
> >>
> >> log->log_start = ctx.pos;
> >> log->next_checkpoint = ctx.pos;
> >> + log->last_checkpoint = pos;
> >> log->seq = ctx.seq;
> >> - r5l_log_write_empty_meta_block(log, ctx.pos, ctx.seq);
> >> - r5l_write_super(log, ctx.pos);
> >> + r5l_write_super(log, pos);
> >> return 0;
> >> }
> >
> > Applied the first 3 patches in the series. This one looks good too, but why do
> > we always create the empty meta block? It's only required when we don't rewrite
> > the data, e.g., when data_only_stripes == 0.
> >
> > Thanks,
> > Shaohua
>
> As you said, when data_only_stripes != 0, we do not need to write an empty
> meta block, but we need to calculate the position of the first list member and
> save it. At the same time, when data_only_stripes == 0, we need to write
> an empty block and let the super block point to it. In any case, since
> there is a possibility that invalid blocks are connected to valid blocks, we still
> need to add 10 to the sequence number.
>
> In my opinion, if this empty block is always added, we just let the super block
> point to it and everything is OK; the code is cleaner and the logic
> is clearer.
I'd prefer not writing the empty block unconditionally. It's unnecessary and
makes the code confusing to read. I don't think an 'if () write_empty_block'
makes the logic complicated.
Thanks,
Shaohua
* Re: MD Remnants After --stop
From: Marc Smith @ 2016-12-02 20:22 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
In-Reply-To: <CAHkw+LdhCOGn4ZXOrWMQHFK7RvJ9MA=0_uoN+69ZC3SUWsGaCg@mail.gmail.com>
Finally, I got it! Why is it that when I want it to break, it doesn't? =)
I will say that with the modified mdadm that prevents the synthesized
CHANGE event, the problem does not seem to occur as regularly.
Below are the kernel logs after stopping an array:
--snip--
[16438.999544] ------------[ cut here ]------------
[16438.999554] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:449
mddev_find+0x85/0x205
[16438.999567] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16438.999592] CPU: 4 PID: 31175 Comm: mdadm Tainted: G W O
4.9.0-rc3-esos.prod #1
[16438.999593] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16438.999595] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16438.999598] ffffffff81065522 0000000000000000 000000000090007e
ffff8803182d4000
[16438.999601] 000000000000007e 0000000000000009 0000000000000001
ffffffff81875356
[16438.999604] Call Trace:
[16438.999612] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16438.999618] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16438.999620] [<ffffffff81875356>] ? mddev_find+0x85/0x205
[16438.999623] [<ffffffff8187978e>] ? md_open+0x10/0x9a
[16438.999628] [<ffffffff811513c3>] ? __blkdev_get+0xc3/0x345
[16438.999634] [<ffffffff81151935>] ? blkdev_get_by_dev+0x43/0x43
[16438.999641] [<ffffffff811517f2>] ? blkdev_get+0x1ad/0x2ad
[16438.999647] [<ffffffff81132a3f>] ? walk_component+0x36/0x20f
[16438.999653] [<ffffffff81150501>] ? bdgrab+0xd/0x12
[16438.999659] [<ffffffff81151935>] ? blkdev_get_by_dev+0x43/0x43
[16438.999666] [<ffffffff8112664e>] ? do_dentry_open.isra.16+0x1b2/0x28a
[16438.999672] [<ffffffff81134dd9>] ? path_openat+0xcc7/0xeb1
[16438.999677] [<ffffffff8113500b>] ? do_filp_open+0x48/0x9e
[16438.999680] [<ffffffff8113a058>] ? dput+0x21/0x1cb
[16438.999683] [<ffffffff811274ab>] ? do_sys_open+0x135/0x1bc
[16438.999685] [<ffffffff811274ab>] ? do_sys_open+0x135/0x1bc
[16438.999690] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16438.999692] ---[ end trace a0868b86aec8f14c ]---
[16438.999743] ------------[ cut here ]------------
[16438.999747] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:449
md_attr_show+0x61/0x8f
[16438.999748] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16438.999763] CPU: 4 PID: 31175 Comm: mdadm Tainted: G W O
4.9.0-rc3-esos.prod #1
[16438.999764] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16438.999765] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16438.999768] ffffffff81065522 ffff8803182d4048 ffff8803182d4000
ffffffff8209fb80
[16438.999771] ffff8803182fc000 ffffc9000ff17f30 ffff8805c7871998
ffffffff818792a0
[16438.999774] Call Trace:
[16438.999777] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16438.999780] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16438.999782] [<ffffffff818792a0>] ? md_attr_show+0x61/0x8f
[16438.999787] [<ffffffff811843ea>] ? sysfs_kf_read+0x61/0x97
[16438.999789] [<ffffffff81183a52>] ? kernfs_fop_read+0xdc/0x13e
[16438.999792] [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16438.999795] [<ffffffff811840ca>] ? kernfs_iop_get_link+0x14b/0x17b
[16438.999798] [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16438.999800] [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16438.999803] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16438.999804] ---[ end trace a0868b86aec8f14d ]---
[16438.999806] ------------[ cut here ]------------
[16438.999809] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16438.999809] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16438.999825] CPU: 4 PID: 31175 Comm: mdadm Tainted: G W O
4.9.0-rc3-esos.prod #1
[16438.999826] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16438.999827] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16438.999830] ffffffff81065522 ffff8803182d4000 ffff8803182d4000
ffffffff8209fb80
[16438.999833] ffff8803182fc000 ffffc9000ff17f30 ffff8805c7871998
ffffffff818781b3
[16438.999836] Call Trace:
[16438.999838] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16438.999840] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16438.999843] [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16438.999845] [<ffffffff818792c4>] ? md_attr_show+0x85/0x8f
[16438.999847] [<ffffffff811843ea>] ? sysfs_kf_read+0x61/0x97
[16438.999850] [<ffffffff81183a52>] ? kernfs_fop_read+0xdc/0x13e
[16438.999852] [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16438.999855] [<ffffffff811840ca>] ? kernfs_iop_get_link+0x14b/0x17b
[16438.999857] [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16438.999860] [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16438.999862] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16438.999864] ---[ end trace a0868b86aec8f14e ]---
[16438.999864] mddev->active = 2
[16438.999884] ------------[ cut here ]------------
[16438.999887] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:449
md_attr_show+0x61/0x8f
[16438.999887] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16438.999902] CPU: 4 PID: 31175 Comm: mdadm Tainted: G W O
4.9.0-rc3-esos.prod #1
[16438.999903] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16438.999904] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16438.999907] ffffffff81065522 ffff8803182d4048 ffff8803182d4000
ffffffff8209fd10
[16438.999910] ffff8805c8def000 ffff8805c96b7e00 0000000000000001
ffffffff818792a0
[16438.999913] Call Trace:
[16438.999916] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16438.999918] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16438.999920] [<ffffffff818792a0>] ? md_attr_show+0x61/0x8f
[16438.999923] [<ffffffff811844cf>] ? sysfs_kf_seq_show+0x7a/0xc4
[16438.999927] [<ffffffff81143d99>] ? seq_read+0x16c/0x323
[16438.999929] [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16438.999931] [<ffffffff8113a067>] ? dput+0x30/0x1cb
[16438.999934] [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16438.999936] [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16438.999939] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16438.999941] ---[ end trace a0868b86aec8f14f ]---
[16438.999942] ------------[ cut here ]------------
[16438.999945] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16438.999952] Modules linked in:
[16438.999958] fcst(O) scst_changer(O) scst_tape(O) scst_vdisk(O)
scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O) scst(O) qla2xxx
bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs ib_srp iw_nes
iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.000060] CPU: 4 PID: 31175 Comm: mdadm Tainted: G W O
4.9.0-rc3-esos.prod #1
[16439.000061] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.000061] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.000064] ffffffff81065522 ffff8803182d4000 ffff8803182d4000
ffffffff8209fd10
[16439.000067] ffff8805c8def000 ffff8805c96b7e00 0000000000000001
ffffffff818781b3
[16439.000070] Call Trace:
[16439.000073] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.000075] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.000077] [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.000080] [<ffffffff818792c4>] ? md_attr_show+0x85/0x8f
[16439.000082] [<ffffffff811844cf>] ? sysfs_kf_seq_show+0x7a/0xc4
[16439.000085] [<ffffffff81143d99>] ? seq_read+0x16c/0x323
[16439.000087] [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.000089] [<ffffffff8113a067>] ? dput+0x30/0x1cb
[16439.000092] [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.000095] [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.000097] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.000099] ---[ end trace a0868b86aec8f150 ]---
[16439.000099] mddev->active = 2
[16439.000111] ------------[ cut here ]------------
[16439.000114] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:449
md_attr_show+0x61/0x8f
[16439.000115] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.000130] CPU: 4 PID: 31175 Comm: mdadm Tainted: G W O
4.9.0-rc3-esos.prod #1
[16439.000131] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.000132] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.000135] ffffffff81065522 ffff8803182d4048 ffff8803182d4000
ffffffff8209fba0
[16439.000138] ffff8805c8def000 ffff8805c96b7100 0000000000000001
ffffffff818792a0
[16439.000141] Call Trace:
[16439.000144] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.000146] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.000148] [<ffffffff818792a0>] ? md_attr_show+0x61/0x8f
[16439.000151] [<ffffffff811844cf>] ? sysfs_kf_seq_show+0x7a/0xc4
[16439.000153] [<ffffffff81143d99>] ? seq_read+0x16c/0x323
[16439.000156] [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.000158] [<ffffffff8113a166>] ? dput+0x12f/0x1cb
[16439.000161] [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.000163] [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.000166] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.000167] ---[ end trace a0868b86aec8f151 ]---
[16439.000169] ------------[ cut here ]------------
[16439.000171] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16439.000172] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.000187] CPU: 4 PID: 31175 Comm: mdadm Tainted: G W O
4.9.0-rc3-esos.prod #1
[16439.000188] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.000189] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.000192] ffffffff81065522 ffff8803182d4000 ffff8803182d4000
ffffffff8209fba0
[16439.000195] ffff8805c8def000 ffff8805c96b7100 0000000000000001
ffffffff818781b3
[16439.000198] Call Trace:
[16439.000201] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.000203] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.000205] [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.000207] [<ffffffff818792c4>] ? md_attr_show+0x85/0x8f
[16439.000210] [<ffffffff811844cf>] ? sysfs_kf_seq_show+0x7a/0xc4
[16439.000212] [<ffffffff81143d99>] ? seq_read+0x16c/0x323
[16439.000215] [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.000217] [<ffffffff8113a166>] ? dput+0x12f/0x1cb
[16439.000219] [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.000222] [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.000224] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.000226] ---[ end trace a0868b86aec8f152 ]---
[16439.000227] mddev->active = 2
[16439.000236] ------------[ cut here ]------------
[16439.000239] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16439.000240] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.000255] CPU: 4 PID: 31175 Comm: mdadm Tainted: G W O
4.9.0-rc3-esos.prod #1
[16439.000256] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.000257] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.000260] ffffffff81065522 ffff8803182d4000 ffff880622d20aa0
ffff8803182ae800
[16439.000263] ffff880622d209d8 ffff880622d20b20 ffff88030f854ac8
ffffffff818781b3
[16439.000266] Call Trace:
[16439.000269] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.000271] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.000273] [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.000276] [<ffffffff81151258>] ? __blkdev_put+0x11c/0x1c4
[16439.000278] [<ffffffff81151aff>] ? blkdev_close+0x1c/0x1f
[16439.000280] [<ffffffff81129c69>] ? __fput+0xd8/0x18a
[16439.000285] [<ffffffff8107990c>] ? task_work_run+0x5d/0x73
[16439.000288] [<ffffffff81001048>] ? exit_to_usermode_loop+0x48/0x5d
[16439.000290] [<ffffffff8100135c>] ? syscall_return_slowpath+0x3a/0x4c
[16439.000292] [<ffffffff81a7da9f>] ? entry_SYSCALL_64_fastpath+0x92/0x94
[16439.000294] ---[ end trace a0868b86aec8f153 ]---
[16439.000295] mddev->active = 1
[16439.000296] rd=2 empty=0 ctime=1480694644 hold=0
[16439.000302] ------------[ cut here ]------------
[16439.000305] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:449
mddev_find+0x85/0x205
[16439.000305] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.000321] CPU: 4 PID: 31175 Comm: mdadm Tainted: G W O
4.9.0-rc3-esos.prod #1
[16439.000322] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.000323] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.000326] ffffffff81065522 0000000000000000 000000000090007e
ffff8803182d4000
[16439.000329] 000000000000007e 0000000000000009 0000000000000001
ffffffff81875356
[16439.000332] Call Trace:
[16439.000334] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.000336] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.000338] [<ffffffff81875356>] ? mddev_find+0x85/0x205
[16439.000341] [<ffffffff8187978e>] ? md_open+0x10/0x9a
[16439.000343] [<ffffffff811513c3>] ? __blkdev_get+0xc3/0x345
[16439.000345] [<ffffffff811517f2>] ? blkdev_get+0x1ad/0x2ad
[16439.000348] [<ffffffff81150501>] ? bdgrab+0xd/0x12
[16439.000350] [<ffffffff81151935>] ? blkdev_get_by_dev+0x43/0x43
[16439.000353] [<ffffffff8112664e>] ? do_dentry_open.isra.16+0x1b2/0x28a
[16439.000355] [<ffffffff81134dd9>] ? path_openat+0xcc7/0xeb1
[16439.000359] [<ffffffff8109b30f>] ? console_unlock+0x254/0x46c
[16439.000362] [<ffffffff8113500b>] ? do_filp_open+0x48/0x9e
[16439.000364] [<ffffffff8113a058>] ? dput+0x21/0x1cb
[16439.000367] [<ffffffff811274ab>] ? do_sys_open+0x135/0x1bc
[16439.000369] [<ffffffff811274ab>] ? do_sys_open+0x135/0x1bc
[16439.000372] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.000374] ---[ end trace a0868b86aec8f154 ]---
[16439.000405] udevd[494]: inotify event: 8 for /dev/md126
[16439.000420] ------------[ cut here ]------------
[16439.000427] WARNING: CPU: 11 PID: 494 at drivers/md/md.c:449
mddev_find+0x85/0x205
[16439.000427] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.000449] CPU: 11 PID: 494 Comm: udevd Tainted: G W O
4.9.0-rc3-esos.prod #1
[16439.000450] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.000452] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.000455] ffffffff81065522 0000000000000000 000000000090007e
ffff8803182d4000
[16439.000458] 000000000000007e 0000000000000009 0000000000000001
ffffffff81875356
[16439.000461] Call Trace:
[16439.000467] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.000471] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.000474] [<ffffffff81875356>] ? mddev_find+0x85/0x205
[16439.000476] [<ffffffff8187978e>] ? md_open+0x10/0x9a
[16439.000480] [<ffffffff81151522>] ? __blkdev_get+0x222/0x345
[16439.000483] [<ffffffff81151935>] ? blkdev_get_by_dev+0x43/0x43
[16439.000485] [<ffffffff811517f2>] ? blkdev_get+0x1ad/0x2ad
[16439.000488] [<ffffffff81132aab>] ? walk_component+0xa2/0x20f
[16439.000490] [<ffffffff81150501>] ? bdgrab+0xd/0x12
[16439.000493] [<ffffffff81151935>] ? blkdev_get_by_dev+0x43/0x43
[16439.000496] [<ffffffff8112664e>] ? do_dentry_open.isra.16+0x1b2/0x28a
[16439.000498] [<ffffffff81134dd9>] ? path_openat+0xcc7/0xeb1
[16439.000500] [<ffffffff81132400>] ? lookup_fast+0x1c0/0x267
[16439.000503] [<ffffffff8113a067>] ? dput+0x30/0x1cb
[16439.000505] [<ffffffff8113316c>] ? path_lookupat+0xea/0xfe
[16439.000507] [<ffffffff8113500b>] ? do_filp_open+0x48/0x9e
[16439.000510] [<ffffffff8113cf32>] ? current_time+0x54/0x5d
[16439.000514] [<ffffffff811840ca>] ? kernfs_iop_get_link+0x14b/0x17b
[16439.000516] [<ffffffff811274ab>] ? do_sys_open+0x135/0x1bc
[16439.000518] [<ffffffff811274ab>] ? do_sys_open+0x135/0x1bc
[16439.000523] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.000524] ---[ end trace a0868b86aec8f155 ]---
[16439.009255] md126: detected capacity change from 73340747776 to 0
[16439.009259] md: md126 stopped.
[16439.009419] dlm: d8c5a2a8-4fbe-7f67-ab89-d1834f978d2a: leaving the
lockspace group...
[16439.009424] udevd[494]: device /dev/md126 closed, synthesising 'change'
[16439.009512] udevd[494]: seq 3817 queued, 'offline' 'dlm'
[16439.009709] udevd[494]: seq 3817 forked new worker [31176]
[16439.009762] udevd[494]: seq 3818 queued, 'change' 'block'
[16439.009882] udevd[494]: seq 3818 forked new worker [31177]
[16439.009904] udevd[31176]: seq 3817 running
[16439.009955] udevd[31176]: no db file to read
/run/udev/data/+dlm:d8c5a2a8-4fbe-7f67-ab89-d1834f978d2a: No such file
or directory
[16439.010002] udevd[31176]: passed device to netlink monitor 0xdee2c0
[16439.010005] udevd[31176]: seq 3817 processed
[16439.010255] ------------[ cut here ]------------
[16439.010260] WARNING: CPU: 20 PID: 31177 at drivers/md/md.c:449
md_attr_show+0x61/0x8f
[16439.010261] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.010280] CPU: 20 PID: 31177 Comm: udevd Tainted: G W O
4.9.0-rc3-esos.prod #1
[16439.010281] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.010282] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.010286] ffffffff81065522 ffff8803182d4048 ffff8803182d4000
ffffffff8209fb80
[16439.010289] ffff8806198f8000 ffffc9000f66ff30 ffff8805c95c7ed8
ffffffff818792a0
[16439.010292] Call Trace:
[16439.010296] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.010299] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.010301] [<ffffffff818792a0>] ? md_attr_show+0x61/0x8f
[16439.010304] [<ffffffff811843ea>] ? sysfs_kf_read+0x61/0x97
[16439.010306] [<ffffffff81183a52>] ? kernfs_fop_read+0xdc/0x13e
[16439.010309] [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.010312] [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.010315] [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.010318] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.010319] ---[ end trace a0868b86aec8f156 ]---
[16439.010321] ------------[ cut here ]------------
[16439.010324] WARNING: CPU: 20 PID: 31177 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16439.010324] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.010340] CPU: 20 PID: 31177 Comm: udevd Tainted: G W O
4.9.0-rc3-esos.prod #1
[16439.010341] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.010342] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.010345] ffffffff81065522 ffff8803182d4000 ffff8803182d4000
ffffffff8209fb80
[16439.010348] ffff8806198f8000 ffffc9000f66ff30 ffff8805c95c7ed8
ffffffff818781b3
[16439.010351] Call Trace:
[16439.010354] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.010356] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.010358] [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.010360] [<ffffffff818792c4>] ? md_attr_show+0x85/0x8f
[16439.010363] [<ffffffff811843ea>] ? sysfs_kf_read+0x61/0x97
[16439.010365] [<ffffffff81183a52>] ? kernfs_fop_read+0xdc/0x13e
[16439.010368] [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.010370] [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.010373] [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.010375] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.010377] ---[ end trace a0868b86aec8f157 ]---
[16439.010377] mddev->active = 3
[16439.010385] dlm: d8c5a2a8-4fbe-7f67-ab89-d1834f978d2a: group event done 0 0
[16439.010399] ------------[ cut here ]------------
[16439.010402] WARNING: CPU: 20 PID: 31177 at drivers/md/md.c:449
md_attr_show+0x61/0x8f
[16439.010403] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.010419] CPU: 20 PID: 31177 Comm: udevd Tainted: G W O
4.9.0-rc3-esos.prod #1
[16439.010420] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.010421] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.010424] ffffffff81065522 ffff8803182d4048 ffff8803182d4000
ffffffff8209fc20
[16439.010426] ffff8806198f8000 ffffc9000f66ff30 ffff8805c95c7ed8
ffffffff818792a0
[16439.010429] Call Trace:
[16439.010432] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.010434] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.010436] [<ffffffff818792a0>] ? md_attr_show+0x61/0x8f
[16439.010439] [<ffffffff811843ea>] ? sysfs_kf_read+0x61/0x97
[16439.010441] [<ffffffff81183a52>] ? kernfs_fop_read+0xdc/0x13e
[16439.010444] [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.010447] [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.010449] [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.010452] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.010453] ---[ end trace a0868b86aec8f158 ]---
[16439.010455] ------------[ cut here ]------------
[16439.010457] WARNING: CPU: 20 PID: 31177 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16439.010458] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.010473] CPU: 20 PID: 31177 Comm: udevd Tainted: G W O
4.9.0-rc3-esos.prod #1
[16439.010474] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.010475] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.010478] ffffffff81065522 ffff8803182d4000 ffff8803182d4000
ffffffff8209fc20
[16439.010480] ffff8806198f8000 ffffc9000f66ff30 ffff8805c95c7ed8
ffffffff818781b3
[16439.010483] Call Trace:
[16439.010486] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.010488] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.010491] [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.010493] [<ffffffff818792c4>] ? md_attr_show+0x85/0x8f
[16439.010495] [<ffffffff811843ea>] ? sysfs_kf_read+0x61/0x97
[16439.010498] [<ffffffff81183a52>] ? kernfs_fop_read+0xdc/0x13e
[16439.010501] [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.010503] [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.010506] [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.010508] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.010510] ---[ end trace a0868b86aec8f159 ]---
[16439.010511] mddev->active = 3
[16439.010600] dlm: d8c5a2a8-4fbe-7f67-ab89-d1834f978d2a:
release_lockspace final free
[16439.010628] md: unbind<dm-2>
[16439.011727] ------------[ cut here ]------------
[16439.011732] WARNING: CPU: 12 PID: 31178 at drivers/md/md.c:449
mddev_find+0x85/0x205
[16439.011733] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.011751] CPU: 12 PID: 31178 Comm: probe-bcache Tainted: G
W O 4.9.0-rc3-esos.prod #1
[16439.011752] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.011753] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.011757] ffffffff81065522 0000000000000000 000000000090007e
ffff8803182d4000
[16439.011760] 000000000000007e 0000000000000009 0000000000000001
ffffffff81875356
[16439.011763] Call Trace:
[16439.011767] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.011769] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.011772] [<ffffffff81875356>] ? mddev_find+0x85/0x205
[16439.011774] [<ffffffff8187978e>] ? md_open+0x10/0x9a
[16439.011777] [<ffffffff81151522>] ? __blkdev_get+0x222/0x345
[16439.011779] [<ffffffff81151935>] ? blkdev_get_by_dev+0x43/0x43
[16439.011781] [<ffffffff811517f2>] ? blkdev_get+0x1ad/0x2ad
[16439.011784] [<ffffffff81132aab>] ? walk_component+0xa2/0x20f
[16439.011789] [<ffffffff810ebf11>] ? get_page_from_freelist+0x58f/0x6dc
[16439.011791] [<ffffffff81150501>] ? bdgrab+0xd/0x12
[16439.011794] [<ffffffff81151935>] ? blkdev_get_by_dev+0x43/0x43
[16439.011796] [<ffffffff8112664e>] ? do_dentry_open.isra.16+0x1b2/0x28a
[16439.011798] [<ffffffff81134dd9>] ? path_openat+0xcc7/0xeb1
[16439.011801] [<ffffffff8113500b>] ? do_filp_open+0x48/0x9e
[16439.011806] [<ffffffff81105e6a>] ? handle_mm_fault+0x607/0xb0e
[16439.011809] [<ffffffff811274ab>] ? do_sys_open+0x135/0x1bc
[16439.011811] [<ffffffff811274ab>] ? do_sys_open+0x135/0x1bc
[16439.011814] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.011815] ---[ end trace a0868b86aec8f15a ]---
[16439.024630] md: export_rdev(dm-2)
[16439.024689] md: unbind<dm-3>
[16439.035626] md: export_rdev(dm-3)
[16439.035781] ------------[ cut here ]------------
[16439.035786] WARNING: CPU: 11 PID: 31175 at drivers/md/md.c:449
md_seq_next+0x5b/0x93
[16439.035787] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.035805] CPU: 11 PID: 31175 Comm: mdadm Tainted: G W O
4.9.0-rc3-esos.prod #1
[16439.035806] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.035808] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.035811] ffffffff81065522 0000000000000001 ffff88061a157000
ffff88061a1573b8
[16439.035814] ffff8803133adb00 ffff880320a15000 000000000000004b
ffffffff81879329
[16439.035817] Call Trace:
[16439.035821] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.035823] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.035826] [<ffffffff81879329>] ? md_seq_next+0x5b/0x93
[16439.035829] [<ffffffff81143e60>] ? seq_read+0x233/0x323
[16439.035832] [<ffffffff81175e07>] ? proc_reg_read+0x3f/0x5d
[16439.035834] [<ffffffff81175dc8>] ? proc_reg_write+0x5d/0x5d
[16439.035837] [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.035840] [<ffffffff8112c69b>] ? SyS_newfstat+0x1f/0x27
[16439.035842] [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.035845] [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.035848] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.035850] ---[ end trace a0868b86aec8f15b ]---
[16439.035858] ------------[ cut here ]------------
[16439.035861] WARNING: CPU: 11 PID: 31175 at drivers/md/md.c:449
md_seq_next+0x5b/0x93
[16439.035861] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.035877] CPU: 11 PID: 31175 Comm: mdadm Tainted: G W O
4.9.0-rc3-esos.prod #1
[16439.035878] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.035879] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.035882] ffffffff81065522 ffff88061a157000 ffff8803182d5000
ffff8803182d53b8
[16439.035885] ffff8803133adb00 ffff880320a15000 00000000000000c9
ffffffff81879329
[16439.035888] Call Trace:
[16439.035891] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.035893] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.035895] [<ffffffff81879329>] ? md_seq_next+0x5b/0x93
[16439.035897] [<ffffffff81143e60>] ? seq_read+0x233/0x323
[16439.035900] [<ffffffff81175e07>] ? proc_reg_read+0x3f/0x5d
[16439.035902] [<ffffffff81175dc8>] ? proc_reg_write+0x5d/0x5d
[16439.035904] [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.035907] [<ffffffff8112c69b>] ? SyS_newfstat+0x1f/0x27
[16439.035909] [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.035911] [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.035914] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.035915] ---[ end trace a0868b86aec8f15c ]---
[16439.035916] ------------[ cut here ]------------
[16439.035919] WARNING: CPU: 11 PID: 31175 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16439.035920] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.035935] CPU: 11 PID: 31175 Comm: mdadm Tainted: G W O
4.9.0-rc3-esos.prod #1
[16439.035936] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.035936] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.035940] ffffffff81065522 ffff88061a157000 ffff8803182d5000
ffff8803182d53b8
[16439.035942] ffff8803133adb00 ffff880320a15000 00000000000000c9
ffffffff818781b3
[16439.035946] Call Trace:
[16439.035948] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.035950] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.035953] [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.035955] [<ffffffff81879359>] ? md_seq_next+0x8b/0x93
[16439.035957] [<ffffffff81143e60>] ? seq_read+0x233/0x323
[16439.035960] [<ffffffff81175e07>] ? proc_reg_read+0x3f/0x5d
[16439.035961] [<ffffffff81175dc8>] ? proc_reg_write+0x5d/0x5d
[16439.035964] [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.035966] [<ffffffff8112c69b>] ? SyS_newfstat+0x1f/0x27
[16439.035969] [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.035971] [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.035974] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.035975] ---[ end trace a0868b86aec8f15d ]---
[16439.035976] mddev->active = 1
[16439.035978] rd=2 empty=0 ctime=1480694775 hold=0
[16439.035984] ------------[ cut here ]------------
[16439.035987] WARNING: CPU: 11 PID: 31175 at drivers/md/md.c:449
md_seq_next+0x5b/0x93
[16439.035988] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.036003] CPU: 11 PID: 31175 Comm: mdadm Tainted: G W O
4.9.0-rc3-esos.prod #1
[16439.036004] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.036005] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.036008] ffffffff81065522 ffff8803182d5000 ffff8803182d4000
ffff8803182d43b8
[16439.036011] ffff8803133adb00 ffff880320a15000 0000000000000147
ffffffff81879329
[16439.036014] Call Trace:
[16439.036016] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.036018] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.036021] [<ffffffff81879329>] ? md_seq_next+0x5b/0x93
[16439.036023] [<ffffffff81143e60>] ? seq_read+0x233/0x323
[16439.036025] [<ffffffff81175e07>] ? proc_reg_read+0x3f/0x5d
[16439.036027] [<ffffffff81175dc8>] ? proc_reg_write+0x5d/0x5d
[16439.036030] [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.036032] [<ffffffff8112c69b>] ? SyS_newfstat+0x1f/0x27
[16439.036034] [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.036037] [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.036039] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.036041] ---[ end trace a0868b86aec8f15e ]---
[16439.036042] ------------[ cut here ]------------
[16439.036044] WARNING: CPU: 11 PID: 31175 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16439.036045] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.036060] CPU: 11 PID: 31175 Comm: mdadm Tainted: G W O
4.9.0-rc3-esos.prod #1
[16439.036061] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.036062] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.036065] ffffffff81065522 ffff8803182d5000 ffff8803182d4000
ffff8803182d43b8
[16439.036067] ffff8803133adb00 ffff880320a15000 0000000000000147
ffffffff818781b3
[16439.036070] Call Trace:
[16439.036073] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.036075] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.036077] [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.036079] [<ffffffff81879359>] ? md_seq_next+0x8b/0x93
[16439.036082] [<ffffffff81143e60>] ? seq_read+0x233/0x323
[16439.036084] [<ffffffff81175e07>] ? proc_reg_read+0x3f/0x5d
[16439.036086] [<ffffffff81175dc8>] ? proc_reg_write+0x5d/0x5d
[16439.036088] [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.036091] [<ffffffff8112c69b>] ? SyS_newfstat+0x1f/0x27
[16439.036093] [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.036096] [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.036098] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.036100] ---[ end trace a0868b86aec8f15f ]---
[16439.036100] mddev->active = 1
[16439.036102] rd=2 empty=0 ctime=1480694673 hold=0
[16439.036103] ------------[ cut here ]------------
[16439.036105] WARNING: CPU: 11 PID: 31175 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16439.036106] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.036121] CPU: 11 PID: 31175 Comm: mdadm Tainted: G W O
4.9.0-rc3-esos.prod #1
[16439.036122] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.036122] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.036125] ffffffff81065522 ffff8803182d4000 0000000000000002
ffffffff8209ff30
[16439.036128] ffff8803133adb00 ffff880320a15000 0000000000000147
ffffffff818781b3
[16439.036131] Call Trace:
[16439.036134] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.036136] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.036138] [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.036140] [<ffffffff81879359>] ? md_seq_next+0x8b/0x93
[16439.036143] [<ffffffff81143e60>] ? seq_read+0x233/0x323
[16439.036145] [<ffffffff81175e07>] ? proc_reg_read+0x3f/0x5d
[16439.036147] [<ffffffff81175dc8>] ? proc_reg_write+0x5d/0x5d
[16439.036150] [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.036152] [<ffffffff8112c69b>] ? SyS_newfstat+0x1f/0x27
[16439.036154] [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.036157] [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.036160] [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.036161] ---[ end trace a0868b86aec8f160 ]---
[16439.036162] mddev->active = 4
[16439.036298] ------------[ cut here ]------------
[16439.036302] WARNING: CPU: 11 PID: 31175 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16439.036303] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.036319] CPU: 11 PID: 31175 Comm: mdadm Tainted: G W O
4.9.0-rc3-esos.prod #1
[16439.036320] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.036321] 0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.036324] ffffffff81065522 ffff8803182d4000 ffff880622d20aa0
ffff8803182ae800
[16439.036327] ffff880622d209d8 ffff880622d20b20 ffff88030f854ac8
ffffffff818781b3
[16439.036330] Call Trace:
[16439.036334] [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.036336] [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.036339] [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.036341] [<ffffffff81151258>] ? __blkdev_put+0x11c/0x1c4
[16439.036344] [<ffffffff81151aff>] ? blkdev_close+0x1c/0x1f
[16439.036346] [<ffffffff81129c69>] ? __fput+0xd8/0x18a
[16439.036350] [<ffffffff8107990c>] ? task_work_run+0x5d/0x73
[16439.036352] [<ffffffff81001048>] ? exit_to_usermode_loop+0x48/0x5d
[16439.036354] [<ffffffff8100135c>] ? syscall_return_slowpath+0x3a/0x4c
[16439.036357] [<ffffffff81a7da9f>] ? entry_SYSCALL_64_fastpath+0x92/0x94
[16439.036359] ---[ end trace a0868b86aec8f161 ]---
[16439.036360] mddev->active = 3
--snip--
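
In case it helps anyone reading a trace this long, here is a minimal sketch of a script that condenses it. It assumes the log has been saved as plain text; the function name `summarize` and the regular expressions are hypothetical helpers written for this illustration, not part of mdadm or the kernel. It collects the values printed by the added "mddev->active = %d" printk in order, and counts the WARNING splats by source location and symbol, coping with the symbol being wrapped onto the following line as it is in this mail.

```python
import re
from collections import Counter

# Value printed by the debugging printk added at the top of mddev_put().
ACTIVE_RE = re.compile(r"mddev->active = (\d+)")
# Header of a WARN splat; the symbol is optional because mail wrapping
# may have pushed it onto the next line.
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+:\d+)(?:\s+(\w+)\+0x)?")
# A wrapped continuation line such as "md_attr_show+0x61/0x8f".
SYMBOL_RE = re.compile(r"(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+$")


def summarize(lines):
    """Return (active_values, warning_counts) for an iterable of log lines."""
    active_values = []
    warning_counts = Counter()
    pending_site = None  # file:line of a WARNING still waiting for its symbol

    for raw in lines:
        line = raw.strip()

        if pending_site is not None:
            site, pending_site = pending_site, None
            match = SYMBOL_RE.match(line)
            if match:
                warning_counts[(site, match.group(1))] += 1
                continue
            # No symbol followed the header; record it as unknown.
            warning_counts[(site, "?")] += 1

        match = ACTIVE_RE.search(line)
        if match:
            active_values.append(int(match.group(1)))
            continue

        match = WARN_RE.search(line)
        if match:
            site, symbol = match.groups()
            if symbol:
                warning_counts[(site, symbol)] += 1
            else:
                pending_site = site

    if pending_site is not None:
        warning_counts[(pending_site, "?")] += 1

    return active_values, warning_counts
```

Calling `summarize(open("dmesg.txt"))` (the file name is only an example) returns the sequence of ->active values and a Counter keyed by (file:line, symbol), which makes it easier to see which callers hit each WARNING and how often.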
--Marc
On Fri, Dec 2, 2016 at 2:12 PM, Marc Smith <marc.smith@mcc.edu> wrote:
> On Thu, Dec 1, 2016 at 5:35 PM, NeilBrown <neilb@suse.com> wrote:
>> On Fri, Dec 02 2016, Marc Smith wrote:
>>
>>> On Wed, Nov 30, 2016 at 9:52 PM, NeilBrown <neilb@suse.com> wrote:
>>>> On Mon, Nov 28 2016, Marc Smith wrote:
>>>>
>>>>>
>>>>> # find /sys/block/md127/md
>>>>> /sys/block/md127/md
>>>>> /sys/block/md127/md/reshape_position
>>>>> /sys/block/md127/md/layout
>>>>> /sys/block/md127/md/raid_disks
>>>>> /sys/block/md127/md/bitmap
>>>>> /sys/block/md127/md/bitmap/chunksize
>>>>
>>>> This tells me that:
>>>> sysfs_remove_group(&mddev->kobj, &md_bitmap_group);
>>>> hasn't been run, so mddev_delayed_delete() hasn't run.
>>>> That suggests the final mddev_put() hasn't run, i.e. mddev->active is > 0
>>>>
>>>> Everything else suggests that array has been stopped and cleaned and
>>>> should be gone...
>>>>
>>>> This seems to suggest that there is an unbalanced mddev_get() without a
>>>> matching mddev_put(). I cannot find it though.
>>>>
>>>> If I could reproduce it, I would try to see what is happening by:
>>>>
>>>> - putting
>>>> printk("mddev->active = %d\n", atomic_read(&mddev->active));
>>>> in the top of mddev_put(). That shouldn't be *too* noisy.
>>>>
>>>> - putting
>>>> printk("rd=%d empty=%d ctime=%d hold=%d\n", mddev->raid_disks,
>>>> list_empty(&mddev->disks), mddev->ctime, mddev->hold_active);
>>>>
>>>> in mddev_put() just before those values are tested.
>>>>
>>>> - putting
>>>> printk("queue_work\n");
>>>> just before the 'queue_work()' call in mddev_put.
>>>>
>>>> - putting
>>>> printk("mddev_delayed_delete\n");
>>>> in mddev_delayed_delete()
>>>>
>>>> Then see what gets printed when you stop the array.
>>>
>>> I made those modifications to md.c and here is the kernel log when stopping:
>>>
>>> --snip--
>>> [ 3937.233487] mddev->active = 2
>>> [ 3937.233503] mddev->active = 2
>>> [ 3937.233509] mddev->active = 2
>>> [ 3937.233516] mddev->active = 1
>>> [ 3937.233516] rd=2 empty=0 ctime=1480617270 hold=0
>>
>> At this point, mdadm has opened the /dev/md127 device, accessed a few
>> attributes via sysfs just to check on the status, and then closed it
>> again.
>> The array is still active, but we know that no other process has it
>> open.
>>
>>
>>> [ 3937.233679] udevd[492]: inotify event: 8 for /dev/md127
>>> [ 3937.241489] md127: detected capacity change from 73340747776 to 0
>>> [ 3937.241493] md: md127 stopped.
>>
>> Now mdadm has opened the array again and issued the STOP_ARRAY ioctl.
>> Still nothing else has the array open.
>>
>>> [ 3937.241665] udevd[492]: device /dev/md127 closed, synthesising 'change'
>>> [ 3937.241726] udevd[492]: seq 3631 queued, 'change' 'block'
>>> [ 3937.241829] udevd[492]: seq 3631 forked new worker [4991]
>>> [ 3937.241989] udevd[4991]: seq 3631 running
>>> [ 3937.242002] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: leaving the lockspace group...
>>> [ 3937.242039] udevd[4991]: removing watch on '/dev/md127'
>>> [ 3937.242068] mddev->active = 3
>>
>> But somehow the ->active count got up to 3.
>> mdadm probably still has it open, but two other things do too.
>> If you have "mdadm --monitor" running in the background (which is good)
>> it will temporarily increase, then decrease the count.
>> udevd opens the device temporarily too.
>> So this isn't necessarily a problem.
>>
>>> [ 3937.242069] udevd[492]: seq 3632 queued, 'offline' 'dlm'
>>> [ 3937.242080] mddev->active = 3
>>> [ 3937.242104] udevd[4991]: IMPORT 'probe-bcache -o udev /dev/md127' /usr/lib/udev/rules.d/69-bcache.rules:16
>>> [ 3937.242161] udevd[492]: seq 3632 forked new worker [4992]
>>> [ 3937.242259] udevd[4993]: starting 'probe-bcache -o udev /dev/md127'
>>> [ 3937.242753] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: group event done 0 0
>>> [ 3937.242847] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: release_lockspace final free
>>> [ 3937.242861] md: unbind<dm-1>
>>> [ 3937.256606] md: export_rdev(dm-1)
>>> [ 3937.256612] md: unbind<dm-0>
>>> [ 3937.263601] md: export_rdev(dm-0)
>>> [ 3937.263688] mddev->active = 4
>>> [ 3937.263751] mddev->active = 3
>>
>> But here, the active count only drops down to 2. (it is decremented
>> after it is printed). Assuming there really were no more messages like
>> this, there are two active references to the md device, and we don't
>> know what they are.
>>
>>>
>>> I didn't use my modified mdadm which stops the synthesized CHANGE from
>>> occurring, but if needed, I can re-run the test using that.
>>
>> It would be good to use the modified mdadm, if only to reduce the
>> noise. It won't change the end result, but might make it easier to see
>> what is happening.
>> Also please add
>> WARN_ON(1);
>>
>> in the start of mddev_get() and mddev_put().
>> That will provide a stack trace whenever either of these is called, so
>> we can see who takes a reference, and who doesn't release it.
>
> Okay, I added that to both functions, and now I can't get stopping the
> array to misbehave (e.g., not generate the REMOVE event). I've been
> trying all morning! I literally just added the WARN_ON(1) to those two
> functions, and that's all I changed. I compiled and reinstalled the image,
> no other changes. I've tried quite a few times now to reproduce this,
> and I'm failing to do so -- every time the REMOVE event is generated
> and everything is removed correctly.
>
> I'm going to switch back to the previous image and confirm it's
> reproducible with that.
>
> --Marc
>
>
>>
>> Thanks,
>> NeilBrown
>>
^ permalink raw reply
* Re: [PATCH v2 2/2] md/raid10: Refactor raid10_make_request
From: Shaohua Li @ 2016-12-02 22:02 UTC (permalink / raw)
To: Robert LeBlanc; +Cc: linux-raid
In-Reply-To: <20161202033008.30314-3-robert@leblancnet.us>
On Thu, Dec 01, 2016 at 08:30:08PM -0700, Robert LeBlanc wrote:
> Refactor raid10_make_request into separate read and write functions to
> clean up the code.
>
> Signed-off-by: Robert LeBlanc <robert@leblancnet.us>
> ---
Hi,
Could you please resend the patches against my for-next branch? The two patches
don't apply.
> int bad_sectors;
> int is_bad;
>
> - is_bad = is_badblock(rdev, dev_sector,
> - max_sectors,
> + is_bad = is_badblock(rdev, dev_sector, max_sectors,
> &first_bad, &bad_sectors);
> if (is_bad < 0) {
> /* Mustn't write here until the bad block
> @@ -1353,8 +1291,7 @@ retry_write:
> r10_bio->devs[i].bio = mbio;
>
> mbio->bi_iter.bi_sector = (r10_bio->devs[i].addr+
> - choose_data_offset(r10_bio,
> - rdev));
> + choose_data_offset(r10_bio, rdev));
> mbio->bi_bdev = rdev->bdev;
> mbio->bi_end_io = raid10_end_write_request;
> bio_set_op_attrs(mbio, op, do_sync | do_fua);
> @@ -1395,8 +1332,7 @@ retry_write:
> r10_bio->devs[i].repl_bio = mbio;
>
> mbio->bi_iter.bi_sector = (r10_bio->devs[i].addr +
> - choose_data_offset(
> - r10_bio, rdev));
> + choose_data_offset(r10_bio, rdev));
> mbio->bi_bdev = rdev->bdev;
> mbio->bi_end_io = raid10_end_write_request;
> bio_set_op_attrs(mbio, op, do_sync | do_fua);
> @@ -1434,6 +1370,77 @@ retry_write:
> one_write_done(r10_bio);
> }
>
> +static void __make_request(struct mddev *mddev, struct bio *bio)
> +{
> + struct r10conf *conf = mddev->private;
> + struct r10bio *r10_bio;
> + int sectors;
> +
We now do wait_barrier before md_write_start. I'm not comfortable with this.
Could you please add md_write_start here? A single line of code for write here
doesn't matter.
> + /*
> + * Register the new request and wait if the reconstruction
> + * thread has put up a bar for new requests.
> + * Continue immediately if no resync is active currently.
> + */
> + wait_barrier(conf);
> +
> + sectors = bio_sectors(bio);
> + while (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
> + bio->bi_iter.bi_sector < conf->reshape_progress &&
> + bio->bi_iter.bi_sector + sectors > conf->reshape_progress) {
> + /* IO spans the reshape position. Need to wait for
> + * reshape to pass
> + */
> + allow_barrier(conf);
> + wait_event(conf->wait_barrier,
> + conf->reshape_progress <= bio->bi_iter.bi_sector ||
> + conf->reshape_progress >= bio->bi_iter.bi_sector +
> + sectors);
> + wait_barrier(conf);
> + }
> + if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
> + bio_data_dir(bio) == WRITE &&
> + (mddev->reshape_backwards
> + ? (bio->bi_iter.bi_sector < conf->reshape_safe &&
> + bio->bi_iter.bi_sector + sectors > conf->reshape_progress)
> + : (bio->bi_iter.bi_sector + sectors > conf->reshape_safe &&
> + bio->bi_iter.bi_sector < conf->reshape_progress))) {
> + /* Need to update reshape_position in metadata */
> + mddev->reshape_position = conf->reshape_progress;
> + set_mask_bits(&mddev->flags, 0,
> + BIT(MD_CHANGE_DEVS) | BIT(MD_CHANGE_PENDING));
> + md_wakeup_thread(mddev->thread);
> + wait_event(mddev->sb_wait,
> + !test_bit(MD_CHANGE_PENDING, &mddev->flags));
> +
> + conf->reshape_safe = mddev->reshape_position;
> + }
This is write-only and could be moved to the write path.
> +
> + r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
> +
> + r10_bio->master_bio = bio;
> + r10_bio->sectors = sectors;
> +
> + r10_bio->mddev = mddev;
> + r10_bio->sector = bio->bi_iter.bi_sector;
> + r10_bio->state = 0;
> +
> + /* We might need to issue multiple reads to different
> + * devices if there are bad blocks around, so we keep
> + * track of the number of reads in bio->bi_phys_segments.
> + * If this is 0, there is only one r10_bio and no locking
> + * will be needed when the request completes. If it is
> + * non-zero, then it is the number of not-completed requests.
> + */
> + bio->bi_phys_segments = 0;
> + bio_clear_flag(bio, BIO_SEG_VALID);
> +
> + if (bio_data_dir(bio) == READ) {
> + raid10_read_request(mddev, bio, r10_bio);
> + return;
> + }
Better do the same as raid1 here.
> + raid10_write_request(mddev, bio, r10_bio);
> +}
Thanks,
Shaohua
^ permalink raw reply
* [PATCH 1/2] md/r5cache: do r5c_update_log_state after log recovery
From: Zhengyuan Liu @ 2016-12-04 8:49 UTC (permalink / raw)
To: linux-raid
We should update the log state after log recovery. Currently the update
may see a wrong log state, since log->log_start isn't initialized until
r5l_recovery_log() is called.
At the log recovery stage no lock is needed, as there is no race condition.
The next_checkpoint field will be initialized in r5l_recovery_log() too.
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
---
drivers/md/raid5-cache.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index fa3319c..07bce0e 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -2522,14 +2522,12 @@ static int r5l_load_log(struct r5l_log *log)
if (log->max_free_space > RECLAIM_MAX_FREE_SPACE)
log->max_free_space = RECLAIM_MAX_FREE_SPACE;
log->last_checkpoint = cp;
- log->next_checkpoint = cp;
- mutex_lock(&log->io_mutex);
- r5c_update_log_state(log);
- mutex_unlock(&log->io_mutex);
__free_page(page);
- return r5l_recovery_log(log);
+ ret = r5l_recovery_log(log);
+ r5c_update_log_state(log);
+ return ret;
ioerr:
__free_page(page);
return ret;
--
2.7.4
^ permalink raw reply related
* [PATCH 2/2] md/r5cache: set journal mode according to log content
From: Zhengyuan Liu @ 2016-12-04 8:49 UTC (permalink / raw)
To: linux-raid
In-Reply-To: <1480841385-21180-1-git-send-email-liuzhengyuan@kylinos.cn>
Currently, we choose write-through mode as the default journal mode.
If there are data-only stripes, we rewrite them and add them to the raid5d
release list. However, the raid5d thread won't put those stripes on the
cache (full/partial) list but on the inactive list instead, since the journal
mode is write-through. Furthermore, a later read request would get data
from the raid disk directly instead of from the cached stripe, which is not
what we want either.
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
---
drivers/md/raid5-cache.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 07bce0e..0473b33 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -2606,7 +2606,6 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
INIT_WORK(&log->deferred_io_work, r5l_submit_io_async);
- log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_THROUGH;
INIT_LIST_HEAD(&log->stripe_in_journal_list);
spin_lock_init(&log->stripe_in_journal_lock);
atomic_set(&log->stripe_in_journal_count, 0);
@@ -2614,6 +2613,11 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
if (r5l_load_log(log))
goto error;
+ if (log->last_checkpoint == log->next_checkpoint)
+ log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_THROUGH;
+ else
+ log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_BACK;
+
rcu_assign_pointer(conf->log, log);
set_bit(MD_HAS_JOURNAL, &conf->mddev->flags);
return 0;
--
2.7.4
^ permalink raw reply related
* Re: Feature request, resumable raid check action
From: Mikael Abrahamsson @ 2016-12-04 14:26 UTC (permalink / raw)
To: NeilBrown; +Cc: Patrick Dung, linux-raid
In-Reply-To: <87y3zy6evp.fsf@notabene.neil.brown.name>
On Fri, 2 Dec 2016, NeilBrown wrote:
> Did you reboot while it was running, or after it had finished its
> allotted time?
> You need to let it finish and record where it got up to.
I think he wants this "record where it got up to" to happen when md volumes
are stopped as the machine is rebooted, and the check to resume when the
machine starts up again.
I guess the next question is where a change like this should go: into
start/stop scripts, mdadm, or the kernel?
--
Mikael Abrahamsson email: swmike@swm.pp.se
^ permalink raw reply
* Re: [PATCH 2/2] md/r5cache: set journal mode according to log content
From: Song Liu @ 2016-12-04 20:11 UTC (permalink / raw)
To: Zhengyuan Liu; +Cc: Shaohua Li, JackieLiu, linux-raid@vger.kernel.org
In-Reply-To: <1480841097-21018-2-git-send-email-liuzhengyuan@kylinos.cn>
I noticed this problem. This patch alone is not enough to fix it. I will send my patch
for this soon.
Thanks,
Song
> On Dec 4, 2016, at 12:44 AM, Zhengyuan Liu <liuzhengyuan@kylinos.cn> wrote:
>
> Currently, we choose write-through mode as the default journal mode.
> If there are data-only stripes, we rewrite them and add them to the raid5d
> release list. However, the raid5d thread won't put those stripes on the
> cache (full/partial) list but on the inactive list instead, since the journal
> mode is write-through. Furthermore, a later read request would get data
> from the raid disk directly instead of from the cached stripe, which is not
> what we want either.
>
> Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
> ---
> drivers/md/raid5-cache.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
> index 07bce0e..0473b33 100644
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -2606,7 +2606,6 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
>
> INIT_WORK(&log->deferred_io_work, r5l_submit_io_async);
>
> - log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_THROUGH;
> INIT_LIST_HEAD(&log->stripe_in_journal_list);
> spin_lock_init(&log->stripe_in_journal_lock);
> atomic_set(&log->stripe_in_journal_count, 0);
> @@ -2614,6 +2613,11 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
> if (r5l_load_log(log))
> goto error;
>
> + if (log->last_checkpoint == log->next_checkpoint)
> + log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_THROUGH;
> + else
> + log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_BACK;
> +
> rcu_assign_pointer(conf->log, log);
> set_bit(MD_HAS_JOURNAL, &conf->mddev->flags);
> return 0;
> --
> 2.7.4
>
>
>
^ permalink raw reply
* Re: Feature request, resumable raid check action
From: NeilBrown @ 2016-12-05 0:00 UTC (permalink / raw)
To: Patrick Dung; +Cc: linux-raid
In-Reply-To: <CAEtPA0CMi6PiU1VC+3h78Wnr=t-ufkT3fz=tAT0hW0jVrAp5tQ@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 829 bytes --]
(linux-raid added back to cc:)
On Fri, Dec 02 2016, Patrick Dung wrote:
> I rebooted while it was running.
> For a new RAID, the initial sync or resync would resume after reboot.
> I thought check would have the same semantics.
>
> If I am correct, the mdcheck script is designed for a 24x7 system.
> Users can pause the 'check' action and resume later.
>
> The problem is that I cannot leave my computer running 24x7.
> A full check may take about 12 hours for a 6TB RAID1.
>
> OK, I rechecked the script and found 'sync_min'.
> I can use it to manually perform a resumable check action.
Correct.
You could possibly even modify mdcheck to catch SIGTERM and close down
the resync early, remembering where it got up to. This might work well with
reboots.
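
As a rough illustration of that idea (not the mdcheck script itself, which is a
shell script), the C sketch below shows the general pattern: a SIGTERM handler
sets a flag, the check loop notices it and stops early, and the position reached
is written to a state file so a later run can start from there. All names here
(run_check, save_position, load_position, the state file path) are hypothetical,
and the loop merely simulates progress; a real implementation would drive the
array through sysfs attributes such as the 'sync_min' mentioned above.

```c
#include <signal.h>
#include <stdio.h>

/* Set from the signal handler; polled by the check loop. */
static volatile sig_atomic_t stop_requested = 0;

static void on_sigterm(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Install the SIGTERM handler. Returns 0 on success, -1 on failure. */
int install_sigterm_handler(void)
{
    return signal(SIGTERM, on_sigterm) == SIG_ERR ? -1 : 0;
}

/*
 * Simulated check: advance from `pos` towards `end` in `step`-sized
 * chunks, stopping early once SIGTERM has been seen. Returns the
 * position reached. (Hypothetical stand-in for a real md check.)
 */
unsigned long long run_check(unsigned long long pos,
                             unsigned long long end,
                             unsigned long long step)
{
    if (step == 0)
        return pos;
    while (pos < end && !stop_requested) {
        if (end - pos < step)
            pos = end;
        else
            pos += step;
    }
    return pos;
}

/* Record the position reached so a later run can resume from it. */
int save_position(const char *path, unsigned long long pos)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fprintf(f, "%llu\n", pos) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read back a saved position; 0 (start of array) if none is recorded. */
unsigned long long load_position(const char *path)
{
    unsigned long long pos = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return 0;
    if (fscanf(f, "%llu", &pos) != 1)
        pos = 0;
    fclose(f);
    return pos;
}
```

A caller would load the saved position, run the check from there, and save the
returned position again on exit; whether that belongs in mdcheck, mdadm or
elsewhere is the open question in this thread.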
NeilBrown
> Reference: https://www.kernel.org/doc/Documentation/md.txt
^ permalink raw reply
* Re: MD Remnants After --stop
From: NeilBrown @ 2016-12-05 0:41 UTC (permalink / raw)
To: Marc Smith; +Cc: linux-raid
In-Reply-To: <CAHkw+Lfncgkdb8HnbtfGaX0P_KmLmicj3E64JK7oY+1Pv6g_iw@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1719 bytes --]
On Sat, Dec 03 2016, Marc Smith wrote:
> Finally, I got it! Why is it when I want it to break, it doesn't. =)
welcome to my world :-)
>
> I will say, using the modified mdadm that prevents the synthesized
> CHANGE event, it seems to not induce the problem as regularly.
>
> Below are the kernel logs after stopping an array:
Thank you so much for persisting with this.
The logs you provide make it clear that two separate processes (494 and
31178) increment the ->active count by opening the device, but never
decrement that count by closing the device.
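
The kind of bookkeeping behind that conclusion can be sketched in a few lines of
user-space C: tally the traced get and put calls per process and look for a PID
whose net count is not zero. This is only an illustration; the event structure
and function name are hypothetical, and the events a caller feeds in would have
to come from the WARN_ON traces, which are not reproduced here.

```c
#include <stddef.h>

/* One traced call: which process made it, and whether it was a get or a put. */
enum ref_op { REF_GET, REF_PUT };

struct ref_event {
    int pid;
    enum ref_op op;
};

/*
 * Return the net number of references (gets minus puts) that `pid`
 * contributed over the traced events. A process that opens and then
 * closes the device cleanly nets to zero; a positive result means a
 * reference was taken and never released.
 */
int net_refs_for_pid(const struct ref_event *ev, size_t n, int pid)
{
    int net = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (ev[i].pid != pid)
            continue;
        net += (ev[i].op == REF_GET) ? 1 : -1;
    }
    return net;
}
```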
It seems too unlikely that either process would be holding the
file descriptor open indefinitely, so something must be going wrong
either as part of 'open', or as part of 'close'.
Now that I know where to look, the bug is obvious. Why didn't I see
that before?
The open request is failing, almost certainly because MD_CLOSING is set,
but the ->active count isn't being decremented on failure.
This patch should fix it.
Please test and report results.
Thanks,
NeilBrown
Fixes: af8d8e6f0315 ("md: changes for MD_STILL_CLOSED flag" v4.9-rc1)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2089d46b0eb8..a8e07eb2ca5f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7087,11 +7087,14 @@ static int md_open(struct block_device *bdev, fmode_t mode)
}
BUG_ON(mddev != bdev->bd_disk->private_data);
- if ((err = mutex_lock_interruptible(&mddev->open_mutex)))
+ if ((err = mutex_lock_interruptible(&mddev->open_mutex))) {
+ mddev_put(mddev);
goto out;
+ }
if (test_bit(MD_CLOSING, &mddev->flags)) {
mutex_unlock(&mddev->open_mutex);
+ mddev_put(mddev);
return -ENODEV;
}
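
For readers following along, here is a toy user-space model of the leak and the
fix. It is a sketch only: toy_mddev, toy_get, toy_put and the two open functions
are made-up names that mimic the shape of the patch above, where md_open already
holds a reference by the time it reaches the MD_CLOSING check.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for struct mddev: a reference count and a closing flag. */
struct toy_mddev {
    int active;    /* models mddev->active */
    bool closing;  /* models the MD_CLOSING flag */
};

void toy_get(struct toy_mddev *m) { m->active++; }
void toy_put(struct toy_mddev *m) { m->active--; }

/* Pre-patch shape: the error path returns without dropping the reference. */
int toy_open_leaky(struct toy_mddev *m)
{
    toy_get(m);
    if (m->closing)
        return -ENODEV;   /* reference leaked here */
    return 0;             /* on success the reference is kept until close */
}

/* Post-patch shape: the error path drops the reference before returning. */
int toy_open_fixed(struct toy_mddev *m)
{
    toy_get(m);
    if (m->closing) {
        toy_put(m);
        return -ENODEV;
    }
    return 0;
}
```

In this model, a failed toy_open_leaky() leaves active at 1 with no matching
close ever coming, which is the same pattern as the stuck ->active count seen in
the logs; toy_open_fixed() leaves it at 0.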
^ permalink raw reply related