public inbox for linux-xfs@vger.kernel.org
* Moving existing internal journal log to an external device (success?)
@ 2023-08-20 19:37 fk1xdcio
  2023-08-20 22:14 ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: fk1xdcio @ 2023-08-20 19:37 UTC (permalink / raw)
  To: linux-xfs

Does this look like a sane method for moving an existing internal log to 
an external device?

3 drives:
    /dev/nvme0n1p1  2GB  Journal mirror 0
    /dev/nvme1n1p1  2GB  Journal mirror 1
    /dev/sda1       16TB XFS

# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1p1 /dev/nvme1n1p1
# mkfs.xfs /dev/sda1
# xfs_logprint -C journal.bin /dev/sda1
# cat journal.bin > /dev/md0
# xfs_db -x /dev/sda1

xfs_db> sb
xfs_db> write -d logstart 0
xfs_db> quit

# mount -o logdev=/dev/md0 /dev/sda1 /mnt

-------------------------

It seems to "work" and I tested with a whole bunch of data. I was also 
able to move the log back to internal without issue (set logstart back 
to what it was originally). I don't know enough about how the filesystem 
layout works to know if this will eventually break.

*IF* this works, why can't xfs_growfs do it?

Thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Moving existing internal journal log to an external device (success?)
  2023-08-20 19:37 Moving existing internal journal log to an external device (success?) fk1xdcio
@ 2023-08-20 22:14 ` Dave Chinner
       [not found]   ` <B4C72D86-4CD6-415D-802E-7A225C868E57.1@smtp-inbound1.duck.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2023-08-20 22:14 UTC (permalink / raw)
  To: fk1xdcio; +Cc: linux-xfs

On Sun, Aug 20, 2023 at 03:37:38PM -0400, fk1xdcio@duck.com wrote:
> Does this look like a sane method for moving an existing internal log to an
> external device?
> 
> 3 drives:
>    /dev/nvme0n1p1  2GB  Journal mirror 0
>    /dev/nvme1n1p1  2GB  Journal mirror 1
>    /dev/sda1       16TB XFS
> 
> # mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1p1 /dev/nvme1n1p1
> # mkfs.xfs /dev/sda1
> # xfs_logprint -C journal.bin /dev/sda1
> # cat journal.bin > /dev/md0
> # xfs_db -x /dev/sda1
> 
> xfs_db> sb
> xfs_db> write -d logstart 0
> xfs_db> quit
> 
> # mount -o logdev=/dev/md0 /dev/sda1 /mnt

So you are physically moving the contents of the log whilst the
filesystem is unmounted and unchanging.

> -------------------------
> 
> It seems to "work" and I tested with a whole bunch of data.

You'll get ENOSPC earlier than you think, because you just leaked
the old log space (needs to be marked free space). There might be
other issues, but you get to keep all the broken bits to yourself if
you find them.

You can probably fix that by running xfs_repair, but then....

> I was also able
> to move the log back to internal without issue (set logstart back to what it
> was originally). I don't know enough about how the filesystem layout works
> to know if this will eventually break.

.... this won't work.

i.e. you can move the log back to the original position because you
didn't mark the space the old journal used as free, so the filesystem
still thinks it is in use by something....

> *IF* this works, why can't xfs_growfs do it?

"Doctor, I can perform an amputation with a tourniquet and a chainsaw,
why can't you do that?"

Mostly you are ignoring the fact that growfs is an online operation
- actually moving the log safely and testing it rigorously is a
whole lot harder than changing a few fields with xfs_db....

Let's ignore the simple fact we can't tell the kernel to use a
different block device for the log via growfs right now (i.e. needs
a new ioctl interface) and focus on what is involved in moving the
log whilst the filesystem is mounted and actively in use.

First we need an atomic, crash safe mechanism to swap from one log
to another. We need to do that while the filesystem is running, so
it has to be done within a freeze context. Then we have to run a
transaction that initialises the new log and tells the old log where
the new log is, so that if we crash before the superblock is written,
log recovery will replay the log switch. Then we do a sync write of
the superblock so that the next mount will see the new log location.
Then, while the filesystem is still frozen, we have to reconfigure
the in memory log structures to use the new log (e.g. open new
buftarg, update mount pointers to the log device, change the log
state to external, reset log sequence numbers, grant heads, etc).

Finally, if we got everything correct, we then need to free the old
journal in a new transaction running in the new log to clean up the
old journal now that it is no longer in use. Then we can unfreeze
the filesystem...
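Purely as an illustration (a toy Python model; the class and function
names here are invented and bear no resemblance to real kernel code),
the ordering dependencies above look something like:

```python
# Toy model of the ordering constraints for an online log switch.
# The key property: a simulated crash at any point must leave a state
# that recovery can resolve to either the old log or the new log,
# never an ambiguous in-between.

class Fs:
    def __init__(self):
        self.frozen = False
        self.old_log = {"active": True, "points_to_new": None}
        self.new_log = {"initialised": False}
        self.sb_logdev = "internal"      # on-disk superblock field
        self.mem_logdev = "internal"     # in-memory log state

    # Step 1: freeze so no transactions race with the switch.
    def freeze(self):
        self.frozen = True

    # Step 2: initialise the new log and record the switch in the OLD
    # log, so a crash before the superblock write replays the switch.
    def stage_new_log(self, dev):
        assert self.frozen
        self.new_log["initialised"] = True
        self.old_log["points_to_new"] = dev

    # Step 3: sync write of the superblock with the new log location.
    def write_sb(self, dev):
        assert self.new_log["initialised"]
        self.sb_logdev = dev

    # Step 4: swap the in-memory structures, free the old log space in
    # a transaction in the NEW log, then unfreeze.
    def finish(self, dev):
        self.mem_logdev = dev
        self.old_log["active"] = False
        self.frozen = False

def recover(fs):
    """What mount-time recovery would conclude after a crash."""
    if fs.sb_logdev != "internal":
        return fs.sb_logdev                 # superblock already switched
    if fs.old_log["points_to_new"]:
        return fs.old_log["points_to_new"]  # replay the logged switch
    return "internal"                       # switch never staged

fs = Fs()
fs.freeze()
fs.stage_new_log("/dev/md0")
# Crash here: the superblock still says internal, but the old log
# records the switch, so recovery lands on the new log, not in limbo.
assert recover(fs) == "/dev/md0"
fs.write_sb("/dev/md0")
fs.finish("/dev/md0")
assert fs.mem_logdev == "/dev/md0" and not fs.frozen
```

The xfs_db approach skips every one of those ordering guarantees, which
is fine right up until a crash or a mistake at the wrong moment.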

Yes, you can do amputations with a chainsaw, but it's a method of
last resort that does not guarantee success and you take
responsibility for the results yourself. Turning this into a
reliable procedure that always works or fails safe for all
conditions (professional engineering!) is a whole lot more
complex...

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Moving existing internal journal log to an external device (success?)
       [not found]   ` <B4C72D86-4CD6-415D-802E-7A225C868E57.1@smtp-inbound1.duck.com>
@ 2023-08-21 13:07     ` fk1xdcio
  2023-08-24 20:22       ` Eric Sandeen
  0 siblings, 1 reply; 5+ messages in thread
From: fk1xdcio @ 2023-08-21 13:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs@vger.kernel.org

On 2023-08-20 18:14, Dave Chinner wrote:
> On Sun, Aug 20, 2023 at 03:37:38PM -0400, fk1xdcio@duck.com wrote:
>> Does this look like a sane method for moving an existing internal log 
>> to an
>> external device?
>> 
>> 3 drives:
>>    /dev/nvme0n1p1  2GB  Journal mirror 0
>>    /dev/nvme1n1p1  2GB  Journal mirror 1
>>    /dev/sda1       16TB XFS
>> 
>> # mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1p1 /dev/nvme1n1p1
>> # mkfs.xfs /dev/sda1
>> # xfs_logprint -C journal.bin /dev/sda1
>> # cat journal.bin > /dev/md0
>> # xfs_db -x /dev/sda1
>> 
>> xfs_db> sb
>> xfs_db> write -d logstart 0
>> xfs_db> quit
>> 
>> # mount -o logdev=/dev/md0 /dev/sda1 /mnt
> 
> So you are physically moving the contents of the log whilst the
> filesystem is unmounted and unchanging.
> 
>> -------------------------
>> 
>> It seems to "work" and I tested with a whole bunch of data.
> 
> You'll get ENOSPC earlier than you think, because you just leaked
> the old log space (needs to be marked free space). There might be
> other issues, but you get to keep all the broken bits to yourself if
> you find them.

It's 2GB out of terabytes, so I don't really care about the space, but 
the "other issues" part is a problem.


> You can probably fix that by running xfs_repair, but then....
> 
>> I was also able
>> to move the log back to internal without issue (set logstart back to 
>> what it
>> was originally). I don't know enough about how the filesystem layout 
>> works
>> to know if this will eventually break.
> 
> .... this won't work.
> 
> i.e. you can move the log back to the original position because you
> didn't mark the space the old journal used as free, so the filesystem
> still thinks it is in use by something....

The space being leaked is fine but xfs_repair is an issue. I did some 
testing and yes, if I run xfs_repair on one of these filesystems with a 
moved log it causes all sorts of problems. In fact it doesn't seem to 
work at all. Big problem.


>> *IF* this works, why can't xfs_growfs do it?
> 
> "Doctor, I can perform an amputation with a tourniquet and a chainsaw,
> why can't you do that?"
> ...
> -Dave.


Yes, I understand. I was thinking more of an offline utility for doing 
this but I see why that can't be done in growfs.

So I guess it doesn't really work. This is why I ask the experts. I'll 
keep experimenting, because with the requirement to physically move 
disks around, being able to move the log back and forth between 
internal and external would be extremely helpful.

Thanks!


* Re: Moving existing internal journal log to an external device (success?)
  2023-08-21 13:07     ` fk1xdcio
@ 2023-08-24 20:22       ` Eric Sandeen
       [not found]         ` <21DD2F1E-AAB5-48BC-8C9F-7A9A07F3F81C.1@smtp-inbound1.duck.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Sandeen @ 2023-08-24 20:22 UTC (permalink / raw)
  To: fk1xdcio, Dave Chinner; +Cc: linux-xfs@vger.kernel.org

On 8/21/23 8:07 AM, fk1xdcio@duck.com wrote:
>>> *IF* this works, why can't xfs_growfs do it?
>>
>> "Doctor, I can perform an amputation with a tourniquet and a chainsaw,
>> why can't you do that?"
>> ...
>> -Dave.
> 
> 
> Yes, I understand. I was thinking more of an offline utility for doing
> this but I see why that can't be done in growfs.
> 
> So I guess it doesn't really work. This is why I ask the experts. I'll
> keep experimenting because due to the requirements of needing to
> physically move disks around, being able to move the log back and forth
> from internal to external would be extremely helpful.
> 
> Thanks!

Just out of curiosity, what is your use case? Why do you need/want to
move logs around?

-Eric


* Re: Moving existing internal journal log to an external device (success?)
       [not found]         ` <21DD2F1E-AAB5-48BC-8C9F-7A9A07F3F81C.1@smtp-inbound1.duck.com>
@ 2023-08-26 14:43           ` fk1xdcio
  0 siblings, 0 replies; 5+ messages in thread
From: fk1xdcio @ 2023-08-26 14:43 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs@vger.kernel.org

On 2023-08-24 16:22, Eric Sandeen wrote:
> On 8/21/23 8:07 AM, fk1xdcio@duck.com wrote:
>> Yes, I understand. I was thinking more of an offline utility for doing
>> this but I see why that can't be done in growfs.
>> 
>> So I guess it doesn't really work. This is why I ask the experts. I'll
>> keep experimenting because due to the requirements of needing to
>> physically move disks around, being able to move the log back and 
>> forth
>> from internal to external would be extremely helpful.
>> 
>> Thanks!
> 
> Just out of curiosity, what is your use case? Why do you need/want to
> move logs around?

Every so often I rotate certain drives from production servers to 
semi-offline servers for testing and verification. The production 
servers have SSD for cache and the fast external journal but the testing 
servers do not. The testing servers need to be able to test and verify 
the filesystem and do periodic synchronization/mirroring but obviously 
can't use the original filesystem without the journal. Also some of 
these drives are moved offsite so being able to put the journal back to 
its internal position would simplify things.

Of course it would be possible to have extra drives in the testing 
environment that the logs could be moved to but the testing servers are 
very physically limited as to what can be hooked up to them so there 
really isn't enough room or ports for those extra drives. Plus the whole 
offsite thing.

It's more of a "want to help make life easier" than a hard requirement.

