`xfs_repair -L` blew away my data

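My external HDD had been acting up for a few months, and then I ran into a strange situation: the drive was mounted, but any attempt to access a directory or file under the mount point failed with an error. ls: cannot access '/mnt/hdd/other/files': Input/output error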

Note that what follows is just the record of one fool's folly, so do not imitate it or take it at face value.

First I tried to repair the file system with fsck, but it only printed: If you wish to check the consistency of an XFS filesystem or repair a damaged filesystem, see xfs_repair(8).

All of my external HDDs use XFS, so I decided to try xfs_repair.

Before running xfs_repair, the target device has to be unmounted. A plain unmount, however, failed with umount: /mnt/hdd: target is busy.

I used fuser -muv /mnt/hdd to find the processes using the device, then stopped the applications and services that were holding it, for example with systemctl --user stop docker.service.

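Next I ran sudo xfs_repair -n /dev/sda1, where -n reports what would be repaired without actually repairing anything. The check took longer than I expected, so I interrupted it partway, which in hindsight may have been a mistake. It is possible that things were already broken at this point and that the partition information had been wiped out as well.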

From around this point on, whenever I tried to run it without -n, I got the following error even though no process was holding the device.

xfs_repair: /dev/sda1 contains a mounted and writable filesystem

fatal error -- couldn't initialize XFS library
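When this message appears even though fuser shows nothing, one way to double-check is to look at the kernel's mount table directly. The following is a minimal sketch, not something from the original session: `is_mounted` is a hypothetical helper name, and it assumes the usual `/proc/mounts` layout in which the first field is the device and the second is the mount point. Note that `/proc/mounts` only reflects the mount namespace of the process reading it.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]   (hypothetical helper, for illustration only)
# Succeeds if DEVICE appears as a mount source in MOUNTS_FILE.
# MOUNTS_FILE defaults to /proc/mounts; passing another file makes it testable.
is_mounted() {
    dev=$1
    table=${2:-/proc/mounts}
    # Compare the first whitespace-separated field of each line with DEVICE.
    awk -v dev="$dev" '$1 == dev { found = 1 } END { exit !found }' "$table"
}
```

For example, `is_mounted /dev/sda1 && echo "still mounted"` prints the message only while the kernel still lists the partition as a mount source.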

As an experiment I dropped the partition number, and xfs_repair started running. So I ran it without -n to attempt a repair, and this time got the following error message.

$ sudo xfs_repair /dev/sda
Phase 1 - find and verify superblock...
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap inode pointer to 97
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary inode pointer to 98
Phase 2 - using internal log
        - zero log...
* ERROR: mismatched uuid in log
*            SB : e110a464-551d-4252-ba0b-475ec6707064
*            log: 9facfc94-f10e-4567-aed4-9abba643fabe
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

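So the metadata changes are kept in a log¹, and the file system apparently has to be mounted in order to replay that log; after mounting, unmount it and run xfs_repair again. Running with -L instead destroys the log and then attempts the repair.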

Indeed, reading "15.5. Repairing an XFS file system with xfs_repair | Red Hat Product Documentation" properly, the xfs_repair procedure is as follows.

Use the xfs_metadump utility to create a metadata image before the repair, for diagnostic or testing purposes

# xfs_metadump block-device metadump-file

Remount the file system to replay the log

# mount file-system
# umount file-system

Use the xfs_repair utility to repair the file system

# xfs_repair block-device
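The order of those steps can be sketched as a small script. This is only an illustration under assumptions: `run` and `repair_steps` are hypothetical helper names, the device, mount point, and dump file are placeholders, and by default nothing is executed because `DRY_RUN` defaults to 1 and each command is merely printed.

```shell
#!/bin/sh
# run CMD...   (hypothetical helper)
# With DRY_RUN=1 (the default here) only print the command; with DRY_RUN=0 execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# repair_steps DEVICE MOUNTPOINT DUMPFILE   (hypothetical helper)
# Follows the documented order and stops at the first step that fails.
repair_steps() {
    dev=$1
    mnt=$2
    dump=$3
    # 1. Save a metadata image before touching anything.
    run xfs_metadump "$dev" "$dump" || return 1
    # 2. Mount and unmount so that the kernel replays the log.
    run mount "$dev" "$mnt" || return 1
    run umount "$mnt" || return 1
    # 3. Only then run the repair itself.
    run xfs_repair "$dev"
}
```

Calling `repair_steps /dev/sdX1 /mnt/hdd /tmp/sdX1.metadump` in the default dry-run mode just lists the four commands in order. Because each step returns early on failure, a failed mount would stop the script before xfs_repair is reached.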

I tried mounting with sudo mount /dev/sda /mnt/hdd, but it mercilessly printed mount: /mnt/hdd: mount(2) system call failed: Structure needs cleaning.
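When a mount fails like this, it may be worth confirming which device node actually starts with an XFS superblock before going any further. An XFS superblock begins with the four bytes "XFSB" (0x58465342). The sketch below only reads the first four bytes; `has_xfs_magic` is a hypothetical helper name, and reading a real device node normally requires root. Standard tools such as `lsblk -f` or `blkid` report the detected signature in a more complete way.

```shell
#!/bin/sh
# has_xfs_magic PATH   (hypothetical helper, for illustration only)
# Succeeds if the first four bytes of PATH are "XFSB" (hex 58 46 53 42),
# the magic number at the start of an XFS superblock. Read-only.
has_xfs_magic() {
    # Dump the first four bytes as hex and strip the spacing od adds.
    magic=$(dd if="$1" bs=4 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "58465342" ]
}
```

Run as root, a comparison such as `has_xfs_magic /dev/sda1` versus `has_xfs_magic /dev/sda` shows which of the two nodes begins with a superblock. It says nothing about whether the rest of the file system is healthy.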

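The procedure above also warns about -L:

This command causes all metadata updates in progress at the time of the crash to be lost, which might cause significant file system damage and data loss. This should be used only as a last resort if the log cannot be replayed.

Even so, I assumed the worst that could happen was the data rolling back to a slightly earlier state...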

$ sudo xfs_repair -L /dev/sda
Phase 1 - find and verify superblock...
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap inode pointer to 97
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary inode pointer to 98
Phase 2 - using internal log
        - zero log...
* ERROR: mismatched uuid in log
*            SB : e110a464-551d-4252-ba0b-475ec6707064
*            log: 9facfc94-f10e-4567-aed4-9abba643fabe
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
Metadata CRC error detected at 0x55b37b4e6d73, xfs_agf block 0x1/0x200
agf has bad CRC for ag 0
bad magic number
Metadata CRC error detected at 0x55b37b4e6d73, xfs_agf block 0x15d508f11/0x200
agf has bad CRC for ag 3
Metadata CRC error detected at 0x55b37b5053a3, xfs_agi block 0x15d508f12/0x200
agi has bad CRC for ag 3
bad on-disk superblock 3 - bad magic number
primary/secondary superblock 3 conflict - AG superblock geometry info conflicts with filesystem geometry
zeroing unused portion of secondary superblock (AG #3)
bad magic # 0x675e35f1 for agf 3
bad version # 808892813 for agf 3
bad sequence # 370696256 for agf 3
bad length 671380572 for agf 3, should be 244188660
flfirst -1069427418 in agf 3 too large (max = 118)
fllast 455373726 in agf 3 too large (max = 118)
bad uuid f0a0965f-b6ea-f246-57aa-4a1850080b1d for agf 3
bad magic # 0xc064208f for agi 3
bad version # 153315667 for agi 3
bad sequence # -2022498302 for agi 3
bad length # 1331997695 for agi 3, should be 244188660
bad uuid c702f114-d508-8618-a2b5-c6af70ca20c1 for agi 3
reset bad sb for ag 3
reset bad agf for ag 3
reset bad agi for ag 3
Metadata CRC error detected at 0x55b37b5053a3, xfs_agi block 0x2/0x200
agi has bad CRC for ag 0
clearing needsrepair flag and regenerating metadata
bad magic # 0x45464920 for agf 0
bad version # 1346458196 for agf 0
bad sequence # 256 for agf 0
bad length 1543503872 for agf 0, should be 244188662
flfirst 570425344 in agf 0 too large (max = 118)
bad uuid 92801328-8776-3304-0200-000000000000 for agf 0
bad magic # 0xaf3dc60f for agi 0
bad version # -2088471993 for agi 0
bad sequence # -1904657047 for agi 0
bad length # -666403356 for agi 0, should be 244188662
bad uuid 00000000-0000-0000-0000-000000000000 for agi 0
reset bad agf for ag 0
reset bad agi for ag 0
Metadata CRC error detected at 0x55b37b4e6c40, xfs_agfl block 0x15d508f13/0x200
agfl has bad CRC for ag 3
bad agbno 3869088721 in agfl, agno 3
freeblk count 1 != flcount 2003153573 in ag 3
bad agbno 1345634306 for btbno root, agno 3
bad agbno 506791017 for btbcnt root, agno 3
agf_freeblks 3264567836, counted 0 in ag 3
agf_longest 805948012, counted 0 in ag 3
agf_btreeblks 3382142569, counted 0 in ag 3
bad agbno 1771919373 for inobt root, agno 3
Metadata CRC error detected at 0x55b37b4e6c40, xfs_agfl block 0x3/0x200
agfl has bad CRC for ag 0
bad agbno 0 in agfl, agno 0
freeblk count 1 != flcount -1900101423 in ag 0
bad agbno 2840599565 for btbno root, agno 0
bad agbno 0 for btbcnt root, agno 0
agf_freeblks 16777216, counted 0 in ag 0
agf_longest 3867226431, counted 0 in ag 0
agf_btreeblks 218958159, counted 0 in ag 0
bad agbno 3340088390 for inobt root, agno 0
bad agbno 0 for finobt root, agno 0
agi_count 1176925517, counted 0 in ag 0
agi_freecount 1511860775, counted 0 in ag 0
agi_freecount 1511860775, counted 0 in ag 0 finobt
agi unlinked bucket 0 is 4290232529 in ag 0 (inode=4290232529)
agi unlinked bucket 1 is 16777216 in ag 0 (inode=16777216)
agi unlinked bucket 2 is 0 in ag 0 (inode=0)
agi unlinked bucket 3 is 0 in ag 0 (inode=0)
agi unlinked bucket 4 is 838873088 in ag 0 (inode=838873088)
agi unlinked bucket 5 is 822098176 in ag 0 (inode=822098176)
agi unlinked bucket 6 is 822096128 in ag 0 (inode=822096128)
agi unlinked bucket 7 is 805319680 in ag 0 (inode=805319680)
agi unlinked bucket 8 is 1593848832 in ag 0 (inode=1593848832)
agi unlinked bucket 9 is 1409303040 in ag 0 (inode=1409303040)
agi unlinked bucket 10 is 0 in ag 0 (inode=0)
agi unlinked bucket 11 is 0 in ag 0 (inode=0)
agi unlinked bucket 12 is 0 in ag 0 (inode=0)
... (snip) ...
sb_fdblocks 976277696, counted 487900386
root inode chunk not found
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 2
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1940879144:689675635) is ahead of log (1:2).
Format log to cycle 1940879147.
done

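That is how it finished, and when I checked the contents, the data was gone... With no other option, I reformatted the disk with mkfs.xfs. I made operational mistakes, but the HDD may simply have reached the end of its life in the first place. I really need to start backing up to another hard disk.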
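Looking back at the output above, the "bad magic #" values can be decoded as ASCII, which sometimes hints at what was really sitting in those sectors. A valid AGF header starts with "XAGF" (0x58414746). The value reported for agf 0, 0x45464920, decodes to "EFI ", which is how the GPT header signature "EFI PART" begins. A GPT header lives in the second sector of a disk, which would match "xfs_agf block 0x1" in the log. This is only one possible reading, not a confirmed diagnosis, but it would be consistent with xfs_repair having been pointed at the whole disk rather than at the XFS partition. The helper below is a minimal sketch with a hypothetical name, intended for printable bytes only.

```shell
#!/bin/sh
# hex_to_ascii HEX   (hypothetical helper, for illustration only)
# Prints the bytes of a hex number such as 0x58414746 as ASCII text.
# An optional leading 0x is accepted; a trailing odd digit is ignored.
hex_to_ascii() {
    hex=${1#0x}
    out=
    while [ "${#hex}" -ge 2 ]; do
        rest=${hex#??}          # everything after the first two hex digits
        byte=${hex%"$rest"}     # the first two hex digits
        # Convert the byte to a three-digit octal escape and let printf emit it.
        out=$out$(printf "\\$(printf '%03o' "0x$byte")")
        hex=$rest
    done
    printf '%s\n' "$out"
}
```

For example, `hex_to_ascii 0x45464920` turns the agf 0 value from the log into text, and `hex_to_ascii 0x58414746` shows what a healthy AGF magic looks like for comparison.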

¹ What is a log-structured file system? (ログ構造ファイルシステムとは - わかりやすく解説 Weblio辞書)

有馬総一郎のブログ

(彼氏の事情)
