Drive failure on LND node or…
How to lose some Bitcoin

Difficulty: Low

Tonight, on a very special episode of BitcoinLizard Blog…

I created this blog with the intention of mostly writing about the setup and implementation of various Bitcoin-related software projects. In particular, I was planning on writing about moving away from what I considered to be an unreliable Lightning node implementation: a Raspberry Pi with the LND node data stored on the SD card. My intention was to shutter my BitcoinLizard Tor-only routing node running on that Raspberry Pi and build out a new routing node running Core Lightning on a more robust server with redundant storage. This post is about how I waited too long to decommission this LND node and faced serious consequences as a result. It is a warning to anyone who runs a Lightning node and doesn’t take the proper precautions with regard to backing up their node data.

I knew running off an SD card on a Raspberry Pi was a bad idea. SD cards are known to fail at a higher rate than more traditional storage. I’d built up a pretty decent collection of channels and a lot of inbound liquidity, but I knew it was time to shut down this node. I could have easily migrated this LND configuration to more robust equipment, but for my own reasons I wanted to move to Core Lightning (formerly C-Lightning). As a result I was planning on decommissioning this LND node and building a new Core Lightning node on new hardware. I was still making some sats off of routing fees with my Raspberry Pi node and I wasn’t opening any new channels, so my cost to continue operation was minimal. So I let my peers close channels over time and set this lnd.conf option (the value is in satoshis, so this is 1 BTC) high enough that no potential peers would be able to open a new channel to me:

minchansize=100000000

I could just let things wind down on their own. After all, I had my seed backed up and I used this method to make sure I always had a current copy of channel.backup stored on my NAS:

https://gist.github.com/alexbosworth/2c5e185aedbdac45a03655b709e255a3
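To make the idea concrete, here is a minimal polling sketch in Python. It is not the script from the gist linked above; it is only an illustration of the same goal, which is to copy channel.backup off the node every time its contents change. The function names and the NAS mount point are hypothetical, and the source path assumes LND's default data directory on mainnet.

```python
import hashlib
import shutil
import time
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def backup_if_changed(source: Path, dest_dir: Path, last_digest):
    """Copy source into dest_dir when its contents differ from last_digest.

    Returns the digest of the current source file. Each copy gets a
    timestamped name, so an older backup is never overwritten.
    """
    digest = file_digest(source)
    if digest != last_digest:
        dest_dir.mkdir(parents=True, exist_ok=True)
        stamp = time.strftime("%Y%m%d-%H%M%S")
        target = dest_dir / f"channel.backup.{stamp}.{digest[:8]}"
        shutil.copy2(source, target)
    return digest


def watch(source: Path, dest_dir: Path, interval_seconds: float = 10.0) -> None:
    """Poll forever, copying the file each time its contents change."""
    last = None
    while True:
        if source.exists():
            last = backup_if_changed(source, dest_dir, last)
        time.sleep(interval_seconds)


# Example call (both paths are assumptions; adjust to your node and NAS mount):
# watch(Path.home() / ".lnd/data/chain/bitcoin/mainnet/channel.backup",
#       Path("/mnt/nas/lnd-backups"))
```

Polling is the simplest thing that works everywhere. A file-system notification mechanism would react faster, but for a file that only changes when a channel opens or closes, a short polling interval is probably good enough.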

Here is an important side note to node operators. Backing up to a NAS, or to any storage that is separate from the node, is critical: channel.backup must live somewhere other than the SD card or other local storage. If the SD card fails and you don’t have a current copy of channel.backup, you may not be able to recover your Lightning channel funds even if you have a valid seed for your LND node. At best you will be depending on your channel peers to help with the recovery: they will have to manually force close each channel before the funds on your side can be recovered, and they will do that only when they deem it necessary. You will have no control over when they force close, if they ever do it at all!

Now on to my story. One morning I logged into the Ride The Lightning page that I hosted on my LND node. I found that I couldn’t read my node status; the statistics about my node failed to load. I logged into my LND node via SSH and found that things didn’t seem to be working very well. All of my commands were taking a very long time to execute. I suspected SD card issues right away. I attempted to take a backup of the .lnd directory, as that would have everything I would need to spin this LND node back up on other working hardware and keep my channels intact. I found that only the files in the .lnd directory, and none of the directories within .lnd, would copy over to my NFS file share. This was problematic, as without the channel data I wouldn’t be able to return this node to a working state. After hours of trying unsuccessfully to recover the .lnd directories from the live system, I reluctantly rebooted the Raspberry Pi and hoped that the issue was with the Pi itself and not the SD card. Of course I was able to quickly determine that the issue was with the SD card. The Pi would no longer boot, and the SD card was quite unreadable on another system. My SD card was completely unrecoverable by any method I had at my disposal. It was time to switch to recovery mode.

Like I said, I had my seed backed up and a current copy of channel.backup, so I was sure I’d be able to recover most or all of the funds. However, I was starting to feel a bit uneasy: I had several LND seed backups from different projects I had worked on over the last couple of years, and I hadn’t actually validated that any of them corresponded to this particular node, which contained the vast majority of my Lightning liquidity. I regretted not just running this command a while ago to close all of my channels and easily move my funds on chain:

$ lncli closeallchannels

The recovery process doesn’t allow you to return the node to a working state. All of the channels will be closed and coins will be sent to the wallet derived from the LND Aezeed seed. As a quick side note, only LND uses Aezeed seeds. You can’t recover coins from an LND node with another wallet such as Electrum. The funds can only be recovered with a functional LND node.

I knew where I had screenshots of several of the special LND Aezeed seeds stored, so I began to read about the recovery process. One important thing to note: taking screenshots of seed phrases is never recommended. Seeds are to be written down and stored offline. This was one of several mistakes I made out of rushing and being lazy when setting up this node.

Another, even more important note, and I can’t stress enough how important this one is: as far as I know, there is no way to ever display the 24-word Aezeed seed again after it is displayed during LND setup. This is a one-shot deal. You must record the seed during initial node setup. Without this seed you can’t recover any of your funds on chain or in Lightning channels. If you aren’t able to confirm that you have this seed for your current running LND node, you should probably decommission your node and rebuild from scratch with a properly backed up seed.

I searched through my seed backups and found the one that I thought corresponded to my Tor-only LND node. I spun up a new LND instance on my desktop computer. It is important to note that to recover the off-chain (Lightning channel) funds, your recovery machine must support the same communication protocols (IPv4, IPv6, or Tor) as the failed node. This is discussed in this blog post by Jameson Lopp, which also includes a lot of other great information about recovering from a failed disk:

https://blog.keys.casa/lightning-wallet-recovery-lessons-learned/

I had a rough idea how many coins I had in the on chain wallet and in Lightning channels. I fired up the node following the excellent official documentation provided by Lightning Labs:

https://github.com/lightningnetwork/lnd/blob/master/docs/recovery.md

Now at this point I wasn’t trying to recover the coins in the Lightning channels, only the on chain funds. I started the LND instance up in recovery mode. LND started scanning the blockchain for used addresses. This blockchain scan takes hours to complete, in my case probably 6-8 hours. During the course of the scan the LND logs use the phrase “Recovered addresses from blocks…”. This sounded promising! It was quite late at night so I determined I had done all I could for now and went to bed.

I woke up early the next morning and checked on the recovery process. The blockchain scan had completed so I ran the following command:

$ lncli walletbalance

As you have probably guessed, my balance was zero. This was not good. I was overcome with a sinking feeling that in my haste to set up the node I hadn’t taken the time to back up the seed. That didn’t seem like something I would do. Placing the seed in a location that I wouldn’t recall when it was time to recover my node, however, seemed like something I would certainly do!

The next several days consisted of me looking through all the files and directories on the four or so computers that I have used over the past several years, and looking at any pieces of paper where I might have written down this seed. I found other seeds and put them through the recovery process, and each time my wallet balance was zero when the scan completed. I was beginning to lose hope. This loss wouldn’t ruin me, but it would greatly limit my ability to run a future Lightning node with more than 1 or 2 channels.

On the 3rd or 4th day, after spending a considerable number of hours searching for the seed, I found a seed keyed into a text file. This seemed very promising based on the creation timestamp of the text file, so I scrambled to start the recovery process. When the process completed I was very pleased to find that the on chain wallet had roughly the amount of coins I was expecting. What a relief! It appeared as though the Bitcoin Gods were going to take mercy on me for my shoddy seed backup practices.

Fortunately I was much more diligent about keeping a current copy of the static channel backup file “channel.backup”. I launched the Lightning channel recovery process:

$ lncli restorechanbackup --multi_file=channel.backup

I could see in the LND logs that my channels were being closed. About an hour or so later my Lightning channel funds were showing up in my on chain balance! The recovery was working. This command showed me that some of my channels were still open:

$ lncli pendingchannels

The recovery process is interesting. It is detailed in this previously mentioned document provided by Lightning Labs:

https://github.com/lightningnetwork/lnd/blob/master/docs/recovery.md#off-chain-recovery

The LND node performing the recovery contacts the Lightning channel peers listed in the channel.backup file and initiates the data loss protection protocol with each of them. If your channel.backup file isn’t current (this file updates any time a channel is opened or closed), the recovery node won’t know about all of your channels, and the funds in the channels it doesn’t know about will remain stuck. As mentioned above, your peer may force close the channel when they notice that you have been offline for a long time. There is, however, no guarantee that they will do this, as it is a process that requires manual human intervention. If your peer doesn’t closely monitor their node they may never bother to force close the channel. Even worse, if their node has also failed, those funds will remain locked in the channel forever with neither side being able to initiate the recovery process.

Upon the recovery node communicating with the working peer, the working peer will close the channel. It is unclear to me if this is a trustless process or if a malicious peer could attempt to steal the funds when they learn that the distressed node is attempting recovery. In my case I’m pretty sure that I recovered all funds that were on my side of the channel for all channels.

UPDATE – 2022/08/13:
Indeed, this recovery process is not trustless. The peer must support “option_dataloss_protect”. Modern Lightning implementations should support this option; however, if the peer doesn’t support it, you are likely to lose all funds in that channel. Additionally, a nefarious peer can claim to support “option_dataloss_protect” but publish an outdated channel state when it learns that your node is distressed.

After the channel recovery process has been initiated all you can do is wait. The recovery node must communicate with the peer to initiate the channel closing. If the recovery node can’t communicate with the peer for any reason the channel will remain open with the funds locked inside. If the functional peer is offline the funds will remain trapped until that peer comes back online and communicates with the recovery node. As touched on previously, if the peer node has also had a catastrophic failure the funds will remain trapped in the channel forever!

I’m somewhat of a lightning node snob. I tried to only have peers that were online all of the time. This turned out to be a fortuitous policy when my node failed. My node was able to get in touch with all of my peers and get the channels closed. There is a lesson in Lightning node hygiene here. Make sure to force close channels to peers that have been offline for a long period of time. It is possible that the peer node has failed and when you force close you will be doing them a favor as they will receive their portion of the channel funds in their on chain wallet (if they have properly backed up their seed). From your perspective, if you only have peers that are mostly online you will have success with the recovery process should your LND node fail.
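As a rough illustration of that hygiene check, the Python sketch below filters the JSON printed by `lncli listchannels` for channels that are currently inactive. The helper names are hypothetical. Note the limits of this approach: the `active` field only reports whether the peer is reachable right now, not how long it has been offline, so you would need to run the check repeatedly and keep your own notes before deciding that a force close is warranted.

```python
import json
import subprocess


def inactive_channels(listchannels_output: dict) -> list:
    """Return (remote_pubkey, channel_point) pairs for channels that are not active."""
    return [
        (ch.get("remote_pubkey"), ch.get("channel_point"))
        for ch in listchannels_output.get("channels", [])
        if not ch.get("active", False)
    ]


def fetch_listchannels() -> dict:
    """Run `lncli listchannels` and parse its JSON output.

    Assumes lncli is on the PATH and can reach a running, unlocked lnd.
    """
    result = subprocess.run(
        ["lncli", "listchannels"], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


# Example call on a live node:
# for pubkey, chan_point in inactive_channels(fetch_listchannels()):
#     print(pubkey, chan_point)
```

The sketch deliberately stops at reporting. Force closing is irreversible and costs on chain fees, so that decision is better left to a human.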

One more important thing to note: time is of the essence. Your peers may have been online yesterday, but today they might have experienced an unrecoverable failure with their node. If the LND recovery node can’t reach an active working node, those funds will remain locked in the channel forever! There sure are a lot of ways that the funds can stay locked in a channel forever!

As mentioned above you can run this command to see how many channels haven’t been closed yet:

$ lncli pendingchannels

If you are lucky, this command will eventually show that you have no pending channels. That means you have recovered the maximum amount of funds possible and that all of your channels have been successfully closed. You can send the funds from the on chain LND wallet to another wallet and shut down this recovery instance of LND.
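If you would rather not eyeball the output each time, a small Python sketch like the following can summarize it. It assumes the JSON printed by `lncli pendingchannels` groups channels under the keys listed in `PENDING_KEYS`; the function names are hypothetical, and a key that is missing from the output is simply counted as empty.

```python
import json

# Top-level lists in the `lncli pendingchannels` JSON output that hold
# channels which are not yet fully resolved.
PENDING_KEYS = (
    "pending_open_channels",
    "pending_closing_channels",
    "pending_force_closing_channels",
    "waiting_close_channels",
)


def pending_summary(raw_json: str) -> dict:
    """Count the channels in each pending category, plus an overall total."""
    data = json.loads(raw_json)
    counts = {key: len(data.get(key) or []) for key in PENDING_KEYS}
    counts["total"] = sum(counts.values())
    return counts


def recovery_finished(raw_json: str) -> bool:
    """True when no channel is left in any pending category."""
    return pending_summary(raw_json)["total"] == 0
```

You could feed it the command's output, for example by saving `lncli pendingchannels` to a file and passing the file's contents to `recovery_finished`.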

As with any other “very special episode” there must be a moral to the story. The moral for the LND node operator is this: make sure you create and properly store a paper copy of your Aezeed seed, and that you have a method in place to reliably back up the channel.backup file to external storage every time a channel opens or closes. The LND recovery process is very well documented and it works. If you check off these two boxes you will almost certainly recover the vast majority of your Bitcoin.

3 replies on “Drive failure on LND node or… How to lose some Bitcoin”

Glad you got your sats back. Seems like there should definitely be a way to automatically close channels after they become inactive for a certain period. This should be built into LND.
