Incident report: RSK peg-out service outage

Prepared by the RSK Core Development Team.

UPDATE Nov 16th, 2022

  • As was previously announced, in preparation for a future Mainnet patch release, we are today making a Testnet-only patch release available to the general public. This new release contains consensus changes; as a consequence, all nodes running on RSK Testnet need to be upgraded. Full details in this blog post.

UPDATE Nov 9th, 2022

  • We have successfully concluded the tests for the patch, and we’re preparing the release to be shipped soon. 
  • Due to its criticality, the release will happen in two stages: we will first publish a Testnet-only patch (expected during the week of Nov 14th). After validating that everything works as expected on Testnet, we will publish a Mainnet patch release (no dates defined yet).
  • The release candidate version code is available at the following link: https://github.com/rsksmart/rskj/tree/new-erp-script. Network upgrade activation blocks have not been defined yet.
  • The Emergency Network Upgrade RSKIP is still open for comments here: https://github.com/rsksmart/RSKIPs/blob/master/IPs/RSKIP358.md.

UPDATE Oct 28th, 2022

  • The Emergency Network Upgrade RSKIP is now published and open for comments here: https://github.com/rsksmart/RSKIPs/blob/master/IPs/RSKIP358.md. This RSKIP includes links to the proposed consensus changes to be applied in this network upgrade, with the goal of fixing the issue described in this incident report and resuming the RSK peg-out service.

UPDATE Oct 26th, 2022

UPDATE Oct 24th, 2022

UPDATE Oct 18th, 2022

 

The following is the incident report for the RSK peg-out service outage identified on October 4th, 2022. The service has not been resumed at the moment of writing this report. We understand this service issue has impacted the RSK platform users, and we apologize to everyone affected.

Incident Summary

The RSK Mainnet network is experiencing an outage in its peg-out service, preventing users from converting RBTC into BTC through the RSK PowPeg. The root cause of this service disruption is that the peg-out Bitcoin transactions are considered non-standard by the latest Bitcoin Core client, so most Bitcoin nodes will not include them in their mempool. Consequently, these transactions are not being confirmed on the Bitcoin network.

Root Cause Analysis

On RSK Mainnet block #4,675,281, the network created the first Bitcoin UTXOs containing the PowPeg Time-locked Emergency Multisig (ERP). As described in RSKIP-201, ERP is a PowPeg fallback system introduced in the RSK Iris Network upgrade to be used in case the majority of PowHSMs (PowPeg Hardware Security Modules) fail simultaneously.

The ERP proposal required a change to the peg-out Bitcoin transactions’ scripts, making them longer and more complex. We detected that the RSK P2SH redeem script is mischaracterized as “non-standard” by Bitcoin Core nodes running the default configuration (technical details about the RSK peg-out transaction script and the Bitcoin validations are shared in the next section). Although the transactions are valid from a Bitcoin consensus point of view, default-configured Bitcoin nodes will not include them in their transaction mempool or broadcast them to the rest of the network. As a result, the transactions are not being confirmed on the Bitcoin network, even when the peg-out process on the RSK network side is completed. This is very well described in this article by James Prestwich:

“Non-standard transactions will not be allowed in the mempool of default-configured nodes, unless they’re included in a block first. As a result, these transactions are not broadcasted or relayed through the network. That said, if a miner sees that transaction, she is free to include it in a block (if it passes all validity checks). All nodes will accept the non-standard transaction once it’s in a block.”

In our understanding, mischaracterizing the RSK P2SH redeem script as non-standard is a bug in the Bitcoin Core client, as similar scripts used within a Segwit P2WSH address are fully allowed. We’ll report this, together with a proposed fix, to Bitcoin developers for careful examination. At the same time, since we need to unblock the peg-out service and provide a solution to RSK users as soon as possible, we are already working on a solution based on a modification of the redeem script. If fixing Bitcoin Core is not a quick option, we will submit an emergency network upgrade proposal to the community.

This bug went undetected until the RSKIP was activated on RSK Mainnet because of our testing procedures. The ERP feature was thoroughly reviewed and later tested on Bitcoin Testnet, where everything worked as expected. However, the default configuration for Bitcoin Core nodes on Testnet is more relaxed in terms of validations than on Mainnet. Consequently, we failed to detect this bug before ERP was released on Mainnet. This has been an important lesson for the team; as a result, we have corrected our testing procedures to prevent this class of bugs from recurring.

Technical Details about the Transaction Script Validation

To characterize a transaction as standard or non-standard, Bitcoin Core tries to estimate the number of signature checks a particular script can potentially request. A maximum of 15 signature checks is allowed. For each CHECKSIG or CHECKSIGVERIFY found in a script, one is added to the signature check counter. For multisigs, the maximum number of checks depends on the number of signatories, that is, the number of public keys. When CHECKMULTISIG is executed, this number is at the top of the stack. To avoid executing the script just to obtain that value, Bitcoin Core instead estimates it through static analysis of the script in the function GetSigOpCount(). This analysis is very rudimentary, and it can fail.

The code assumes that the number of signers is pushed onto the stack immediately before the CHECKMULTISIG opcode. If the previous opcode is not a push opcode for a small number, then Bitcoin Core assumes the worst case for each CHECKMULTISIG found, which is 20 signatures (defined by MAX_PUBKEYS_PER_MULTISIG). The problem is that this constant is greater than the maximum number of checks allowed per P2SH redeem script (15), which means that any P2SH script that uses CHECKMULTISIG must specify the number of signers with a PUSH opcode right before it.
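The counting rule described above can be sketched in a few lines of Python. This is a simplified illustration, not Bitcoin Core's code: the script is assumed to be already split into a list of opcode names (pushed data is omitted), the function name estimate_sigops and the helper pushed_small_int are made up for this example, and the 15-check limit is given the illustrative name MAX_P2SH_SIGOPS.

```python
MAX_PUBKEYS_PER_MULTISIG = 20  # worst case assumed per CHECKMULTISIG (from the text)
MAX_P2SH_SIGOPS = 15           # limit on signature checks stated in the text (name assumed)

PUSHNUM_PREFIX = "OP_PUSHNUM_"


def pushed_small_int(op):
    """Return n if op is OP_PUSHNUM_1..OP_PUSHNUM_16, otherwise None."""
    if op is not None and op.startswith(PUSHNUM_PREFIX):
        suffix = op[len(PUSHNUM_PREFIX):]
        if suffix.isdigit() and 1 <= int(suffix) <= 16:
            return int(suffix)
    return None


def estimate_sigops(ops):
    """Statically estimate signature checks without executing the script."""
    count = 0
    previous = None
    for op in ops:
        if op in ("OP_CHECKSIG", "OP_CHECKSIGVERIFY"):
            count += 1
        elif op in ("OP_CHECKMULTISIG", "OP_CHECKMULTISIGVERIFY"):
            n = pushed_small_int(previous)
            # Fall back to the worst case when the key count is not
            # pushed immediately before the multisig opcode.
            count += n if n is not None else MAX_PUBKEYS_PER_MULTISIG
        previous = op
    return count


# Conventional 2-of-3 multisig: the key count directly precedes CHECKMULTISIG.
plain_multisig = ["OP_PUSHNUM_2"] + ["OP_PUSHBYTES_33"] * 3 + ["OP_PUSHNUM_3", "OP_CHECKMULTISIG"]
print(estimate_sigops(plain_multisig))  # 3

# Same opcode preceded by something that is not a number push.
separated = ["OP_PUSHNUM_3", "OP_ENDIF", "OP_CHECKMULTISIG"]
print(estimate_sigops(separated))  # 20
```

In the second case the estimate of 20 exceeds the limit of 15, even though the script would check far fewer signatures when actually executed.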

This is RSK’s current peg-out transaction script:

OP_NOTIF
    OP_PUSHNUM_M
    OP_PUSHBYTES_33 pubkey1
    ...
    OP_PUSHBYTES_33 pubkeyN
    OP_PUSHNUM_N
OP_ELSE
    OP_PUSHBYTES_3 <time-lock-value>
    OP_CSV
    OP_DROP
    OP_PUSHNUM_3
    OP_PUSHBYTES_33 emergencyPubkey1
    ...
    OP_PUSHBYTES_33 emergencyPubkey4
    OP_PUSHNUM_4
OP_ENDIF
OP_CHECKMULTISIG

We see that there are two script paths and, to reduce the script size, a single CHECKMULTISIG is used for the two paths, separating the signer count from the CHECKMULTISIG opcode. As a reference, the Liquid sidechain uses a very similar script.

We can see here that GetSigOpCount() will find that the opcode preceding CHECKMULTISIG is OP_ENDIF, which is not a push opcode, and will therefore assume the number of signature checks is MAX_PUBKEYS_PER_MULTISIG.
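To make this concrete, the following Python sketch rebuilds the opcode layout of the script above and looks at what sits immediately before CHECKMULTISIG. It is only an illustration under stated assumptions: key data is omitted, the helper names build_erp_layout and opcode_before_multisig are invented for this example, the value 3 used for N is an arbitrary placeholder, and MAX_SIGCHECKS is an illustrative name for the 15-check limit.

```python
MAX_PUBKEYS_PER_MULTISIG = 20  # worst case named in the text
MAX_SIGCHECKS = 15             # limit stated in the text (name is illustrative)


def build_erp_layout(n_keys, n_emergency_keys=4):
    """Opcode sequence with the same shape as the peg-out script (key data omitted)."""
    return (
        ["OP_NOTIF", "OP_PUSHNUM_M"]
        + ["OP_PUSHBYTES_33"] * n_keys
        + ["OP_PUSHNUM_N", "OP_ELSE", "OP_PUSHBYTES_3", "OP_CSV", "OP_DROP", "OP_PUSHNUM_3"]
        + ["OP_PUSHBYTES_33"] * n_emergency_keys
        + ["OP_PUSHNUM_4", "OP_ENDIF", "OP_CHECKMULTISIG"]
    )


def opcode_before_multisig(ops):
    """Return the opcode immediately preceding each OP_CHECKMULTISIG."""
    return [
        ops[i - 1] if i > 0 else None
        for i, op in enumerate(ops)
        if op == "OP_CHECKMULTISIG"
    ]


layout = build_erp_layout(n_keys=3)  # 3 is an arbitrary placeholder for N
preceding = opcode_before_multisig(layout)

# Any predecessor that is not a number push triggers the worst-case assumption.
worst_case = sum(
    MAX_PUBKEYS_PER_MULTISIG
    for op in preceding
    if op is None or not op.startswith("OP_PUSHNUM_")
)

print(preceding)                   # ['OP_ENDIF']
print(worst_case)                  # 20
print(worst_case > MAX_SIGCHECKS)  # True
```

Both key counts are pushed inside the branches, so the static analysis never sees them next to CHECKMULTISIG, and the estimate lands above the limit regardless of how many keys each branch really uses.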

You can find references to the Bitcoin Core lines of code involved in the standard input check here:

Actions

The RSK Core Development team is taking the following actions to address the underlying causes of the issue and to help prevent recurrence: 

  1. Making the necessary changes to the RSK client implementation code so that the Bitcoin transaction script is considered standard by the Bitcoin Core nodes. 
  2. Analyzing alternatives that could minimize the impact on users until the peg-out service is resumed.
  3. Submitting a bug report to Bitcoin Core and preparing a PR with a proposed fix for community discussion. 
  4. Reviewing and adjusting internal processes to ensure these sensitive changes are adequately tested on a Bitcoin Mainnet environment before release.

It’s important to emphasize that the most probable outcome for resolving this issue will be the proposal of an emergency network upgrade, which requires all node operators who adhere to these changes to upgrade to a new client version before the network upgrade is activated. At this moment, there is no estimated date of resolution. We will keep the RSK community updated about this.

We appreciate your patience until the issue is resolved and apologize for the impact caused to the users. 

The RSK Core Development Team