Improving communications for the Stacks network

jwiley · October 2, 2022, 7:29pm

Stacks Community -

Over the past week we saw transaction volume climb, likely driven in part by a surge in BNS name transactions. Within a week, the average mempool size grew fivefold (as of Friday morning). Around 10am ET on Friday, engineers from the Stacks Foundation and other entities started mobilizing to address the issue. A PR was opened by 3pm ET, and a new release was published by 8pm ET. It’s worth noting that this was the fastest release turnaround ever. There is a fairly comprehensive release checklist in place (for good reason), but given the narrowly scoped fix and its expected impact, the developers chose to expedite the release process in this instance.

It’s also worth pointing out that this quick fix was made possible by months of performance analysis and benchmarking, some of it done in the context of subnets (e.g. this, this, this). What looks in hindsight like an obvious or quick fix is often enabled by a lot of hard work behind the scenes.

The latest release targeted only the lowest-risk change that would have a big impact, but there’s still a lot of room for improvement (e.g. this). And as always, miners ultimately have to upgrade for the network to benefit from the fixes. We do not yet know conclusively how many miners have upgraded.

Beyond the technical issue, though, we also heard a clear message from many members of the Stacks community: there is a pressing need for better communication and information sharing when there are issues on the network. It can be frustrating when something is slow or not working, and simply knowing what is going on, who is working on it, and how to help goes a long way.

Communications Upgrades

To help the Stacks community work through technical issues like this one more effectively, we’re planning a few upgrades.

First, the Stacks Foundation has created the @StacksStatus Twitter handle as a single resource for timely updates on the status of the Stacks layer and for pointing people to the most up-to-date information. Several other entities have offered to help collect and feed relevant status information to this handle.

Second, there are plans for a dedicated status page for the Stacks layer, similar to the status page that Hiro Systems maintains for the API and other infrastructure it runs. Any entities or individuals who want to help with the status page effort should get in touch with us.

Depending on the type of discussion, the main conversations around technical issues will tend to happen in the stacks-core-devs channel on Discord, in the stacks-blockchain repo on GitHub, or potentially on the Stacks Forum. Wherever the discussion is happening, you’ll be able to check @StacksStatus and be pointed to it.

Third, over the next 1-2 weeks we’re going to draft a set of publicly accessible incident response guidelines. They will lay out how whoever is responding to a technical incident should communicate, including when to post updates, what those updates should contain, and where to host live discussions. Your input as a community will be critical, because we want to make sure this information is accessible to everyone and that folks can contribute to the effort.

Wrapping Up

Hopefully this post has given you the relevant information about Friday’s issues, as well as a sense of how we as a community will improve our response to technical incidents going forward. The feedback from the community is loud and clear: everyone wants more visibility and timely updates, and everyone is willing to help.

If you haven’t already, please follow @StacksStatus on Twitter, and keep an eye out there and on Discord for links to forum posts discussing our other follow-up items.