Early Blockstack VM design notes

jude · February 14, 2018, 12:32am

@rocky from our Slack channel shared this a few days back. I felt it was worth sharing here

Writing Reverse Polish scripts sort of sucks, but to start off with it's what Bitcoin script does, so it isn't that intolerable by folks who do Bitcoin scripting.
The Etherium scripting language, Solidity, is much nicer and it would not be too hard to do something like that which compiles to the Bitcoin script VM. But this is separable.
Coq is cool, and seems like a good thing to prove an implementation has desirable characteristics, but that too is separable.
And incremental.
There is no prototyping benefit for proving the implementation correct and it bogs things down especially when there isn't a clear spec for either the VM or the language for that.
I think if that by verifying that the each VM opcode is safe, and that the interpreter loop safe and bounded. Then that shows that any sequence of valid VM opcodes is safe.
We might also have to prove some characteristics for parsing to VM, e.g. that parsing doesn't create an infinite loop. But would be part of a phase only after the VM works (minimally), and after the parser works(minimally) which is even after that. And again this is separable from everything else.
Computing max time and max memory limits fis pretty easy, after each opcode has time amd memory requirements associated with it.
The AST for the program is a tree, and one can compute requirements with a simple traversal of the tree.
This too is separable and can be done later.
Ok so now down to the meat of what needs to be done first.
First step that would be useful is nailing down additional VM instructions. It looks like there is free opcode space in the bitcoin VM spec. Generally I prefer to build on existing work rather than reinventing from scratch, except when there is good reason to redo.
If I have it correct, additional opcodes to Bitcoin script would be:


opcode      inputs                        output stacked
---------------- -----------------------------------------------   ---------------
OP_NAME_REGISTER: <name> <value> <pay_account_id> <zonefile_hash>   ? boolean
OP_NAME_UPDATE:  <name> <zonefile hash>                ? boolean

OP_NAME_TRANSFER: <name> <account_id>                 ? boolean
OP_ACCT_SEND:   <value> <account_id>                 ? boolean
OP_NAME_METADATA: <name> <metadata>                  ?
OP_STORE_DATA:  <data>                        ?
OP_MINE_BLOCK:  <hash> <account_id> <coinbase value> <scratch area> ?
OP_FIELD_ZONEFILE_HASH: <id>                      hash
OP_LOADNAME    <hashed id>                     ??
OP_FIELD_METADATA: <data>                       field metadata
OP_HASH_LSTACK  ?                          ?

For those above that are correct, it would be great to describe the inputs, outputs and what each opcode does with (when possible) links to other blockstack documentation. Something akin to https://en.bitcoin.it/wiki/Script#Crypto

Keep in mind that the behaviour might include an immediate failure with no stack output instead of a boolean unconditionally.

How does one capture time and resources for something like OP_MINE_BLOCK? Is the data in OP_STORE_DATA unlimited? If so one will have an unbounded limit for that. So a rough (with emphasis on "rough") idea of the time and memory resources for these operations would be nice.
For the various hashing code, we'll also have to fill something like that in. Eventually.
Also, it would be good to prioritize opcodes in those that would have the most benefit for a prototype. If there are just one or two opcodes that will show a lot off that'd be great.
I started modifying The Bitcoin IDE to add an opcode. (I added OP_HASH0 which the identity hash and is useful in testing scripts) As an IDE for testing scripts it is fine, but one needs to consult the Bitcoin Script reference implementation as well, since I found the semantics can differ. OP_EQUALVERIFY in the IDE is identical to OP_EQUAL which is incorrect as per the reference implementation since the former should raise an invalid transaction and terminate while the latter doesn't'.
But for the real thing, one wouldn't use this. Personally, initially I'd probably prototype in Python as that has to interface easily in a number of ways with
blockstack which is also in Python. But at some point I'd switch to a compiled language (Rust and Go are the obvious choices), not just for speed and low runtime overhead and to allow execution in limited environments.

jude · February 21, 2018, 5:22pm

Hi @rocky, thank you for your detailed write-up and programming language proposal!

I think it is worth pointing out that your post describes two computing paradigms:

Off-chain programs: This is what you described with your code example. An off-chain program’s job is to generate and broadcast transactions that encode instructions to be executed by Blockstack’s VM. The act of creating and populating a namespace can be achieved programmatically this way, which you pointed out. I use the term “off-chain” here to allude to the fact that the program instructions execute client-side–they are executed at a user’s behest, and the code is private (i.e. hosted only on the client). Off-chain programs generate Blockstack VM instructions and broadcast them to the blockchain. The Blockstack nodes are not aware of off-chain programs; they only see the transactions they generate.
On-chain programs: These are also known as “smart contracts”. Blockstack does not have any support for on-chain programs at this time, but the STACKs blockchain will at some point support them. As the name implies, an on-chain program lives within the blockchain. Its code is public and stored in every Blockstack node, and it can be programmatically invoked by a transaction sent from any client. An on-chain program is more like a “stored procedure” within the Blockstack VM—creating an on-chain program is like adding a new operation to blockstack.lib.operations, and running the on-chain program is like invoking one of these operations. Blockstack does not implement a way for clients to do this (i.e. adding new operations requires a hard fork), but the STACKs blockchain will support this feature natively.

Before going on, let me be clear: the following description of STACKs is an early design sketch and does not in any way indicate the final system design.

To give an example of an on-chain program that STACKs might one day support, one idea that’s been rattling around in my head is to give a namespace creator the ability to define a “name acceptance” program that defines which name operations will be allowed for names in the namespace. Before anyone asks, this can be done without a Turing-complete language (something more like Ivy).

Here’s how one might implement Cryptokitties using a hypothetical name-acceptance program. Let’s assume that the namespace .cryptokitties exists, and each name in this namespace represents a cat. Then, we’d want the name-acceptance program to only allow a name to be registered if all of the following are true:

The name is not already taken (i.e. no “cat clones” allowed)
The name’s NAME_REGISTRATION transaction has an associated value_hash (see the wire format). The value_hash would encode the cat’s genes.
No NAME_UPDATE transactions will be accepted (i.e. cats can’t change their genes)
The value_hash of a newly-registered name the hash of two previous names’ value_hash (i.e. each cat’s genes is a mixture of two parent cats). Call these the “parent names”
The “parent” names have to each have a certain age, and cannot have been used recently.

In STACKs, it should be possible for the .cryptokitties namespace creator to specify these constraints as an on-chain program when they create the namespace. STACKs would enforce these constraints by evaluating this on-chain program on each name operation on a .cryptokitties name. If all constraints are met, then the operation is accepted and processed. If not, then the operation is ignored.

What programming language facilities would an on-chain program need to accomplish something like this? A few come to mind:

Read fields for a name from the name database, such as value_hash
Read fields from a name’s history in the name database, such as the block height of the last transaction that affected a name.
Read fields of the transaction under consideration (i.e. “if transaction.op == NAME_UPDATE, then REJECT”)
Perform cryptographic operations on the transaction being considered (i.e. if transaction.value_hash != hash(parent1.value_hash, parent2.value_hash), then REJECT)

While Blockstack’s Python library gives a way to implement off-chain programs (albeit not in a particularly domain-specific way), there are currently no facilities to implement on-chain programs like the above. However, it is something we’d like to add in order to allow developers to extend the Blockstack VM’s consensus rules in application-specific ways.

ethanscott · June 8, 2018, 9:42am

Hello all,
I know this post is a little old but I recently started checking out this solidity tutorial and thought I’ll give it a whirl.
After looking at this post, I am skeptical as to whether I should be continuing with the course because all the issues you guys are facing.
If possible, could you guys also suggest an alternate contract-oriented programming language.
Also, how often do you guys face these issues?