Flow Design

The underlying structure of storage is called Flow. Flow is a continuously appended list of sectors, with each sector being a 256-byte data segment. The data submitted by users will be converted into several consecutive sectors. Users need to upload the metadata of the submission to the storage contract, which will then allocate the starting position in the Flow for the submission. Afterward, users will send the actual data to the storage nodes in the 0g network.

(Note: Concepts like "file" and "segment" are abstracted at a higher level and are not considered within the scope of Flow.)

The Merkle tree over the Flow

A Merkle tree is constructed based on Flow, and the storage contract will maintain the root of the Merkle tree. Since users only submit metadata instead of the complete data to the contract, both the structure of the Merkle tree and the submission metadata are specially designed.

The Merkle root

The Merkle root over the Flow is defined as padding the flow with zeros until the number of sectors is a power of 2. Then, compute the keccak256 hash of each sector, and use keccak256 to calculate the Merkle root of these hashes. The diagram below illustrates an example of a flow with 5 sectors.

Submissions

Since users only upload metadata to the storage contract, certain constraints must be met to ensure that the contract can correctly maintain the Merkle root: The user must pad the original data with zero bytes until it meets the submission conditions.

A submission consists of several concatenated sector arrays that must adhere to the following:

  • Each sector array's length must be a power of 2, and the lengths of these sector arrays must decrease strictly monotonically.

  • The length of the longest sector array must not exceed 8 times the length of the shortest sector array.

The metadata for a submission includes:

  • The size of the original data in bytes

  • The Merkle root and length of each sector array

The hash of the submission is defined as:

  • The keccak256 hash of the concatenate for each sector array‘s Merkle root.

This approach ensures that:

  1. The user's submission is always composed of several Merkle trees over the subtrees within the Flow, providing the Merkle root.

  2. The user's submission will not be excessively long.

When receiving submissions, the contract will pad the Flow with zeros to ensure that the start position aligns with the length of the first sector array. For example, if the first sector array contains 64 elements, the submission should be aligned to a 64-element boundary.

Padding for a submission

Here we provide a concrete algorithm for padding the original data to be a valid submission

  • Let mm be the number of bytes for data to be submitted.

  • Let n=m/sector_size.n=\lceil m/\texttt{sector\_size}\rceil.

  • Let kk be the minimum integer such that n<2kn<2^k.

  • Let d=max(1,2k4)d=\max(1,2^{k-4}).

  • Binary decompose n/d\lceil n/d\rceil to (c1,c2,)(c_1,c_2,\cdots)

    • For example, 13 can be decomposed to (8, 4, 1)

  • Right pad data with 0 to make its length matches (c1+c2+)dsector_size(c_1+c_2+\cdots)\cdot d\cdot \texttt{sector\_size} bytes.

  • Divide data into several chunks with c1d,c2d,c_1\cdot d,c_2\cdot d,\cdots sectors respectively.

Proof of Random Access

PoRA (Proof of Random Access) is the mining algorithm for storage nodes. It periodically issues tasks requiring storage nodes to read the stored data and compute a hash value. When the hash value is below a specified threshold, the node finds a solution and submits the relevant content to the blockchain to claim a reward.

List of parameters

Constant parameters

ParameterRequirementCurrent value

SECTOR_SIZE

Power of 2

256

SEAL_SIZE

Power of 2, ≥ SECTOR_SIZE

4 * 2^10 (4 KB)

SCRATCHPAD_SIZE

Power of 2, ≥ SEAL_SIZE

64 * 2^10 (64 KB)

LOADING_SIZE

Power of 2, ≥ SCRATCHPAD_SIZE

256 * 2^10 (256 KB)

PRICING_SIZE

Divided by LOADING_SIZE

8 * 2^30 (8 GB)

MAX_MINING_SIZE

Divided by PRICING_SIZE

8 * 2^40 (8 TB)

EPOCH_INTERVAL

≥ 10 minutes

1200 blocks (about 1 hour)

Also we can derive the following parameters

ParameterDefinitionCurrent value

SECTORS_PER_SEAL

SEAL_SIZE/ SECTOR_SIZE

16

SECTORS_PER_PAD

SCRATCHPAD_SIZE/ SECTOR_SIZE

256

SECTORS_PER_LOAD

LOADING_SIZE/ SECTOR_SIZE

1024

SECTORS_PER_PRICE

PRICING_SIZE/ SECTOR_SIZE

32 * 1024 * 1024

SECTORS_PER_MINE

MAX_MINING_SIZE/ SECTOR_SIZE

32 * 1024 * 1024 * 1024

Admin adjustable parameters

Mine context

The process of PoRA is not working all the time. It is activated by L1 blocks periodically. For every EPOCH_INTERVAL blocks, a mine context will be released by the storage mine contract. All the miners will work on PoRA to find valid solutions. Once TARGET_SUBMITS solutions are submitted, the miners stop PoRA until the next trigger. This can avoid making the miners’ disk busy all the time.

A mining context will be made with the following fields:

  • epoch the index of the context

  • blockNumber the block number of the block triggering PoRA

  • blockDigest the hash value of the block triggering PoRA

  • flowRoot the Merkle root of the Flow

  • flowLength the number of sectors in the Flow

  • digest (referred as Context Digest) the keccak256 hash of (blockDigest, flowRoot, flowLength)

Registration

Miners must register in the smart contract to obtain a minerId for PoRA. The mine contract will generate a random ID for the miner’s on-chain address. Miners can change their miner ID at any time by requesting a new ID.

In the latest version of the storage node, the node will automatically check and request an ID upon startup.

In future versions, the rewards earned by the same miner ID may be capped.

Data sealing

To ensure that each miner ID independently stores a copy and to maintain the total number of copies in the storage network, PoRA introduces a data sealing mechanism. Unlike Filecoin, the sealing and unsealing processes do not involve heavy cryptographic computations. Instead, they are designed to ensure that the cost of sealing and unsealing is higher than the cost of storage.

The data is sealed at every SEAL_SIZE byte. The sealed data is computed by

  • the raw data unsealed in the Flow,

  • the minerId,

  • the index of the first sector in unsealed (should be in alignment of SEAL_SIZE / SECTOR_SIZE)

  • the digest contextDigestof the first context whose flowLength covers the whole unsealed.

unsealed is regarded as an array of 32-byte elements and is sealed by

<aside> 💡 By default, all integers are in 256-bit big-endian format when computing hash values.

</aside>

sealed[0] = unsealed[0] ^ keccak256(minerId, contextDigest, index)
sealed[i] = unsealed[i] ^ keccak256(sealed[i-1])

The sealing process is sequential and the unsealing process is parallel-able. A node can answer a part of the data without unsealing the whole chunk.

Calculate loading chunks

The general process of PoRA involves the following steps:

  1. Based on some on-chain seeds (such as a context digest) and a user-provided nonce, a recall position is computed through a CPU-intensive task.

  2. The miner mixes the loaded data at the recall position with the intermediate results of this CPU-intensive task to calculate the final hash.

As the Flow data grows to the terabyte (TB) level or higher, a single node may be unable to store all the data. If many recall positions are not stored by the node, PoRA will shift from challenging data retrieval to challenging CPU capacity, which contradicts the design's original intent.

Recall Range

To address this, PoRA allows users to specify a recall range, which influences the calculation of the recall position. The miner can specify the mining range with two parameters:

  • startPosition The index of the start sector.

  • miningLength The length of sectors for mine range.

  • A 64-bit binary number with wildcard characters, for example: 0011xx001x, where x represents either 0 or 1. This number is specified with two fields

    • shardId: replace all the wildcard with 0

    • shardMask: replace all the wildcard with 1, and others to 0

The shardDegree of a recall range is defined by 2w2^w, where ww refers to the number of wildcard characters.

The mining range should satisfy the following conditions:

  • The startPosition should be in alignment of SECTORS_PER_PRICE

  • The miningLength should not exceed SECTORS_PER_MINE × shardDegree

  • startPosition + miningLength should not exceed the flowLength in the most recent context.

  • miningLength should be no less than min(x, SECTORS_PER_MINE × shardDegree), where xx is the largest multiply of SECTORS_PER_PRICE that is less than the flowLength in the mine context.

Recall Position

Consider a big integer recallSeed calculated from the CPU-intensive task. The recall position is calculated as

originOffset = recallSeed % (miningLength / SECTORS_PER_LOAD)
recallOffset = (originOffset & shardMask) | range.shardId
if recallOffset * SECTORS_PER_LOAD > miningLength:
    /* recall position calculated fails */

recallPosition = startPosition + recallOffset * SECTORS_PER_LOAD;

Iteration of PoRA

  • Pick a random nonce with 32 bytes

  • Get the current mining context with flowLength and contextDigest

  • Specify the mining range and calculated digest as

    rangeDigest = keccak256(startPosition, miningLength, shardId, shardMask)
  • Compute recall position of PoRA chunk with a large Scratchpad ScratchPad

    (recallSeed, scratchpad) = ScratchPadHash(minerId, nonce, contextDigest, rangeDigest)
  • Regard recallSeed as a 256-bit big-endian format integer and calculate recallPosition as described in the previous section

  • Load LOADING_SIZE-bytes sealed data started at recallPosition (in terms of sectors) as sealedData

  • Let mixedData be sealedData xor scratchpad

  • Regard mixedData as an array of SEAL_SIZE-byte elements. For each segment y with index i, compute poraHash and check if it reaches targetQuality.

    ZERO64 = 64 bytes filled with 0
    # The hash function here should be blake2b. 
    poraHash = Hash(i, minerId, nonce, contextDigest, startPosition, 
                    miningRange, ZERO64, y)
    difficultyScale = min(SECTORS_PER_MINE, flowLength) 
                    / min(SECTORS_PER_MINE, flowLength / shardDegree)
    
    if poraHash < poraTarget / difficultyScale:
        Found a valid solution (nonce, i) with respect to 
        (minerId, contextDigest, startPosition, miningRange)

Design of ScratchPadHash in PoRA

# Initialize base_scratchpad with an array of 64-byte word. 
digest = blake2b(minerId, nonce, contextDigest, 
                 startPosition, miningLength) 

for i in range(0, SCRATCHPAD_SIZE / 64):
	  digest = blake2b(digest)
		base_scratchpad[i] = digest

recallSeed = keccak256(digest)

scratchpad = repeat base_scratchpad in LOADING_SIZE / SCRATCHPAD_SIZE times

return recallSeed, scratchpad

Difficulty Adjustment

Once a MineContext is released at a specific block number, TARGET_SUBMITS valid solutions are expected to be submitted in exactly TARGET_BLOCKS blocks. Otherwise the poraTarget will be adjusted in the following steps.

  • Decide the actualBlocks blocks as the actual time taken to submit TARGET_SUBMITS solutions. If there are not enough solutions until the next Mine Context release, the blocks between two context releases is defined as actualBlocks.

  • Adjust the poraTarget

    expectedPoraTarget = poraTarget * actualBlocks / TARGET_BLOCKS
    poraTarget += (expectedPoraTarget - poraTarget) / ADJUST_RATIO

Economic Model

List of Parameters

Constant parameters

ParameterRequirementDefault value

DATA_LIFETIME

3 years

Admin adjustable parameters

Pricing

When users submit data, they need to pay a fee based on the size of the submission, calculated as SECTOR_PRICE × <number of sectors in the submission>. The zeroes padded after the original data to meet the submission requirements are also subject to fees. However, zeroes padded before the original data by the storage contract to achieve alignment requirements do not incur any fees.

Although the pre-padded zeroes do not incur fees, the fees paid by the user will be evenly distributed across sectors, including the original data and all the zero padded part.

Reward Bucket

Every PRICING_SIZE bytes of data in the Flow form a pricing chunk, which has a reward bucket, collecting the fees for each sector within this chunk. When a pricing chunk is filled with data, its rewards in that buckets are released linearly over a period defined by DATA_LIFETIME.

When PoRA's recallPosition hits a pricing chunk, it will take half of the "remaining released rewards.”

System Rewards

In the early stages of the ecosystem, the foundation can reserve a portion of tokens for system rewards. When PoRA’s recallPosition hits a pricing chunk that is "releasing rewards", an additional reward of BASE_REWARD will be issued.

The funds for the base reward will be manually deposited into the reward contract and tracked separately. If the balance for the base reward is insufficient to cover a signle base reward, miners will not be able to receive the full base reward.

Service Fee

A system service fee is charged as a proportion of the storage fees paid by the user, according to the parameter SERVICE_FEE_RATE_BP.

Last updated