8 Transformed Age
When a new block is mined, the distribution of the decoy selection changes a small amount due to several factors:
Outputs in a specific block all have equal probability of being selected, creating a step-function effect. Blocks with many outputs “flatten” the distribution. As those large blocks get older compared to the most recently-mined block, they distort the smooth Gamma density.
Timelocked outputs, especially coinbase outputs, reach their spendable age as new blocks are mined. When timelocked outputs are clustered together, as is the case with P2Pool coinbase outputs, the distribution of spendable outputs can differ at different times.
The decoy selection distribution has a parameter measuring the rate of new outputs being produced per unit of time. When on-chain transaction volume changes, the parameter changes and so does the decoy selection distribution.
Every new spendable output that is added to the blockchain increases the size of the probability distribution’s support. The oldest spendable output becomes a little older.
Over the course of a week of data, these factors can produce distributions with differences important enough to matter.
A sample of a random variable that is transformed by the random variable’s cumulative distribution function (CDF) will be distributed as a Uniform(0,1) variable. I use this fact to standardize the different decoy selection distributions.
Let be the CDF of the decoy selection distribution. Let be the set of ring member ages of transactions that were constructed at a specific block height. Then, for each , is the transformed value of .
There is another complication. In 2023, an off-by-one-block bug was discovered in the wallet2
reference wallet implementation. The bug was patched in version 0.18.2.2 of wallet2
, released on April 4, 2023, but it was not a required update. Therefore, users who did not update to the latest software version would be using the old implementation with the off-by-one bug. The simultaneous use of two distributions creates a mixture distribution. In the code below, this problem is handled by computing the old and new versions of the decoy selection distribution and giving them weights. Estimating the share of users who were using the old and new software version was not feasible, so the mixing proportions were guessed. It was assumed that adoption followed a straight linear trend, starting at zero percent in April 2023 and ending at 75 percent in October 2024, the end of the dataset.
The code in this chapter takes about two weeks to finish on a powerful machine. For each of the half-a-million blocks in the dataset, we must compute the probability mass for each of Monero’s 100 million on-chain RingCT outputs (50 trillion probability masses in total).
The code should be run in the same R session as before. This is the last script to be run in this session since it will save the relevant data to storage, to be loaded into a new session in the next chapter. Some of the code is based on a python implementation of Monero’s decoy selection algorithm written by jeffro256.
The user can adjust these variables at the beginning of the script:
threads
. Number of CPU threads to use. Set to 16 by default.url.rpc
. URL of the Monero node’s RPC.http://127.0.0.1:18081
by default.cdf.dir
,ring.member.ages.dir
, andaggregate.cdf.dir
. The names of directories that will be created to hold R data objects in the Quick Serialization.qs
data format.
8.1 Code
<- 16
threads ::plan(future::multisession(workers = threads))
future
<- "http://127.0.0.1:18081"
url.rpc
<- "weekly-weighted-cdf"
cdf.dir <- "weekly-ring-member-ages"
ring.member.ages.dir <- "aggregate-weighted-cdf"
aggregate.cdf.dir
dir.create(cdf.dir)
dir.create(ring.member.ages.dir)
dir.create(aggregate.cdf.dir)
<- function(crod) {
calculate_num_usable_rct_outputs
= 10
CRYPTONOTE_DEFAULT_TX_SPENDABLE_AGE
# 1
= length(crod) - (CRYPTONOTE_DEFAULT_TX_SPENDABLE_AGE - 1)
num_usable_crod_blocks
# 2
= crod[num_usable_crod_blocks] # R indexes from 1
num_usable_rct_outputs
return(num_usable_rct_outputs)
}
<- function(x, block.height, crod, start_height.RingCT, output.index.only.locked, OBOE.prop) {
dsa.to.uniform # dsa.to.uniform <- function(x, block.height, crod, output.index.only.locked) {
= 10
CRYPTONOTE_DEFAULT_TX_SPENDABLE_AGE = 120
DIFFICULTY_TARGET_V2
= 60 * 60 * 24 * 365
SECONDS_IN_A_YEAR = SECONDS_IN_A_YEAR / DIFFICULTY_TARGET_V2
BLOCKS_IN_A_YEAR
<- function(crod) {
calculate_average_output_flow # 1
= min(c(length(crod), BLOCKS_IN_A_YEAR))
num_blocks_to_consider_for_flow
# 2
if (length(crod) > num_blocks_to_consider_for_flow) {
= crod[length(crod)] - crod[ length(crod) - num_blocks_to_consider_for_flow ]
num_outputs_to_consider_for_flow # R indexes from 1
else {
} = crod[length(crod)] # R indexes from 1
num_outputs_to_consider_for_flow
}
# 3
= DIFFICULTY_TARGET_V2 * num_blocks_to_consider_for_flow / num_outputs_to_consider_for_flow
average_output_flow
return(average_output_flow)
}
<- function(crod) {
calculate_num_usable_rct_outputs # 1
= length(crod) - (CRYPTONOTE_DEFAULT_TX_SPENDABLE_AGE - 1)
num_usable_crod_blocks
# 2
= crod[num_usable_crod_blocks] # R indexes from 1
num_usable_rct_outputs
return(num_usable_rct_outputs)
}
# crod <- crod[1:(block.height - start_height.RingCT + 1)]
<- crod[1:(block.height - start_height.RingCT)]
crod.at.tx.construction
<- calculate_average_output_flow(crod.at.tx.construction)
average_output_flow <- calculate_num_usable_rct_outputs(crod.at.tx.construction)
num_usable_rct_outputs
<- average_output_flow
v <- num_usable_rct_outputs
z
= 19.28
GAMMA_SHAPE = 1.61
GAMMA_RATE
= 10
CRYPTONOTE_DEFAULT_TX_SPENDABLE_AGE = 120
DIFFICULTY_TARGET_V2 = CRYPTONOTE_DEFAULT_TX_SPENDABLE_AGE * DIFFICULTY_TARGET_V2
DEFAULT_UNLOCK_TIME = 15 * DIFFICULTY_TARGET_V2
RECENT_SPEND_WINDOW
<- function(x) {
G ::plgamma(x, shapelog = GAMMA_SHAPE, ratelog = GAMMA_RATE)
actuar
}
<- function(x) {
G_star 0 <= x*v & x*v <= 1800) *
(G(x*v + 1200) - G(1200) +
(*v)/(1800) ) * G(1200)
( (x/G(z*v) +
)1800 < x*v & x*v <= z*v) * G(x*v + 1200)/G(z*v) +
(*v < x*v) * 1
(z
}
<- abs(diff(rev(c(0, crod.at.tx.construction))))[-(1:(CRYPTONOTE_DEFAULT_TX_SPENDABLE_AGE-1))]
crod.reversed # Remove first 10 blocks before cumsum() since can't spend from those outputs
<- output.index.only.locked[
unspendable.indices $output_index + 1L) <= as.integer(num_usable_rct_outputs) &
(output.index.only.locked$output_unlock_time >= as.integer(block.height),
output.index.only.locked"output_index"]
stopifnot(all(unspendable.indices < num_usable_rct_outputs))
<- abs(unspendable.indices - num_usable_rct_outputs) + 1
unspendable.indices
<- crod.reversed
crod.reversed.for.OBOE
#if (OBOE.prop != 1) {
if (TRUE) {
<- cumsum(crod.reversed[crod.reversed > 0])
crod.reversed
<- c(0, crod.reversed)
crod.reversed # Concatenate zero since the CDF should start at 0
<- crod.reversed[-length(crod.reversed)] + 1
y_0 <- crod.reversed[-1]
y_1 <- (G_star(y_1 + 1) - G_star(y_0)) / (y_1 + 1 - y_0)
dsa.pmf
<- rep(dsa.pmf, times = diff(crod.reversed))
dsa.pmf length(dsa.pmf)] <- dsa.pmf[length(dsa.pmf) - 1]
dsa.pmf[# Replace oldest output's mass by the penultimate one since
# it is calculated as a negative mass otherwise
# output.index.only.locked <- copy(output.index.only.locked)
# sum.of.unspendable.mass <- sum(dsa.pmf[unspendable.indices])
<- 0
dsa.pmf[unspendable.indices]
# dsa.pmf <- dsa.pmf[ support.endpoints[1]:support.endpoints[2] ]
<- dsa.pmf
dsa.pmf.correct # dsa.pmf.correct <- dsa.pmf/sum(dsa.pmf)
# dsa.pmf <- dsa.pmf * 1/(1 - sum.of.unspendable.mass)
# dsa.cdf.correct <- cumsum(dsa.pmf)
}
<- crod.reversed.for.OBOE
crod.reversed
<- sum(crod.reversed[1])
first.post.unlock.block # NEW ^
<- crod.reversed[-(1)]
crod.reversed # NEW ^
<- cumsum(crod.reversed[crod.reversed > 0])
crod.reversed
<- c(0, crod.reversed)
crod.reversed # Concatenate zero since the CDF should start at 0
<- crod.reversed[-length(crod.reversed)] + 1
y_0 <- crod.reversed[-1]
y_1 <- (G_star(y_1 + 1) - G_star(y_0)) / (y_1 + 1 - y_0)
dsa.pmf
<- rep(dsa.pmf, times = diff(crod.reversed))
dsa.pmf length(dsa.pmf)] <- dsa.pmf[length(dsa.pmf) - 1]
dsa.pmf[
<- c(rep(0, first.post.unlock.block), dsa.pmf)
dsa.pmf
<- 0
dsa.pmf[unspendable.indices]
# dsa.pmf <- dsa.pmf[ support.endpoints[1]:support.endpoints[2] ]
#if (OBOE.prop != 1) {
if (TRUE) {
<- dsa.pmf
dsa.pmf.OBOE
# gc()
<- OBOE.prop * dsa.pmf.OBOE + (1 - OBOE.prop) * dsa.pmf.correct
dsa.pmf rm(dsa.pmf.OBOE, dsa.pmf.correct)
}
<- dsa.pmf / sum(dsa.pmf)
dsa.pmf
<- cumsum(dsa.pmf)
dsa.cdf
return(list(transformed.data = dsa.cdf[x], pmf.contribution = dsa.pmf, v = v, z = z))
}
# closeAllConnections()
# https://github.com/monero-project/monero/releases/tag/v0.18.2.2
# iso week 15 starts on April 10, 2023
<- intersect(iso.weeks, c(paste0("2023-", 15:52), paste0("2024-", formatC(1:52, width = 2, flag = "0"))))
OBOE.est.weeks <- sum(rev(iso.weeks) %in% OBOE.est.weeks)
n.weeks.transition <- 0.25
final.state
<- rep(1 - 0.02, length(rev(iso.weeks)))
OBOE.prop.weekly rev(iso.weeks) %in% OBOE.est.weeks] <- final.state + (1 - final.state) * (seq_len(n.weeks.transition) - 1) / n.weeks.transition
OBOE.prop.weekly[names(OBOE.prop.weekly) <- rev(iso.weeks)
<- xmr.rpc(url.rpc = url.rpc, method = "get_output_distribution",
latest.crod params = list(amounts = list(0), from_height = 0, to_height = current.height,
binary = FALSE, cumulative = TRUE))
<- latest.crod$result$distributions[[1]]$start_height
start_height.RingCT <- latest.crod$result$distributions[[1]]$distribution
latest.crod
# iso.weeks <- sort(iso.weeks)
# for (i in setdiff(rev(iso.weeks), OBOE.est.weeks)) {
for (i in rev(iso.weeks)) {
# Compute the most recent week first so the support of all.weeks.weighted.pmf will have
# the maximum number of output indices
if (i %in% weeks.missing.mempool.data) { next }
<- OBOE.prop.weekly[i]
OBOE.prop
<- xmr.rings[
xmr.rings.week == i,
block_timestamp_ring_isoweek
.(block_height_ring, tx_hash, input_num, ring_member_age, is.nonstandard.rucknium, is.nonstandard.isthmus)]
setorder(xmr.rings.week, -block_height_ring, input_num)
# TODO: Do I need to order by input_num too?
<- calculate_num_usable_rct_outputs(
num_usable_rct_outputs.max 1:(xmr.rings.week[1, block_height_ring] - start_height.RingCT)])
latest.crod[
# split.ring.member.age <- split(xmr.rings.week[, ring_member_age + 1], xmr.rings.week[, block_height_ring])
<- split(xmr.rings.week[, .(block_height_ring, ring_member_age)],
split.ring.member.age cut(xmr.rings.week[, block_height_ring], breaks = threads))
<- copy(split.ring.member.age)
split.ring.member.age for (j in seq_along(split.ring.member.age)) {
<- split(split.ring.member.age[[j]][, ring_member_age], split.ring.member.age[[j]][, block_height_ring])
split.ring.member.age[[j]]
}<- copy(split.ring.member.age)
split.ring.member.age
<- future.apply::future_lapply(split.ring.member.age, function(block.chunk) {
uniformed.ring_member_age
<- list()
transformed.data <- vector("numeric", num_usable_rct_outputs.max)
weighted.pmf <- 0
n.ring.members.total <- 0
v.mean <- 0
z.mean <- 0
z.max
for (block.height in names(block.chunk)) {
<- length(block.chunk[[block.height]])
n.ring.members
<- unique(block.chunk[[block.height]])
block.chunk[[block.height]] # Take unique() to avoid some duplicate computations and
# because the data.table will be merged later with xmr.rings.week
<- dsa.to.uniform(block.chunk[[block.height]],
inner.loop.results as.numeric(block.height), latest.crod, start_height.RingCT, output.index.only.locked, OBOE.prop)
<- inner.loop.results$transformed.data
transformed.data[[block.height]] # block.height is character data type here
if (length(inner.loop.results$pmf.contribution) != num_usable_rct_outputs.max) {
$pmf.contribution <- c(inner.loop.results$pmf.contribution,
inner.loop.resultsrep(0, num_usable_rct_outputs.max - length(inner.loop.results$pmf.contribution)))
}
<- weighted.pmf + inner.loop.results$pmf.contribution * n.ring.members
weighted.pmf
<- n.ring.members.total + n.ring.members
n.ring.members.total
<- v.mean + inner.loop.results$v * n.ring.members
v.mean
<- z.mean + inner.loop.results$z * n.ring.members
z.mean
<- max(c(z.max, inner.loop.results$z))
z.max
}
list(
transformed.data = data.table(
block_height_ring = rep(as.integer(names(block.chunk)), times = lengths(transformed.data)),
ring_member_age = unlist(block.chunk, use.names = FALSE),
ring_member_age.transformed = unlist(transformed.data, use.names = FALSE)),
weighted.pmf = weighted.pmf, n.ring.members.total = n.ring.members.total,
v.mean = v.mean, z.mean = z.mean, z.max = z.max)
future.globals = c("dsa.to.uniform", "split.ring.member.age", "latest.crod", "output.index.only.locked", "OBOE.prop",
}, "start_height.RingCT", "calculate_average_output_flow", "calculate_num_usable_rct_outputs", "actuar::plgamma",
"num_usable_rct_outputs.max"),
future.packages = "data.table")
<- uniformed.ring_member_age[[1]]$weighted.pmf
weighted.pmf 1]]$weighted.pmf <- NULL
uniformed.ring_member_age[[
for (j in seq_len(length(uniformed.ring_member_age) - 1)) {
<- weighted.pmf + uniformed.ring_member_age[[j + 1]]$weighted.pmf
weighted.pmf + 1]]$weighted.pmf <- NULL
uniformed.ring_member_age[[j
}
<- sum(sapply(uniformed.ring_member_age, FUN = function(x) {x$n.ring.members.total}))
n.ring.members.weekly.total
<- weighted.pmf / n.ring.members.weekly.total
weighted.pmf <- sum(sapply(uniformed.ring_member_age, FUN = function(x) {x$v.mean})) / n.ring.members.weekly.total
weighted.v.mean <- sum(sapply(uniformed.ring_member_age, FUN = function(x) {x$z.mean})) / n.ring.members.weekly.total
weighted.z.mean <- max(sapply(uniformed.ring_member_age, FUN = function(x) {x$z.max}))
weekly.z.max <- cumsum(weighted.pmf)
weekly.weighted.cdf
if (i == rev(iso.weeks)[1]) {
<- weighted.pmf * n.ring.members.weekly.total
all.weeks.weighted.pmf <- n.ring.members.weekly.total
all.weeks.n.ring.members <- weighted.v.mean * n.ring.members.weekly.total
all.weeks.weighted.v.mean <- weighted.z.mean * n.ring.members.weekly.total
all.weeks.weighted.z.mean <- weekly.z.max
all.weeks.weekly.z.max else {
} <- all.weeks.weighted.pmf + n.ring.members.weekly.total *
all.weeks.weighted.pmf c(weighted.pmf, rep(0, length(all.weeks.weighted.pmf) - length(weighted.pmf)))
<- all.weeks.n.ring.members + n.ring.members.weekly.total
all.weeks.n.ring.members <- all.weeks.weighted.v.mean + weighted.v.mean * n.ring.members.weekly.total
all.weeks.weighted.v.mean <- all.weeks.weighted.z.mean + weighted.z.mean * n.ring.members.weekly.total
all.weeks.weighted.z.mean <- max(c(all.weeks.weekly.z.max, weekly.z.max))
all.weeks.weekly.z.max
}
rm(weighted.pmf)
<- list(list(weekly.weighted.cdf = weekly.weighted.cdf,
weekly.weighted.cdf weighted.v.mean = weighted.v.mean, weighted.z.mean = weighted.z.mean, weekly.z.max = weekly.z.max))
names(weekly.weighted.cdf) <- i
::qsave(weekly.weighted.cdf, file = paste0(cdf.dir, i, ".qs") )
qs# saveRDS(weekly.weighted.cdf, file = paste0(cdf.dir, i, ".rds") )
rm(weekly.weighted.cdf)
<- data.table::rbindlist(lapply(uniformed.ring_member_age, FUN = function(x) {x$transformed.data}))
uniformed.ring_member_age
stopifnot(uniqueN(uniformed.ring_member_age ) == nrow(uniformed.ring_member_age ))
<- merge(xmr.rings.week, uniformed.ring_member_age, by = c("block_height_ring", "ring_member_age"))
xmr.rings.week
rm(uniformed.ring_member_age)
setorder(xmr.rings.week, tx_hash, input_num)
:= rep(1:16, nrow(xmr.rings.week)/16)]
xmr.rings.week[, ring_mem_id
<- data.table::dcast(xmr.rings.week,
xmr.rings.week + input_num + is.nonstandard.rucknium + is.nonstandard.isthmus ~ ring_mem_id, value.var = "ring_member_age.transformed")
tx_hash
<- xmr.rings.week[, c("is.nonstandard.rucknium", "is.nonstandard.isthmus")]
nonstandard.labels
set.seed(314)
<- apply(xmr.rings.week, 1, FUN = function(x) {
xmr.rings.week sample(x[(length(x) - 15):length(x)])
})
mode(xmr.rings.week) <- "numeric"
<- t(xmr.rings.week)
xmr.rings.week
<- cbind(nonstandard.labels, xmr.rings.week)
xmr.rings.week
colnames(xmr.rings.week) <- c("is.nonstandard.rucknium" , "is.nonstandard.isthmus",
paste0("ring.member.", formatC(1:16, width = 2, flag = "0")) )
<- list(xmr.rings.week)
xmr.rings.week names(xmr.rings.week) <- i
::qsave(xmr.rings.week, file = paste0(ring.member.ages.dir, i, ".qs") )
qs# saveRDS(xmr.rings.week, file = paste0(ring.member.ages.dir, i, ".rds") )
# xmr.ring.member.ages
cat(i, base::date(), "\n")
}
# TODO: Do I need these objects?
<- cumsum(all.weeks.weighted.pmf / all.weeks.n.ring.members)
all.weeks.weighted.cdf <- all.weeks.weighted.v.mean / all.weeks.n.ring.members
all.weeks.weighted.v.mean <- all.weeks.weighted.z.mean / all.weeks.n.ring.members
all.weeks.weighted.z.mean # all.weeks.weekly.z.max already has the max
::qsave(list(all.weeks.weighted.cdf = all.weeks.weighted.cdf,
qsall.weeks.weighted.v.mean = all.weeks.weighted.v.mean,
all.weeks.weighted.z.mean = all.weeks.weighted.z.mean,
all.weeks.weekly.z.max = all.weeks.weekly.z.max),
file = paste0(aggregate.cdf.dir, "aggregate-weeks-cdf.qs"))