Transformations

bin

This function recodes numeric data as categorical by assigning each numeric value to one of a set of histogram-like bins that you define. It takes the form bin(a, bin, min, max) where a = attribute, bin = bin width, min = start of bin 1, and max = end of the range. The last two arguments are optional. bin returns a string (a category value) that names the bin containing a, as defined by the other arguments. For example, bin(3.14, 2, 0, 10) gives “b02” because the value (3.14) is in bin #2 when [0, 10] is divided into bins of width 2.
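The arithmetic behind the example can be sketched in Python. This is only an illustration of the description above, not Fathom's implementation: the name bin_label, the half-open bins, the two-digit zero padding, the default start of 0, and the None result for out-of-range values are all assumptions.

```python
import math

def bin_label(a, width, lo=0.0, hi=None):
    """Return a label such as 'b02' for the bin that contains a.

    Assumptions (not confirmed by the reference): bins are half-open
    [start, start + width), labels are zero-padded to two digits, and
    values outside [lo, hi] yield None.
    """
    if a < lo or (hi is not None and a > hi):
        return None
    index = math.floor((a - lo) / width) + 1  # bins are numbered from 1
    return f"b{index:02d}"

print(bin_label(3.14, 2, 0, 10))  # b02
```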

bootstrap

The bootstrap function performs sampling with replacement. It returns a set of output values, each of which is chosen at random from its input values with replacement. Rerandomizing produces a different set of output values. For example, bootstrap(pages) where pages is an attribute containing values {6,7,8,9} might return {7,9,7,6} or {6,9,8,7}.
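As a rough illustration of sampling with replacement, here is a minimal Python sketch. It is not how Fathom computes bootstrap; the optional rng parameter is a hypothetical addition that makes the result repeatable.

```python
import random

def bootstrap(values, rng=None):
    """Draw len(values) items from values, with replacement."""
    rng = rng or random.Random()
    # Each output value is an independent draw, so repeats are possible.
    return [rng.choice(values) for _ in values]

pages = [6, 7, 8, 9]
print(bootstrap(pages))  # varies from run to run
```

Because every draw is independent, some input values can appear more than once in the output and others not at all.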

next

Returns the value for the next case. If this is the last case, next returns 0. For example, next(year) returns, for each case, the value of the next year. As with prev (see below), next takes an optional second argument that specifies the value to be returned for the last case. If a third argument is present, it is treated as a filter; for example, next(height,0,Sex="f") returns the height of the next female, or 0 if there is none.
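The lookup that next performs, including the default value and the filter, can be sketched in Python as follows. The function name next_value, the keep list of booleans standing in for the filter expression, and the sample data are all hypothetical.

```python
def next_value(values, default=0, keep=None):
    """For each case, return the value of the nearest later case.

    keep is an optional list of booleans, one per case, that plays the
    role of the filter argument; cases whose flag is False are skipped.
    """
    result = []
    for i in range(len(values)):
        found = default  # used when no later case qualifies
        for j in range(i + 1, len(values)):
            if keep is None or keep[j]:
                found = values[j]
                break
        result.append(found)
    return result

height = [160, 175, 158, 180]
sex = ["f", "m", "f", "m"]
print(next_value(height))                                  # [175, 158, 180, 0]
print(next_value(height, 0, [s == "f" for s in sex]))      # [158, 158, 0, 0]
```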

popZScore

Returns the number of population standard deviations a value is from the mean. For example, popZScore(finalExam) computes a standard score for each value of the attribute finalExam.
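A minimal Python sketch of a population standard score, using the standard library's statistics.pstdev (which divides by n). The function name pop_z_score and the scores are made up for illustration.

```python
import statistics

def pop_z_score(values):
    """Return (value - mean) / population SD for every value."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD: divides by n
    return [(v - mean) / sd for v in values]

final_exam = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up scores: mean 5, population SD 2
print(pop_z_score(final_exam))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```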

prev

Returns the value for the previous case. If this is the first case, prev returns 0. For example, prev(year) returns, for each case, the value of the previous year. A second, optional argument allows you to specify the value that prev should take if there is no previous case. For example, prev(Factor, 1) will return the previous value of Factor for all cases except the first, for which it returns 1. If a third argument is present, it is treated as a filter; for example, prev(numberInLine,0,Flavor="strawberry") returns the closest previous value of numberInLine for which Flavor has the value “strawberry”, or 0 if there is none.
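The behavior of prev, with its default value and filter, can be approximated in Python like this. The sketch is illustrative only: prev_value, the keep list of booleans standing in for the filter expression, and the sample data are hypothetical.

```python
def prev_value(values, default=0, keep=None):
    """For each case, return the value of the nearest earlier case.

    keep is an optional list of booleans, one per case, that plays the
    role of the filter argument; cases whose flag is False are skipped.
    """
    result = []
    for i in range(len(values)):
        found = default  # used when no earlier case qualifies
        for j in range(i - 1, -1, -1):  # walk backward from the current case
            if keep is None or keep[j]:
                found = values[j]
                break
        result.append(found)
    return result

number_in_line = [1, 2, 3, 4]
flavor = ["strawberry", "vanilla", "strawberry", "vanilla"]
print(prev_value(number_in_line, 1))  # [1, 1, 2, 3]
print(prev_value(number_in_line, 0, [f == "strawberry" for f in flavor]))  # [0, 1, 1, 3]
```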

rank

Returns the position of the value when cases are ordered from lowest to highest. For example, rank(Population) used as an attribute in a collection of states assigns to each state its rank according to population. Note that if there are duplicate values, all of them receive the same rank, which may be fractional. See also uniqueRank.
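One common way to give tied values a shared, possibly fractional rank is to average the positions they occupy in the sorted order. The Python sketch below uses that convention; the reference does not state that this is the exact rule rank follows, so treat the averaging as an assumption.

```python
def rank(values):
    """Rank values from lowest to highest; ties share the mean position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend end to cover every case tied with the one at pos.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

print(rank([1, 2, 3, 2]))  # [1.0, 2.5, 4.0, 2.5]
```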

runLength

Gives the number of identical values immediately prior to and including the current value. For example, if flip contained {H, H, H, T, H, T, T}, runLength(flip) would return {1, 2, 3, 1, 1, 1, 2}. You could use max(runLength(flip)) to compute the longest streak of heads or tails in a coin-flipping simulation.
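The counting rule can be sketched in a few lines of Python. The function name run_length is hypothetical; the data is the flip example from the text.

```python
def run_length(values):
    """Count identical consecutive values up to and including each case."""
    result = []
    for i, v in enumerate(values):
        if i > 0 and v == values[i - 1]:
            result.append(result[-1] + 1)  # the run continues
        else:
            result.append(1)               # a new run starts here
    return result

flip = ["H", "H", "H", "T", "H", "T", "T"]
print(run_length(flip))       # [1, 2, 3, 1, 1, 1, 2]
print(max(run_length(flip)))  # 3, the longest streak
```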

sampleZScore
zScore

Returns the number of sample standard deviations a value is from the mean. For example, sampleZScore(height) computes a standard score for each value of the attribute height. Use this function in preference to popZScore when you are working with a sample of a population and do not know the true population standard deviation.
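A minimal Python sketch of a sample standard score, using the standard library's statistics.stdev (which divides by n - 1). The function name sample_z_score and the heights are made up for illustration.

```python
import statistics

def sample_z_score(values):
    """Return (value - mean) / sample SD for every value."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD: divides by n - 1
    return [(v - mean) / sd for v in values]

height = [150, 160, 170, 180, 190]  # made-up heights
print(sample_z_score(height))
```

Because the sample standard deviation is slightly larger than the population standard deviation of the same values, these scores are slightly closer to 0 than the corresponding population scores.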

scramble

The scramble function performs sampling without replacement. It returns a set of output values, each of which is chosen at random from its input values without replacement. This has the effect of randomizing the order of a set of values. Rerandomizing produces a different set of output values. For example, scramble(pages) where pages is an attribute containing values {6,7,8,9} might return {8,6,9,7} or {9,7,8,6}.
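Sampling every value without replacement amounts to a random reordering. Here is a minimal Python sketch, not Fathom's implementation; the optional rng parameter is a hypothetical addition that makes the result repeatable.

```python
import random

def scramble(values, rng=None):
    """Return the same values in a random order."""
    rng = rng or random.Random()
    # Sampling all len(values) items without replacement is a permutation.
    return rng.sample(values, k=len(values))

pages = [6, 7, 8, 9]
print(scramble(pages))  # varies from run to run
```

Unlike a bootstrap sample, the output always contains exactly the same values as the input; only their order changes.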

uniqueRank

Returns the unique position of a value in a list of values sorted from smallest to largest. Each value in the list gets assigned a different rank, even if there are duplicate values. For example, if attribute N contains the values {1, 2, 3, 2}, an attribute using the expression uniqueRank(N) will have values {1, 2, 4, 3}. See also rank.
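The example above is consistent with breaking ties by case order, so that the earlier of two equal values gets the lower rank. The Python sketch below makes that assumption; the reference does not spell out the tie-breaking rule, and the name unique_rank is hypothetical.

```python
def unique_rank(values):
    """Rank values from smallest to largest, giving every case its own rank."""
    # sorted() is stable, so equal values keep their original case order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks

print(unique_rank([1, 2, 3, 2]))  # [1, 2, 4, 3]
```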