Product: TIBCO Spotfire®
K-means Clustering used for organizing Line Charts -- example data set that meets the required conditions
K-means Clustering used for organizing Line Charts -- is there an example data set that meets the required conditions?
(1) Both the data table and the Line Chart must meet several conditions before K-means clustering can be applied. The following are some of the high-level requirements:
- K-means clustering in TIBCO Spotfire is based on a Line Chart visualization that has already been set up so that either (a) each line corresponds to one row in the root view of the data table, or (b) if the Line Chart is aggregated, each line maps to many rows in the data table's root view (a one-to-many mapping between lines and rows). (The "root view" is the data as displayed in a Table visualization.)
- This Line Chart must be set up by following the steps in the Spotfire help topic "How to Perform a Line Similarity Comparison".
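The two cases in the first requirement can be illustrated with a small R sketch (R being the language of the TERR script later in this article). It uses a hypothetical long-format table shaped like the example data sets ([Name], [Day], [Oil]), in which each line maps to several rows (case b), and base R's reshape() to produce a wide table with one row per line (case a). The names longDf, wideDf, WellA and WellB are made up for illustration; the sketch shows only the data shapes, not how Spotfire builds its Line Chart.

```r
# Hypothetical long-format table: one row per (Name, Day) point,
# so each line maps to many rows (one-to-many, case b).
longDf <- data.frame(
  Name = rep(c("WellA", "WellB"), each = 3),
  Day  = rep(c(0, 1, 2), times = 2),
  Oil  = c(10, 12, 11, 5, 7, 6),
  stringsAsFactors = FALSE
)

# Number of rows behind each line in the long table
rowsPerLine <- table(longDf$Name)
print(rowsPerLine)

# Wide format: one row per line (case a), one Oil column per Day value
wideDf <- reshape(longDf, idvar = "Name", timevar = "Day", direction = "wide")
print(wideDf)
```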
The top of the "How to Perform a Line Similarity Comparison" help topic lists the following prerequisites that the Line Chart must meet for K-means clustering to work:
--- you cannot use multiple Y-axis scales
--- you cannot use an X-axis that is both continuous and binned
--- all lines in the Line Chart must start at the same X-axis value and end at the same X-axis value.
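The first two prerequisites are Line Chart settings, but the third can be checked on the data itself before the chart is built. A minimal sketch, assuming a long-format table whose line-identifier and x-axis columns default to the hypothetical names "Name" and "Day"; hasCommonEndpoints is an illustrative helper, not part of Spotfire or TERR. It compares only each line's first and last x value, which is what the prerequisite states.

```r
# Hypothetical helper: TRUE when every line (all rows sharing a name)
# starts at the same x value and ends at the same x value.
hasCommonEndpoints <- function(df, nameCol = "Name", xCol = "Day") {
  starts <- tapply(df[[xCol]], df[[nameCol]], min)  # first x value per line
  ends   <- tapply(df[[xCol]], df[[nameCol]], max)  # last x value per line
  length(unique(starts)) == 1 && length(unique(ends)) == 1
}

# Both lines span Day 0 to 517
okDf    <- data.frame(Name = rep(c("A", "B"), each = 2), Day = c(0, 517, 0, 517))
# Line B starts late and ends early, so it is "too short"
shortDf <- data.frame(Name = rep(c("A", "B"), each = 2), Day = c(0, 517, 3, 400))

print(hasCommonEndpoints(okDf))
print(hasCommonEndpoints(shortDf))
```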
(2) The first attached example file (filename "K-means Clustering - testDF3.dxp") uses a test data set (with columns [Name], [Day] and [Oil]) that meets the requirements for a Line Chart whose lines can be grouped using K-means Clustering.
For each of the lines, the [Day] column has a minimum value of 0 and a maximum value of 517, as shown in the right-hand Cross Table on the file's second page.
The first Line Chart page shows the ungrouped data, and the second Line Chart page shows the data grouped using "Tools > K-means Clustering..." with "9 or fewer groups" selected.
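One way to see why shared endpoints matter: if every line is sampled at the same X values, each line becomes a row of equal-length numeric values, and K-means can compare lines column by column. The sketch below uses base R's stats::kmeans (Euclidean distance, number of groups fixed at 2) on six made-up lines; it illustrates the general technique only, and Spotfire's own K-means tool (its similarity measures, initialization and handling of the group count) may behave differently.

```r
set.seed(42)  # make the random noise and starting centers reproducible

# Hypothetical lines, all sampled at the same Day values 0..9
days <- 0:9
rising  <- t(sapply(1:3, function(i) days * 2 + rnorm(length(days), sd = 0.5)))
falling <- t(sapply(1:3, function(i) 20 - days * 2 + rnorm(length(days), sd = 0.5)))

# One row per line, one column per Day
lineMatrix <- rbind(rising, falling)
rownames(lineMatrix) <- paste0("Line", 1:6)
colnames(lineMatrix) <- paste0("Day", days)

# Group the lines by shape; nstart = 10 tries ten random starts, keeps the best
fit <- kmeans(lineMatrix, centers = 2, nstart = 10)
print(fit$cluster)
```

Because the rising and falling lines differ far more from each other than from the noise within each group, the two shapes end up in separate groups.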
(3) The second attached example file (filename "Pad timeline data for K-means use.dxp") uses a custom TIBCO Enterprise Runtime for R (TERR) script in an example Spotfire data function to pad each "too short" timeline: wherever a timeline starts after, or ends before, the overall x-axis range of the data table, a point with a y-axis value of 0 is added at that overall minimum or maximum x value. This makes all of the timelines eligible to be used in K-means clustering of the data table's Line Chart.
The "PadTimelines" data function accepts the "testDF4" data table as input; many of the timelines in "testDF4" have x-axis ranges that need such padding.
The data function generates "testDF5", a data table (with columns [Name], [Day] and [Oil]) that meets the requirements for a Line Chart whose lines can be grouped using K-means Clustering.
The following is the example TERR script used in the "PadTimelines" data function:
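# An example script to make the x-axis endpoints the same for all timelines.
# InputDf: the data function's input table (for example, testDF4).
# OutputDf: the data function's output table (for example, testDF5);
#   this output parameter name is an assumption.
NameColumn <- "Name"
xAxisColumn <- "Day"
yAxisColumn <- "Oil"
InDf <- InputDf[ , c(NameColumn, xAxisColumn, yAxisColumn) ]
# Overall x-axis range across all timelines
InDfMinXaxis <- min(InDf[, xAxisColumn])
InDfMaxXaxis <- max(InDf[, xAxisColumn])
# Pad each timeline separately, then stack the results into one table
OutputDf <- do.call( "rbind", lapply(
  X = split( InDf, InDf[, NameColumn], drop = TRUE ),
  FUN = function(X, oMinXaxis, oMaxXaxis)
  {
    xName <- unique(X[, NameColumn])
    # Timeline starts too late: prepend a point at the overall minimum x, with y = 0
    if( min(X[, xAxisColumn]) > oMinXaxis )
    {
      xMinDF <- data.frame(Name = xName, Xaxis = oMinXaxis, Yaxis = 0, stringsAsFactors = FALSE)
      colnames(xMinDF) <- colnames(X)
      X <- rbind(xMinDF, X)
    }
    # Timeline ends too early: append a point at the overall maximum x, with y = 0
    if( max(X[, xAxisColumn]) < oMaxXaxis )
    {
      xMaxDF <- data.frame(Name = xName, Xaxis = oMaxXaxis, Yaxis = 0, stringsAsFactors = FALSE)
      colnames(xMaxDF) <- colnames(X)
      X <- rbind(X, xMaxDF)
    }
    X  # return the (possibly padded) timeline
  }, oMinXaxis = InDfMinXaxis, oMaxXaxis = InDfMaxXaxis ) )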
The code in this article is only a sample to be used as a reference. It is not intended to be used "As Is" in a Production environment. Always test in a Development environment. Make modifications to the code in accordance with your implementation specifications that best suit your business requirements. Refer to the reference(s) cited in this article for usage of the functions and methods used in the code.