t-viSNE: Interactive Assessment and Interpretation of t-SNE Projections
t-SNE Grid Search - Tip: the 25 most diverse projections, selected using Procrustes distance from the 500 embeddings generated by the t-SNE grid search.
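How the 25 projections are picked is not described on this page. The following is a minimal sketch of one plausible approach, assuming a greedy farthest-point selection over a Procrustes distance that ignores reflections. The function names `normalize`, `procrustesDistance`, and `selectDiverse` are hypothetical and are not taken from the tool.

```javascript
// Center a 2-D point set and scale it to unit norm.
function normalize(points) {
  const n = points.length;
  const cx = points.reduce((s, p) => s + p[0], 0) / n;
  const cy = points.reduce((s, p) => s + p[1], 0) / n;
  const centered = points.map(([x, y]) => [x - cx, y - cy]);
  const norm = Math.sqrt(centered.reduce((s, [x, y]) => s + x * x + y * y, 0));
  return centered.map(([x, y]) => [x / norm, y / norm]);
}

// Squared Procrustes distance between two 2-D embeddings of the same points,
// minimized over translation, uniform scale, and rotation.
// Assumption: reflections are not considered in this sketch.
function procrustesDistance(a, b) {
  const za = normalize(a);
  const zb = normalize(b);
  let re = 0;
  let im = 0; // real and imaginary parts of sum(conj(za_i) * zb_i)
  for (let i = 0; i < za.length; i++) {
    re += za[i][0] * zb[i][0] + za[i][1] * zb[i][1];
    im += za[i][0] * zb[i][1] - za[i][1] * zb[i][0];
  }
  return Math.max(0, 1 - (re * re + im * im));
}

// Greedy farthest-point selection (an assumed strategy): start with the first
// embedding, then repeatedly add the one farthest from everything selected so far.
function selectDiverse(embeddings, count) {
  const selected = [0];
  while (selected.length < Math.min(count, embeddings.length)) {
    let best = -1;
    let bestScore = -Infinity;
    embeddings.forEach((candidate, i) => {
      if (selected.includes(i)) return;
      const score = Math.min(
        ...selected.map((j) => procrustesDistance(candidate, embeddings[j]))
      );
      if (score > bestScore) {
        bestScore = score;
        best = i;
      }
    });
    selected.push(best);
  }
  return selected;
}

const square = [[0, 0], [1, 0], [1, 1], [0, 1]];
const rotated = [[5, 5], [5, 7], [3, 7], [3, 5]]; // same square, rotated, scaled, shifted
const skewed = [[0, 0], [3, 0], [1, 1], [0, 1]];
const picked = selectDiverse([square, rotated, skewed], 2);
console.log(picked); // indices 0 and 2: the rotated copy adds no diversity
```

With 500 embeddings and `count` set to 25, the same function would return the indices of 25 mutually dissimilar projections under these assumptions.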
Quality Metrics Average (QMA), Neighborhood Hit (NH), Trustworthiness (T), Continuity (C), Stress (S), Shepard Diagram Correlation (SDC)
<option value="1" selected>Quality Metrics Average (QMA)</option>
<option value="2">Neighborhood Hit (NH)</option>
<option value="3">Trustworthiness (T)</option>
<option value="4">Continuity (C)</option>
<option value="5">Stress (S)</option>
<option value="6">Shepard Diagram Correlation (SDC)</option>
<button id="confirmModal" class="w3-button w3-left w3-white w3-border" style="margin-top: -3px; margin-bottom: -3px" onclick="ReSort(true)" disabled>Confirm</button>
<button id="closeModal" class="w3-button w3-right w3-white w3-border" style="margin-top: -3px; margin-bottom: -3px" onclick="closeModalFun()">Close</button>
Parameters - Tip: a panel for controlling the t-SNE algorithm and its parameters. There is also an option to choose between grid parameter search and single set mode.
Tip: switch between grid search (generates 500 projections) and a single set of parameters (generates 1 projection). Options: Grid Search or Single Set
<option value="1" selected>Grid Search</option>
<option value="2">Single Set</option>
Tip: the overall cost, which is reduced at each iteration step of the t-SNE algorithm.
Data sets - Tip: use one of the data sets already provided (only numerical values supported) or upload a new file (do not forget to use * for the target label). Options: Diabetes, Breast Cancer, Iris, SPECTF, Gaussian Clusters, Upload File
<select id="param-dataset" name="param-dataset" onChange="changeDataset(this.value);">
<option value="diabetes.csv" selected>Diabetes</option>
<option value="breast-cancer-wisconsin.csv">Breast Cancer</option>
<option value="iris.csv">Iris</option>
<option value="SPECTF.csv">SPECTF</option>
<option value="blobs.csv">Gaussian Clusters</option>
<option value="empty">Upload File</option>
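As an illustration of the `*` convention for uploaded files, here is a minimal sketch of how a CSV header could be split into features and target label. It assumes a comma-separated header without quoted fields, with the `*` placed at the end of the target column name. The helper `splitHeader` and the column names are hypothetical, not part of the tool.

```javascript
// Hypothetical helper: separate feature names from the target column.
// Assumes the target column name ends with "*".
function splitHeader(headerLine) {
  const names = headerLine.split(",").map((name) => name.trim());
  const targetIndex = names.findIndex((name) => name.endsWith("*"));
  return {
    features: names.filter((_, i) => i !== targetIndex),
    // Strip the marker so the label name can be displayed cleanly.
    target: targetIndex === -1 ? null : names[targetIndex].slice(0, -1),
    targetIndex,
  };
}

const header = splitHeader("sepal_length,sepal_width,petal_length,petal_width,species*");
console.log(header.target); // species
console.log(header.features.length); // 4
```

A header without any `*` yields `target: null`, which corresponds to an overview without labels.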
Factory reset - Tip: Restart the entire web page/application.
Perplexity - Tip: perplexity is a measure for information that is defined as 2 to the power of the Shannon entropy. The perplexity of a fair die with k sides is equal to k. In t-SNE, the perplexity may be viewed as a knob that sets the number of effective nearest neighbors. (Source: https://lvdmaaten.github.io/tsne/).
<td><input id="param-perplexity" type="range" min="5" max="100" value="30" step="1"></td>
<td><output for="param-perplexity" id="param-perplexity-value">30</output></td>
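The definition in the perplexity tip can be checked with a few lines of code. This sketch computes the perplexity of a discrete probability distribution as 2 to the power of its Shannon entropy (in bits). The function name `perplexity` is chosen for illustration only. It is not how t-SNE itself uses the parameter: there, the algorithm tunes a bandwidth per point so that the neighbor distribution reaches the requested perplexity.

```javascript
// Perplexity of a discrete distribution: 2 raised to the Shannon entropy (bits).
function perplexity(probabilities) {
  let entropy = 0;
  for (const p of probabilities) {
    if (p > 0) entropy -= p * Math.log2(p); // terms with p = 0 contribute nothing
  }
  return Math.pow(2, entropy);
}

const fairDie = Array(6).fill(1 / 6);
console.log(perplexity(fairDie).toFixed(3)); // 6.000

// A loaded die is more predictable, so its perplexity is lower than 6.
const loadedDie = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1];
console.log(perplexity(loadedDie) < 6); // true
```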
Learning rate - Tip: if the learning rate is too high, the data may look like a 'ball' with any point approximately equidistant from its nearest neighbours. If the learning rate is too low, most points may look compressed in a dense cloud with few outliers. If the cost function gets stuck in a bad local minimum increasing the learning rate may help. (Source: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html).
<td><input id="param-learningrate" type="range" min="1" max="150" value="1" step="1"></td>
<td><output for="param-learningrate" id="param-learningrate-value">1</output></td>
Max iterations - Tip: maximum number of iterations for the optimization. Should be at least 250. (Source: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html).
<td><input id="param-maxiter" type="range" min="10" max="1000" value="500" step="10"></td>
<td><output for="param-maxiter" id="param-maxiter-value" >500</output></td>
Load execution - Tip: load a previously executed analysis in .txt format.
Cache distances - Tip: if you store the distances, the file will be larger, but loading this execution later will be much faster than without this option enabled.
<input id="downloadDists" checked type="checkbox" >
Cache distances
<td style="padding-top: 0.8vh !important">
Store execution - Tip: save an executed analysis in .txt format.
Execute new t-SNE analysis - Tip: start a new t-SNE analysis, or resume a previous one if load execution is activated.
Projections Provenance - Tip: a feature of this tool that supports the exploration of clusters (and points) by comparing the neighborhood preservation of a user-driven selection with the average of the entire projection. You can also find the best projections for a lasso selection of points (with optimize selection).
<div id="textToChange" style="display:inline-block">[Sorting:</div>
<select id="param-SortM-view" name="param-SortM-view" onchange="ReSort(false)">
<option value="1" selected>Quality Metrics Average (QMA)</option>
<option value="2">Neighborhood Hit (NH)</option>
<option value="3">Trustworthiness (T)</option>
<option value="4">Continuity (C)</option>
<option value="5">Stress (S)</option>
<option value="6">Shepard Diagram Correlation (SDC)</option>
<div style="display:inline-block; float:right">
Optimize Selection - Tip: find the best projections for the selected points.
Neighborhood Preservation - Tip: a feature of this tool that supports the exploration of clusters (and points) by comparing the neighborhood preservation of a user-driven selection with the average of the entire projection.
Bar Chart, Difference Bar Chart, Line Plot, Difference Line Plot
<option value="1" selected>Bar Chart</option>
<option value="2">Difference Bar Chart</option>
<option value="3">Line Plot</option>
<option value="4">Difference Line Plot</option>
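The comparison described in the Neighborhood Preservation tip can be sketched as follows. This assumes that preservation for a point is the fraction of its k nearest neighbors in the original N-D space that are also among its k nearest neighbors in the 2-D projection. The exact definition used by the tool may differ, and the function names and sample coordinates are hypothetical.

```javascript
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Indices of the k nearest neighbors of point i (excluding i itself).
function nearestNeighbors(points, i, k) {
  return points
    .map((p, j) => ({ j, d: euclidean(points[i], p) }))
    .filter((entry) => entry.j !== i)
    .sort((a, b) => a.d - b.d)
    .slice(0, k)
    .map((entry) => entry.j);
}

// Average share of N-D neighbors that remain neighbors in 2-D,
// computed over the points listed in `indices`.
function neighborhoodPreservation(highDim, lowDim, k, indices) {
  let total = 0;
  for (const i of indices) {
    const high = new Set(nearestNeighbors(highDim, i, k));
    const shared = nearestNeighbors(lowDim, i, k).filter((j) => high.has(j));
    total += shared.length / k;
  }
  return total / indices.length;
}

// Made-up data: five points in 3-D and a 2-D projection of them.
const highDim = [[0, 0, 0], [1, 0, 0], [5, 5, 5], [6, 5, 5], [5, 9, 5]];
const lowDim = [[0, 0], [1, 0], [5, 5], [9, 5], [5, 6]];

const whole = neighborhoodPreservation(highDim, lowDim, 1, [0, 1, 2, 3, 4]);
const selection = neighborhoodPreservation(highDim, lowDim, 1, [2, 3]);
console.log(whole, selection); // 0.8 0.5
```

Repeating the computation for several values of k gives one bar (or line point) per k. The difference charts would then show `selection - whole` under the same assumptions.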
<div id="knnBarChartDetails" style="display:inline-block; float:right">
Visual Mapping - Tip: in this panel the user can adapt the visual encodings of the main visualization view. The thresholds that determine which points Dimension Correlation captures are also located in this panel. An annotation functionality is available for the main view as well.
Density - Tip: density in the high-dimensional space taken from the t-SNE itself. Options: Color or Size
<select id="param-neighborHood" name ="param-neighborHood" onchange="setReInitialize(true);">
<option selected="selected" value="color">Color</option>
<option value="size">Size</option>
Remaining cost - Tip: remaining cost of each point throughout the entire projection.
Size - Tip: change between size/radius and color encodings.
Correlation - Tip: adapt how points are selected in the two-dimensional space. The options are a simple distance measurement between each point and the drawn line, or the KNN algorithm. Options: Distance or KNN
<select id="param-correlationMeasur" name ="param-correlationMeasur" onchange="setReInitializeDistanceCorrelation(true);">
<option selected="selected" value="1">Distance</option>
<option value="2">KNN</option>
<td scope="row">
Correlation threshold (%) - Tip: percentage of all points taken into account by Dimension Correlation.
K-value nearest neighbor - Tip: K-value for nearest neighbor algorithm.
<input id="param-corr" type="range" min="0" max="100" value="50" step="1" onchange="CalculateCorrel(true, 1);">
<input id="param-corr2" type="range" min="1" max="250" value="10" step="1" onchange="CalculateCorrel(true, 2);" style="display: none">
<output for="param-corr" id="param-corr-value">50</output>
<output for="param-corr2" id="param-corr-value2" style="display: none">10</output>
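To illustrate the Distance option with a correlation threshold, the sketch below measures the distance from each 2-D point to a drawn polyline and keeps the closest percentage of points. It is an assumption that the tool ranks points this way. The helpers `distanceToSegment`, `distanceToPolyline`, and `closestPoints` and the sample coordinates are hypothetical.

```javascript
// Shortest distance from point p to the line segment a-b.
function distanceToSegment(p, a, b) {
  const dx = b[0] - a[0];
  const dy = b[1] - a[1];
  const lengthSq = dx * dx + dy * dy;
  // Position of the projection of p onto the segment, clamped to [0, 1].
  let t = lengthSq === 0 ? 0 : ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / lengthSq;
  t = Math.max(0, Math.min(1, t));
  return Math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy));
}

// Shortest distance from p to any segment of the polyline.
function distanceToPolyline(p, polyline) {
  let best = Infinity;
  for (let i = 0; i + 1 < polyline.length; i++) {
    best = Math.min(best, distanceToSegment(p, polyline[i], polyline[i + 1]));
  }
  return best;
}

// Indices of the `percent` % of points closest to the polyline, nearest first.
function closestPoints(points, polyline, percent) {
  const count = Math.round((percent / 100) * points.length);
  return points
    .map((p, i) => ({ i, d: distanceToPolyline(p, polyline) }))
    .sort((a, b) => a.d - b.d)
    .slice(0, count)
    .map((entry) => entry.i);
}

const points = [[0, 1], [5, 5], [1, 0.1], [10, 0]];
const polyline = [[0, 0], [2, 0]];
const captured = closestPoints(points, polyline, 50);
console.log(captured); // indices 2 and 0
```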
Point radius scaling - Tip: multiplies the actual radius by x (increases/decreases the points' radius).
<td><input id="param-lim" type="range" min="1" max="4" value="3" step="0.5" onchange="setReInitialize(false);"></td>
<td><output for="param-lim" id="param-lim-value">3</output></td>
<td scope="row">
Disable annotator
Erase annotations
Reveal annotations
<td scope="row">
Write a comment.
Attach comment
Analysis
Overview - Tip: t-SNE overview with or without labels, depending on the data set. To define the target label, set a * mark after the appropriate dimension. The overview also shows the number of dimensions/features and instances of the data set.
Interaction Modes (M) - Tip: these modes enable different interactions in the main visualization view.
Points exploration - Tip: in this mode the user can zoom in and out in the main visualization view. Hovering over a point shows the exact dimension values of that data instance.
Group selection - Tip: lasso selection in the main visualization view.
Dimension correlation - Tip: draw a shape (polyline) and check which dimensions correlate with it. Left click sets a point, and right click confirms the drawing for further analysis.
Reset all filters - Tip: reset all filters applied in the visualizations without losing the execution.
Shepard Heatmap - Tip: a view related to the overall quality of the projection. If the points/values lie on the diagonal, then the distances are preserved in both spaces. If values are closer to the N-D distances axis, then the visualization is too compressed. If values are closer to the 2-D distances axis, then the visualization is too spread out.
Heatmap or Diagram
<option value="1" selected>Heatmap</option>
<option value="2">Diagram</option>
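The idea behind the Shepard view can be sketched in code: every pair of points contributes one N-D distance and one 2-D distance, and a single correlation value summarizes how well the two agree. This sketch assumes Euclidean distances and Pearson correlation. The tool's Shepard Diagram Correlation (SDC) metric may use a different correlation measure, and the function names and sample data are hypothetical.

```javascript
// Euclidean distances for all unordered pairs of points.
function pairwiseDistances(points) {
  const distances = [];
  for (let i = 0; i < points.length; i++) {
    for (let j = i + 1; j < points.length; j++) {
      let sum = 0;
      for (let d = 0; d < points[i].length; d++) {
        sum += (points[i][d] - points[j][d]) ** 2;
      }
      distances.push(Math.sqrt(sum));
    }
  }
  return distances;
}

// Pearson correlation coefficient of two equally long arrays.
function pearson(x, y) {
  const mean = (values) => values.reduce((s, v) => s + v, 0) / values.length;
  const mx = mean(x);
  const my = mean(y);
  let cov = 0;
  let vx = 0;
  let vy = 0;
  for (let i = 0; i < x.length; i++) {
    cov += (x[i] - mx) * (y[i] - my);
    vx += (x[i] - mx) ** 2;
    vy += (y[i] - my) ** 2;
  }
  return cov / Math.sqrt(vx * vy);
}

function shepardCorrelation(highDim, lowDim) {
  return pearson(pairwiseDistances(highDim), pairwiseDistances(lowDim));
}

// Made-up data: the projection doubles every distance, so all pairs fall on
// a straight line in the Shepard diagram.
const original = [[0, 0, 0], [1, 0, 0], [0, 2, 0], [3, 1, 0]];
const projected = [[0, 0], [2, 0], [0, 4], [6, 2]];
console.log(shepardCorrelation(original, projected).toFixed(3)); // 1.000
```

The pairs returned by `pairwiseDistances` for the two spaces are also the coordinates that a Shepard diagram or heatmap would plot.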
Dimension Correlation - Tip: user-driven shape investigation of the most correlated dimensions.
Min Correlation - Tip: the minimum correlation that is still shown (range: 0.0 to 1.0). The default is 0, so the tool shows all correlations.
<input id="param-corlim" type="range" min="0" max="1" value="0.0" step="0.1" style="display:inline-block; float:right" onchange="CalculateCorrel(true);">
Density and Remaining Cost - Tip: the density and remaining cost distributions are important to examine alongside the individual values shown in the main visualization view.
Min Cost - Tip: set the limit for the minimum cost that is still shown in the main visualization view. Minimum visible cost rate (range: 0.1 to 1.0).
<input id="param-costlim" type="range" min="0.1" max="1" value="1" step="0.1" style="display:inline-block; float:right" onchange="setReInitialize(false);">
Adaptive Parallel Coordinates Plot - Tip: for every selection the tool runs a local Principal Component Analysis (PCA) and dynamically shows the top 8 dimensions, ordered from left to right. The order runs from the most important features of the data set (high variance) to the least important (low variance). It also works with local selections of points!
