This Git repository contains the code that accompanies the research paper "t-viSNE: Interactive Assessment and Interpretation of t-SNE Projections". The details of the experiments and the research outcome are described in [the paper](https://doi.org/10.1109/TVCG.2020.2986996).
**Note:** t-viSNE is optimized to work better for the 2560x1440 resolution (1440p/QHD (Quad High Definition)). Any other resolution might need manual adjustment of your browser's zoom level to work properly.
**Note:** t-viSNE is optimized to work better for standard resolutions (such as 1440p/QHD (Quad High Definition) and 1080p). Any other resolution might need manual adjustment of your browser's zoom level to work properly.
**Note:** The tag `paper-version` matches the implementation at the time of the paper's publication. The current version might look significantly different depending on how much time has passed since then.
**Note:** This software is based on the bhtsne library, its native executable, and the Python interface used to call that executable. This library is the official implementation of t-SNE, made by its authors. Given exactly the same input data, different systems will generate slightly different outputs with this library, and these differences propagate to our software.
**Note:** As with any other software, the code is not bug-free. There might be limitations in the views and functionalities of the tool that could be addressed in a future code update.
# Data Sets #
All data sets used in the paper are in the `data` folder, formatted as comma-separated values (CSV).
Most of them are available online from the [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml/index.php): Iris, Breast Cancer Wisconsin (Original), Pima Indians Diabetes, and SPECTF. We also used a custom-made data set with Gaussian clusters.
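The tool's tooltips ask users to mark the target label column of a CSV file with a `*` after its name. As a rough illustration of that convention, here is a minimal JavaScript sketch that summarizes such a file. The `summarizeCsv` helper, the column names, and the sample rows are hypothetical and not part of t-viSNE. The sketch assumes a plain CSV without quoted fields and at most one marked column, and it only approximates how the Overview panel reports the number of dimensions and instances.

```javascript
// Hypothetical helper (not part of t-viSNE): summarize a CSV data set,
// treating a column whose name carries a "*" mark as the target label
// instead of as a dimension. Assumes no quoted fields and at most one
// marked column.
function summarizeCsv(csvText) {
  // Drop blank lines so a trailing newline is not counted as an instance.
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== "");
  const header = lines[0].split(",").map((name) => name.trim());
  const labelIndex = header.findIndex((name) => name.includes("*"));
  return {
    // Strip the mark, similar to Category.replace('*','') in the tool.
    label: labelIndex === -1 ? null : header[labelIndex].replace("*", ""),
    dimensions: header.length - (labelIndex === -1 ? 0 : 1),
    instances: lines.length - 1, // every non-empty line after the header
  };
}

// Illustrative sample (column names are assumptions, not the repo's files).
const csv = [
  "sepal_length,sepal_width,petal_length,petal_width,species*",
  "5.1,3.5,1.4,0.2,setosa",
  "7.0,3.2,4.7,1.4,versicolor",
  "",
].join("\n");

console.log(summarizeCsv(csv));
// → { label: 'species', dimensions: 4, instances: 2 }
```

A file without any `*` column yields `label: null`, which corresponds to the "N/A" target label shown by the interface.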
@@ -31,9 +33,8 @@ For the frontend:
There is no need to install anything for the frontend, since all modules are in the repository.
# Usage #
Below is an example of how you can get t-viSNE running using Python for both frontend and backend. The frontend is written in JavaScript/HTML, so it can be hosted on any other web server of your choice. The only hard requirement (currently) is that both frontend and backend must run on the same machine.
```
# first terminal: hosting the visualization side (client)
<h2 class="panel-title" style="display:inline-block" data-toggle="tooltip" data-placement="right" title="Tip: control t-SNE algorithm and its parameters.">t-SNE Parameters</h2>
[Mode:
<select id="param-EX-view" name="param-EX-view" data-toggle="tooltip" data-placement="right" title="Tip: change between grid search and a single set of parameters." onchange="ExecuteMode()">
<h2 class="panel-title" style="display:inline-block" data-toggle="tooltip" data-placement="right" title="Tip: a panel for controlling the t-SNE algorithm and its parameters. There is also an option to choose between grid parameter search and single set mode.">Parameters</h2>
[M:
<select id="param-EX-view" name="param-EX-view" data-toggle="tooltip" data-placement="right" title="Tip: the option of changing between grid search (generating 500 projections) and a single set of parameters (1 projection)." onchange="ExecuteMode()">
<div id="cost" title="Tip: the overall cost reduced by each iteration step of the t-SNE algorithm." style="display:inline-block; margin-top:3px; float:right"></div>
</div>
<div class="panel-body">
<div id="control-panel" data-sr="enter left over 8s">
<div class="param">
<label id="data" for="param-dataset" data-toggle="tooltip" data-placement="right" title="Tip: use one of the data sets already provided or upload a new file.">Data sets</label>
<button type="button" class="button" id="FactRes" onclick="FactoryReset()" data-toggle="tooltip" data-placement="right" title="Tip: Restart the entire web page/application.">Factory reset</button>
</div>
<div class="param">
<label for="param-perplexity" data-toggle="tooltip" data-placement="right" title="Tip: perplexity is a measure for information that is defined as 2 to the power of the Shannon entropy. The perplexity of a fair die with k sides is equal to k. In t-SNE, the perplexity may be viewed as a knob that sets the number of effective nearest neighbors. (Source: https://lvdmaaten.github.io/tsne/)">Perplexity</label>
<label for="param-learningrate" data-toggle="tooltip" data-placement="right" title="Tip: if the learning rate is too high, the data may look like a ‘ball’ with any point approximately equidistant from its nearest neighbours. If the learning rate is too low, most points may look compressed in a dense cloud with few outliers. If the cost function gets stuck in a bad local minimum increasing the learning rate may help. (Source: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html)">Learning rate</label>
<label for="param-maxiter" style="padding: 25px 0 0 8px" data-toggle="tooltip" data-placement="right" title="Tip: maximum number of iterations for the optimization. Should be at least 250. (Source: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html)">Max iterations</label>
</div>
<div class="col-md-4" style="padding: 25px 0 0 0px">
<output for="param-maxiter" id="param-maxiter-value" style="padding: 25px 0 0 0">500</output>
</div>
<div class="col-md-4">
<div id="hider2"></div>
<table class="table table-borderless paramTable">
<tbody>
<tr>
<td scope="row"><label id="data" for="param-dataset" data-toggle="tooltip" data-placement="right" title="Tip: use one of the data sets already provided (only numerical values supported) or upload a new file (do not forget to use * for the target label).">Data sets</label></td>
<td><button type="button" class="button" id="FactRes" onclick="FactoryReset()" data-toggle="tooltip" data-placement="right" title="Tip: Restart the entire web page/application.">Factory reset</button></td>
</tr>
<tr>
<td scope="row"><label for="param-perplexity" data-toggle="tooltip" data-placement="right" title="Tip: perplexity is a measure for information that is defined as 2 to the power of the Shannon entropy. The perplexity of a fair die with k sides is equal to k. In t-SNE, the perplexity may be viewed as a knob that sets the number of effective nearest neighbors. (Source: https://lvdmaaten.github.io/tsne/).">Perplexity</label></td>
<td scope="row"><label for="param-learningrate" data-toggle="tooltip" data-placement="right" title="Tip: if the learning rate is too high, the data may look like a ‘ball’ with any point approximately equidistant from its nearest neighbours. If the learning rate is too low, most points may look compressed in a dense cloud with few outliers. If the cost function gets stuck in a bad local minimum increasing the learning rate may help. (Source: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html).">Learning rate</label></td>
<td scope="row"><label for="param-maxiter" data-toggle="tooltip" data-placement="right" title="Tip: maximum number of iterations for the optimization. Should be at least 250. (Source: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html).">Max iterations</label></td>
<label data-toggle="tooltip" data-placement="right" title="Tip: if you store distances, the file size will be larger, but loading this execution later will be much quicker than without this option enabled.">
<button type="button" class="button" onclick='loadAnalysis();' data-toggle="tooltip" data-placement="right" title="Tip: load previously executed analysis in .txt format.">Load exec.</button>
</td>
<td style="padding-top: 0.8vh !important"><label data-toggle="tooltip" data-placement="right" title="Tip: if you store distances, the file size will be larger, but loading this execution later will be much quicker than without this option enabled.">
<input id="downloadDists" checked type="checkbox">
Cache distances
</label>
</div>
</div>
<div class="row">
<div class="col-md-12" style="margin-top:10px">
<p><div id="run-button"><button id="ExecuteBut" class="btn btn-primary btn-block" onclick="getData();" value="Execute new t-SNE analysis" style="margin-top: 4px"><i class="fas fa-running fa-lg" style="margin-right: 10px"></i>Execute new t-SNE analysis</button></div></p>
</div>
</div>
</div>
</td>
<td style="padding-top: 0.8vh !important">
<button type="button" class="button" onclick="SaveAnalysis()" data-toggle="tooltip" data-placement="right" title="Tip: save/store previously executed analysis in .txt format.">Store exec.</button>
</td>
</tr>
<tr>
<td scope="row" colspan="3"><button id="ExecuteBut" class="btn btn-primary btn-block" onclick="getData();" title="Tip: initialize a new t-SNE investigation or start a previous analysis, in case load execution is activated." value="Execute new t-SNE analysis"><i class="fas fa-running fa-lg"></i>Execute new t-SNE analysis</button></td>
<h2 class="panel-title" style="display:inline-block" data-toggle="tooltip" data-placement="right" title="Tip: a feature of this tool that supports clusters (and points) exploration. Checking the neighborhood preservation between the entire projection's average and a selection driven by the user.">Projections Provenance</h2>
<div id="textToChange" style="display:inline-block">[Sorting Projections According to Metric:</div>
<h2 class="panel-title" style="display:inline-block" data-toggle="tooltip" data-placement="right" title="Tip: a feature of this tool that supports clusters (and points) exploration. Checking the neighborhood preservation between the entire projection's average and a selection driven by the user. You can also find the best projections based on a lasso selection of points (with optimize selection).">Projections Provenance</h2>
<option value="1" selected>Quality Metrics Average (QMA)</option>
<option value="2">Neighborhood Hit (NH)</option>
@@ -168,9 +166,27 @@
</div>
<div class="panel-body">
<div id="ProjectionsVisual"></div>
<div id="ProjectionsMetrics"></div>
</div>
</div>
<div class="panel panel-default med-bottom-neigh">
<div class="panel-heading">
<h2 class="panel-title" style="display:inline-block" data-toggle="tooltip" data-placement="right" title="Tip: a feature of this tool that supports clusters (and points) exploration. Checking the neighborhood preservation between the entire projection's average and a selection driven by the user.">Neighborhood Preservation </h2>
<h2 class="panel-title" data-toggle="tooltip" data-placement="right" title="Tip: various functionalities depending on the user. These modes enable different interactions in the main visualization view.">Interaction Modes</h2>
<button class="btn btn-info active" onclick="setLayerProj();" style="margin-left: -1px !important"><i class="fas fa-mouse-pointer fa-lg" data-toggle="tooltip" data-placement="right" title="Tip: in this mode the user can zoom in and out in the main visualization view and when hovering on a particular point he/she receives the exact data set's instance dimensions."></i>t-SNE Points Exploration</button>
<button class="btn btn-info" onclick="setLayerComp();" style="margin-left: -1.4px"><i class="far fa-object-group fa-lg" data-toggle="tooltip" data-placement="right" title="Tip: lasso selection in the main visualization view."></i>Group Selection</button>
<button class="btn btn-info" onclick="setLayerSche();" style="margin-left: -2px !important"><i class="fas fa-draw-polygon fa-lg" data-toggle="tooltip" data-placement="right" title="Tip: draw a shape (polylines) and check the related dimensions correlations for your drawing/shape. With the left click you set one point and the right click you confirm the drawing for further analysis."></i>Dimension Correlation</button>
</div>
<button class="btn btn-info" onclick="setReset();" style="margin-left: 225px"><i class="fas fa-trash-alt fa-lg" style="margin-right: 10px" data-toggle="tooltip" data-placement="right" title="Tip: reset all filters applied in the visualizations without losing the execution."></i>Reset Filters</button>
<div class="panel-heading">
<h2 class="panel-title" data-toggle="tooltip" data-placement="right" title="Tip: in this panel the user can adapt the visual encodings of the main visualization view. Furthermore, the dimension correlation capturing points thresholds are situated in this panel. For the main view, there is also an annotation functionality available.">Visual Mapping</h2>
</div>
<div class="panel-body" id="commBtn">
<table class="table table-borderless">
<tbody>
<tr>
<td scope="row"><label for="male" data-toggle="tooltip" data-placement="right" title="Tip: density in the high-dimensional space taken from the t-SNE itself.">Density</label>
<label for="male" data-toggle="tooltip" data-placement="right" title="Tip: remaining cost of each point throughout the entire projection.">Remaining cost</label>
<label id="selectionLabel" data-toggle="tooltip" data-placement="right" title="Tip: change between size/radius and color encodings.">Size</label>
</td>
</tr>
<tr>
<td scope="row"><label for="male" data-toggle="tooltip" data-placement="right" title="Tip: adapt the selection of points in the two-dimensional space. The options are a simple distance measurement between point and line or using the KNN algorithm.">Correl.</label>
<label for="param-corr" id="param-corrLabel" data-toggle="tooltip" data-placement="right" title="Tip: percentage of all points taken into account by Dimension Correlation.">Correl. threshold (%)</label>
<td scope="row"><label for="param-lim" data-toggle="tooltip" data-placement="right" title="Tip: x*times the actual radius (increases/decreases points' radius).">Point radius scaling</label></td>
<h2 class="panel-title" style="display:inline-block" data-toggle="tooltip" data-placement="right" title="Tip: t-SNE overview with or without labels depending on each data set. To determine the feature of a data set that corresponds to classes set a * mark after this feature.">t-SNE Overview</h2><div id="datasetDetails" style="display:inline-block; float:right"></div>
<h2 class="panel-title" style="display:inline-block" data-toggle="tooltip" data-placement="right" title="Tip: t-SNE overview with or without labels depending on each data set. To determine the target label set a * mark after the appropriate dimension.">Overview</h2><div id="datasetDetails" title="Tip: the number of dimensions/features and instances of a data set." style="display:inline-block; float:right"></div>
<h2 class="panel-title" data-toggle="tooltip" data-placement="right" title="Tip: in this panel the user can adapt the visual mappings of the main visualization view.">Visual Mapping</h2>
</div>
<div class="row">
<div class="panel-body">
<div class="col-md-12">
<div class="row">
<div class="col-md-8">
<div class="param" style="padding: 5px 0 5px 0">
<label for="male" data-toggle="tooltip" data-placement="right" title="Tip: density in the high-dimensional space taken from the t-SNE itself.">Density</label>
<label for="male" data-toggle="tooltip" data-placement="right" title="Tip: remaining cost of each point throughout the entire projection.">Remaining cost</label>
<label id="selectionLabel" style="margin-top:4px; margin-left: 15px" data-toggle="tooltip" data-placement="right" title="Tip: change between size/radius and color encodings.">Size-encoding</label>
</div>
<div class="param" style="padding: 20px 0 5px 0; margin-top: 5px;">
<label for="male" data-toggle="tooltip" data-placement="right" title="Tip: adapt the selection of points in the two-dimensional space: from a simple distance measurement between point and line to KNN algorithm, and vice versa.">Correlation measurement</label>
<label for="param-corr" id="param-corrLabel" data-toggle="tooltip" data-placement="right" title="Tip: percentage of all points taken into account by Dimension Correlation.">Correlation threshold (%)</label>
<label for="param-lim" data-toggle="tooltip" data-placement="right" title="Tip: x*times the actual radius (increase/decrease points' radius).">Points radius scaling</label>
<h2 class="panel-title" data-toggle="tooltip" data-placement="right" title="Tip: various functionalities depending on the user. These modes enable different interactions in the main visualization view.">Interaction Modes (M)</h2>
</div>
<div class="panel-body" id="resetAllFilters">
<table class="table table-borderless centerTable">
<tbody>
<tr>
<td scope="row"><button class="btn btn-info active" onclick="setLayerProj();" style="margin-left: -1px !important"><i class="fas fa-mouse-pointer fa-lg" data-toggle="tooltip" data-placement="right" title="Tip: in this mode the user can zoom in and out in the main visualization view and when hovering on a particular point he/she receives the exact data set's instance dimensions."></i>Points exploration</button></td>
<td><button class="btn btn-info" onclick="setLayerComp();" style="margin-left: -1.4px"><i class="far fa-object-group fa-lg" data-toggle="tooltip" data-placement="right" title="Tip: lasso selection in the main visualization view."></i>Group selection</button></td>
</tr>
<tr>
<td scope="row"><button class="btn btn-info" onclick="setLayerSche();" style="margin-left: -2px !important"><i class="fas fa-draw-polygon fa-lg" data-toggle="tooltip" data-placement="right" title="Tip: draw a shape (polylines) and check the related dimensions correlations for your drawing/shape. With the left click you set one point and the right click you confirm the drawing for further analysis."></i>Dimension correl.</button></td>
<td><button class="btn btn-info" onclick="setReset();"><i class="fas fa-trash-alt fa-lg" data-toggle="tooltip" data-placement="right" title="Tip: reset all filters applied in the visualizations without losing the execution."></i>Reset all filters</button></td>
<h2 class="panel-title" data-toggle="tooltip" data-placement="right" style="display:inline-block" title="Tip: a view related to the overall quality of the projection.">Shepard Heatmap</h2>
<h2 class="panel-title" data-toggle="tooltip" data-placement="right" style="display:inline-block" title="Tip: a view related to the overall quality of the projection. If the points/values belong to the diagonal, then the distances are preserved in both spaces. If values are closer to N-D distances, then the visualization is too compressed. If values are closer to 2-D distances, then the visualization is too spread out.">Shepard Heatmap</h2>
<h2 class="panel-title" style="display:inline-block" data-toggle="tooltip" data-placement="right" title="Tip: user-driven shape investigation of the most correlated dimensions.">Dimension Correlation</h2><div class="param" style="display:inline-block; margin-top:-5px; float:right"><label for="param-corlim" style="display:inline-block; float: right" data-toggle="tooltip" data-placement="right" title="Tip: the minimum acceptable visible correlation. Default is 0, so the tool accepts all the correlations.">Min. Visible Correlation: #<output for="param-corlim" id="param-corlim-value" style="display:inline-block; float:right">0.0</output></label>
<h2 class="panel-title" style="display:inline-block" data-toggle="tooltip" data-placement="right" title="Tip: user-driven shape investigation of the most correlated dimensions.">Dimension Correlation</h2>
<label for="param-corlim" style="display:inline-block; float: right" data-toggle="tooltip" data-placement="right" title="Tip: the minimum acceptable visible correlation. Default is 0, so the tool accepts all the correlations.">Min Correl.: #<output for="param-corlim" id="param-corlim-value" title="Tip: minimum visible correlation (range: 0.0 to 1.0)." style="display:inline-block; float:right">0.0</output></label>
<h2 class="panel-title" style="display:inline-block;" data-toggle="tooltip" data-placement="right" title="Tip: it might be useful to take a look at this histogram, to observe the density and remaining cost distributions, when remaining cost values are low and have an idea about the distributions.">Density and Remaining Cost Distributions</h2>
<label for="param-costlim" style="display:inline-block; float: right" data-toggle="tooltip" data-placement="right" title="Tip: set the rate of the limiter for the minimum acceptable visible cost at the main visualization view.">Min. Visible Cost Rate: #<output for="param-costlim" id="param-costlim-value" style="display:inline-block; float:right">1</output></label>
<h2 class="panel-title" style="display:inline-block;" data-toggle="tooltip" data-placement="right" title="Tip: the density and remaining cost distributions are important to look at along with the main visualization view individual values.">Density and Remaining Cost</h2>
<label for="param-costlim" style="display:inline-block; float: right" data-toggle="tooltip" data-placement="right" title="Tip: set the rate of the limiter for the minimum acceptable visible cost at the main visualization view.">Min Cost: #<output for="param-costlim" id="param-costlim-value" title="Tip: minimum visible cost rate (range: 0.1 to 1.0)." style="display:inline-block; float:right">1.0</output></label>
<h2 class="panel-title" style="display:inline-block" data-toggle="tooltip" data-placement="right" title="Tip: a feature of this tool that supports clusters (and points) exploration. Checking the neighborhood preservation between the entire projection's average and a selection driven by the user.">Neighborhood Preservation </h2>
<h2 class="panel-title" data-toggle="tooltip" data-placement="right" title="Tip: for every selection the tool runs a local Principal Component Analysis (PCA) algorithm and dynamically adapts and shows the top 8 dimensions in an order from left to right. This sorting from left to right presents the most related (with high variance) features of the data set to the least important (low variance).">Adaptive Parallel Coordinates Plot</h2>
<h2 class="panel-title" data-toggle="tooltip" data-placement="right" title="Tip: for every selection the tool runs a local Principal Component Analysis (PCA) algorithm and dynamically adapts and shows the top 8 dimensions in an order from left to right. This sorting from left to right presents the most related (with high variance) features of the data set to the least important (low variance). It also works with local selections of points!">Adaptive Parallel Coordinates Plot</h2>
// Category = the name of the category if it exists. The user has to add an asterisk ("*") mark in order to let the program identify this feature as a label/category name.
// ColorsCategorical = the categorical colors (maximum value = 10).
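// Illustrative sketch (hypothetical helper, not part of t-viSNE): the
// Perplexity tooltip defines perplexity as 2 to the power of the Shannon
// entropy, so a fair die with k sides has perplexity k. This assumes the
// input is a normalized probability distribution and uses entropy in bits.
function perplexityOf(probabilities) {
  let entropyBits = 0;
  for (const p of probabilities) {
    // Outcomes with zero probability contribute nothing to the entropy.
    if (p > 0) entropyBits -= p * Math.log2(p);
  }
  return Math.pow(2, entropyBits);
}
// Usage: perplexityOf([0.5, 0.5]) returns 2 (a fair coin), and
// perplexityOf(Array(6).fill(1 / 6)) returns 6 up to floating-point rounding.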
@@ -53,7 +92,13 @@ var ParametersSet = []; var overallCost; var input;
@@ -3469,11 +3530,11 @@ function OverviewtSNE(points){ // The overview t-SNE function
}
}
}
$("#datasetDetails").html("(Number of Dimensions: "+(Object.keys(dataFeatures[0]).length-valCategExists)+", Number of Instances: "+final_dataset.length+")");// Print on the screen the number of features and instances of the data set, which is being analyzed.
$("#datasetDetails").html("(Num. of Dim.: "+(Object.keys(dataFeatures[0]).length-valCategExists)+", Num. of Ins.: "+final_dataset.length+")");// Print on the screen the number of features and instances of the data set, which is being analyzed.
if(Category==undefined){
$("#CategoryName").html("Classification label: No category");// Print on the screen the classification label.
$("#CategoryName").html("Target label: N/A");// Print on the screen the classification label.
}else{
$("#CategoryName").html("Classification label: "+Category.replace('*',''));// Print on the screen the classification label.
$("#CategoryName").html("Target label: "+Category.replace('*',''));// Print on the screen the classification label.
}
//Make an SVG Container
@@ -3510,7 +3571,7 @@ if (format[0] == "diabetes"){
// CREATE THE SVG
var svg=d3.select('#overviewRect').append('svg')
.attr('width',dim)
.attr('height',dim)
.attr('height',dimh)
.append('g');
// CREATE THE GROUP
@@ -3581,7 +3642,7 @@ var theRect = theGroup.append('rect')