Product Documentation

Auto-Cluster

Purpose

The Auto-Cluster visual lets you quickly identify possible relationships and similarities in your semantic model data without resorting to R or Python: it's largely drag, drop, and click. It uses an algorithm based on "Partitioning Around Medoids" (PAM), with extensions that let you weight features independently and search for parameters that yield more effective clustering. This is an example of "unsupervised machine learning." You can learn more by reading about "k-medoids", "PAM", and "clustering".
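
To make the idea concrete, here is a minimal Python sketch of a PAM-style search with per-feature weights. It is an illustration only, not the visual's implementation: the function names (`weighted_distance`, `assign`, `pam`), the weighted Euclidean metric, and the naive choice of starting medoids are all assumptions made for this example.

```python
from itertools import product
from math import sqrt


def weighted_distance(a, b, weights):
    """Euclidean distance with one weight per feature (an assumed metric)."""
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def assign(rows, medoids, weights):
    """Label each row with the row index of its nearest medoid; total the cost."""
    labels, cost = [], 0.0
    for row in rows:
        dists = [weighted_distance(row, rows[m], weights) for m in medoids]
        best = min(range(len(medoids)), key=dists.__getitem__)
        labels.append(medoids[best])
        cost += dists[best]
    return labels, cost


def pam(rows, k, weights):
    """Naive PAM: start from the first k rows, then swap while the cost drops."""
    medoids = list(range(k))
    labels, cost = assign(rows, medoids, weights)
    improved = True
    while improved:
        improved = False
        for slot, candidate in product(range(k), range(len(rows))):
            if candidate in medoids:
                continue
            # Try replacing one medoid with a non-medoid row.
            trial = medoids[:slot] + [candidate] + medoids[slot + 1:]
            trial_labels, trial_cost = assign(rows, trial, weights)
            if trial_cost < cost - 1e-12:
                medoids, labels, cost = trial, trial_labels, trial_cost
                improved = True
    return sorted(medoids), labels, cost


# Two tight groups of made-up points, with both features weighted equally.
rows = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
medoids, labels, cost = pam(rows, k=2, weights=(1.0, 1.0))
print(medoids, labels)
```

With this sample data the two groups are separated, and rows 0 and 3 are chosen as the medoids because each has the smallest total distance to the other members of its group. Lowering a feature's weight toward 0 reduces that feature's influence on the result.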

Data Bucket

The data requirements for the visual include 3 bucket fields, listed here:

Field Type Description
Feature Source Number, Date, Text Features are measurable properties that you wish to use in finding clusters. Pick values which you feel may be related.
Primary Measure Source Number, Date This can be a single scalar value that is used to size the inner-most circles, in the circle pack display style. If omitted, the size will be determined based on the "distance from medoid" for each group.
Predefined Categories Text If provided, applies groupings which manifest as nested circles in the circle packing style.

Settings

Clustering Settings
Field Default Description
Enable Clustering On When disabled, clustering is not performed and the visual reverts to simple circle packing of predefined categories only. (in 1.1)
Cluster Naming Build from Medoids Controls how clusters are named.
Cluster Name Separator ; Serves as the separator between terms when cluster names are constructed automatically. Use \n for a new line.
Cluster Name Prefix An optional prefix for generated cluster names. For example, the prefix 'Similar to' would produce the cluster name 'Similar to ABC' if 'ABC' were emitted as the cluster name by the 'Cluster Naming' setting.
Cluster Name Mapping An optional way to set the name of clusters. Provide one mapping per line, of the format 'find|target' (or 'find,target') where 'find' is text that is checked against the generated cluster name (or failing that, data within the cluster) and if there is a case-insensitive match, the cluster name becomes 'target'. Note: whole-word matching is used.
Ordinal Identification For categorical features that represent ordinals, you can specify the ordering of the values. Provide one line per ordinal feature, in the form 'featurename|val1|val2|...'. The feature name should match what shows in the data well, and values are weighted based on their sequence. For example, 'Rating|Strongly Disagree|Disagree|Neutral|Agree|Strongly Agree' would apply '0' for 'Strongly Disagree', '0.25' for 'Disagree', and so on, for the 'Rating' feature (assuming normalized).
Aggregation Mode Take First If the source data includes multiple rows per lowest-level grouping, controls how those rows are handled for output.
Row ID Prefix When the Aggregation Mode is 'Add Row ID', sequential numbering is used to assign row identifiers, ensuring uniqueness of source data rows. This setting allows a prefix such as 'Row' to be prepended to the numbering.
Solution Searching None Controls whether solution searching is enabled. If not, you can provide an explicit 'k' (cluster count), or a heuristic rule will pick a number of clusters based on your row count.
Solution Searching Button None Controls whether a solution searcher button will be available for end-users to perform searching on-demand.
Timeout (seconds) 20 The number of seconds that the solution finder can run for before it times out. Increasing this number means it can run for longer to find possibly better solutions.
Searching: Iterations By default, the solution searcher determines the number of iterations it uses based on your row count, but you can override that value here. Providing a small value increases performance at the expense of the solution quality.
Cluster Count You can explicitly provide the number of clusters you want to use. If using the solution finder, 'k' can be searched for, otherwise the count will be inferred using a simple rule based on your row count. A value less than '2' implies no value is set for this.
Maximum Cluster Count 20 When solution searching is enabled, this value serves as the maximum number of clusters that can be included in a solution.
Cluster Position 1 Controls at what level the inferred cluster is shown. '1' places it as the first-level grouping and successive numbers nest it deeper. A number higher than the available number of groupings makes it the deepest grouping possible.
Default Weighting Leave Raw Controls how each feature is weighted relative to the other features. This allows some distances to be emphasized and others de-emphasized. 'Normalize' places all values on a [0,1] scale; 'Raw' uses numbers exactly as they are in the source data; 'Manual' requires use of the 'Manual Weighting' setting.
Manual Weighting When the weighting rule is set to 'Manual Override', this should be a comma-separated list of weighting factors (typically between 0 and 1) that should apply to each of the Feature Source columns, in order.
Maximum Rows 20000 A global limit on the number of rows the visual will process. This helps ensure performance does not accidentally degrade, although you can raise or lower this value as required (at your own risk).
Allow Timeout Warning On When enabled, if the solution finder times out, a warning will be shown. When disabled, no warning is shown and the best solution available will be shown.
Case Must Match On When disabled, case is not required to match (allowing for the possibility of merging values).
Keep Whitespace Off When enabled, whitespace must match whitespace between compared values, for merge purposes (regardless of 'Ignore Non-alphanumeric' setting).
Diacritic Neutral Off When enabled, diacritic (accented) characters are compared internally as if they were their ASCII equivalents, for merge purposes.
Allow Tooltips On When enabled, tool-tips may be shown when hovering over data elements.
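
Several of the settings above take free-form text: 'Cluster Name Mapping', 'Ordinal Identification', and 'Manual Weighting'. The Python sketch below shows one way those formats could be interpreted, to help you check your own entries before pasting them in. It is not the visual's code. The function names are hypothetical, "first matching line wins" is an assumption, the name mapping is checked against the cluster name only (not the data within the cluster), and the evenly spaced ordinal scores are extrapolated from the 'Rating' example in the table.

```python
import re


def parse_name_mapping(text):
    """Parse 'find|target' (or 'find,target') lines into (find, target) pairs."""
    pairs = []
    for line in text.splitlines():
        parts = re.split(r"[|,]", line.strip(), maxsplit=1)
        if len(parts) == 2 and parts[0].strip():
            pairs.append((parts[0].strip(), parts[1].strip()))
    return pairs


def apply_name_mapping(cluster_name, pairs):
    """Return the target of the first whole-word, case-insensitive match."""
    for find, target in pairs:
        pattern = r"(?<!\w)" + re.escape(find) + r"(?!\w)"
        if re.search(pattern, cluster_name, flags=re.IGNORECASE):
            return target
    return cluster_name  # no mapping applied


def parse_ordinals(text):
    """Parse 'featurename|val1|val2|...' lines into {feature: {value: score}}."""
    scales = {}
    for line in text.splitlines():
        name, *values = [part.strip() for part in line.split("|")]
        if not name or len(values) < 2:
            continue  # skip blank or incomplete lines
        step = 1 / (len(values) - 1)  # assumed: scores evenly spaced on [0, 1]
        scales[name] = {value: i * step for i, value in enumerate(values)}
    return scales


def parse_manual_weights(text):
    """Parse a comma-separated list of weighting factors into floats."""
    return [float(part) for part in text.split(",") if part.strip()]


pairs = parse_name_mapping("abc|Group A\nxyz,Group X")
print(apply_name_mapping("Similar to ABC", pairs))   # Group A
print(apply_name_mapping("ABCD; other", pairs))      # unchanged: not a whole word

scales = parse_ordinals(
    "Rating|Strongly Disagree|Disagree|Neutral|Agree|Strongly Agree"
)
print(scales["Rating"]["Disagree"])                  # 0.25

print(parse_manual_weights("1, 0.5, 0"))             # [1.0, 0.5, 0.0]
```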

Circle Packing
Field Default Description
Keep Square Layout Off When enabled, the layout will be treated as a square even if the visual is laid out as a rectangle; in a practical sense this means you may see a vertical scroll bar to move up and down through the layout, but there also can be less horizontal whitespace.
Text Font The font family for text labels.
Text Size The size of the font for text labels.
Text Color The text color for text labels.
Level 1: Color Like Values Off When enabled, level 1 grouping will color circles based on like values.
Level 2: Color Like Values Off When enabled, level 2 grouping will color circles based on like values.
Level 3: Color Like Values Off When enabled, level 3 grouping will color circles based on like values.
Start Color The color used for the background. It is combined with the ending color with successive levels of grouping moving closer to the ending color. If omitted, becomes a lighter version of the ending color.
End Color The innermost level of grouping is influenced by this color. If omitted, becomes a darker version of the starting color.
Value Background The background color for value (smallest) circles.
Medoid Background The background color for value circles that are also medoids (the 'central' item identified for each cluster). Overrides the Value Background color. When omitted, medoids are not colored differently.
Quick Find Highlight The fill color to use for circles where there is a quick find match. (in 1.1)
Show Zoom In/Out Off When enabled, zoom in and out buttons/links are shown to support zooming within the currently displayed layout level.
Alt Click to Zoom (vs Shift) On When enabled, zooming to clicked areas is accomplished by holding down the Alt key and left-clicking the mouse. This is the default behavior, allowing other click types to keep their default Power BI behavior, to support cross-filtering. Disabling this option makes Shift the modifier key for zoom clicks.
Label Line Length 60 The general maximum number of characters per label line.
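
The 'Start Color' and 'End Color' settings describe a blend that moves toward the end color with each level of grouping. The exact blending rule is not documented here, so the Python sketch below simply assumes linear interpolation in RGB. The function names and the sample colors are hypothetical and serve only to show the general effect.

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def level_colors(start, end, levels):
    """Blend from start to end across the given number of grouping levels."""
    s, e = hex_to_rgb(start), hex_to_rgb(end)
    colors = []
    for level in range(levels):
        # t runs from 0 (background) to 1 (innermost grouping level).
        t = level / (levels - 1) if levels > 1 else 1.0
        rgb = [round(a + (b - a) * t) for a, b in zip(s, e)]
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


# Black background blending toward a mid gray over three levels.
print(level_colors("#000000", "#646464", 3))
```

In this sketch the first entry is always the start color and the last is always the end color; the levels in between are spaced evenly.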

Color Table
Field Default Description
Sort Criteria Primary Measure Average / Alphanumeric Controls how sorting is applied to the color table. If no primary measure is provided, that sort is ignored.
Sort Ascending On When enabled, the Sort Criteria is applied in ascending order - otherwise descending order is used.
Show Values All Features Controls which values are shown, per row.
Show Captions On When enabled and showing feature values, captions will be added for each feature, repeated once for each main grouping level.
Max Column Width Normally the visual tries to calculate a column width based on rules, but you can override this percentage setting here.
Cluster Name Font The font family for cluster names.
Cluster Name Text Size The size of the cluster name text.
Cluster Name Color The text color for the cluster name text.
Group Value Font The font family for grouping values.
Group Value Text Size The size of the group value text.
Grouping Value Text Color The text color for the grouping value text.
Feature Value Font The font family for feature values.
Feature Value Text Size The size of the feature values text.
Feature Value Text Color The text color for the feature values text.
Primary Measure Font The font family for the primary measure, as shown in the feature values list.
Primary Measure Text Size The size of the primary measure text, as shown in the feature values list.
Primary Measure Text Color The text color for the primary measure text, as shown in the feature values list.
Feature Caption Font The font family for the feature caption (if enabled), as shown above the feature values list.
Feature Caption Text Size The size of the feature caption text (if enabled), as shown above the feature values list.
Feature Caption Text Color The text color for the feature caption text (if enabled), as shown above the feature values list.
Medoid Font The font family for the feature values, specifically when the row in question is for the medoid data point.
Medoid Text Size The size of the feature values, specifically when the row in question is for the medoid data point.
Medoid Text Color The text color for the feature values, specifically when the row in question is for the medoid data point.
Medoid Background Color The background color for the feature values, specifically when the row in question is for the medoid data point.

General Formatting
Field Default Description
Display Style Circle Packing Identifies the primary display style for the visual.
Summary Stats Hide Controls if/where summary stats are shown, on the layout surface. This is most useful when dealing with non-standard weightings, determined through the solution finder.
Quick Find Hide Controls if/where the quick find text box is shown, on the layout surface. (in 1.1)
Quick Find Minimum Length 2 The minimum number of characters needed in the quick find before a match is attempted. (in 1.1)
Show Links Off When disabled, buttons are used instead of links for various commands.
Show Copy Clipboard Off When disabled, the copy data to clipboard link will be hidden. Data is copied in the format defined by the ** setting. Data includes the inferred cluster, which is NOT available in the source data. Note: copy functionality varies based on browser used.
Copied Message Duration 4 The number of seconds to display the 'copied data to clipboard' message. Use zero to show no message.
Show Counts Off When enabled, the number of rows loaded is shown at the bottom of the visual.
Show Tips On When enabled, tips are shown below the main display area. Note: this can only be disabled when NOT inside Power BI Desktop OR when using an organization license key.
Export: Cluster Name ClusterName When 'copying to clipboard', the column header value for the calculated name for the cluster. A blank value will omit the field.
Export: Row ID Name RowID When 'copying to clipboard', the column header value for the row ID which is assigned as data is loaded into the visual. A blank value will omit the field.
Export: Is Medoid IsMedoid When 'copying to clipboard', the column header value for the indicator of whether a given row is for a medoid. A blank value will omit the field.
Export: Distance from Medoid MedoidDistance When 'copying to clipboard', the column header value for the calculated distance for the given row, from the medoid of the cluster. A blank value will omit the field.
Export: Weights Suffix When 'copying to clipboard', additional columns can be added for each feature, with the column value being the feature's weight. The column name for weights will be the feature name, followed by this setting. A blank value will not include weights.
Export: Normal Suffix When 'copying to clipboard', additional columns can be added for each feature, with the column value being the feature's normalized value. The column name for the normalized values will be the feature name, followed by this setting. A blank value will not include normalized values.
Counter Text Size The size of the counter text.
Counter Text Font The font family for the counter text.
Loaded Text loaded Text to show for the 'loaded' count text.

Site License and Diagnostics
Field Default Description
Licensed By Provide the name that the product was registered with, as found in the confirmation email.
Site License Key For site licensing (not from AppSource). Provide the 22 character license key, as found in the confirmation email. (Note: a reload of the report is needed after the key and licensee is set.)

Other Resources / Links:

AppSource Purchase Guide

Learn more about purchasing one or more user licenses from Microsoft AppSource.