# The selection/subset framework

One of the central concepts in Glue is that of subsets, which are typically created as a result of the user selecting data in a viewer or creating the subset from the command-line. In order to go from a selection on the screen to defining a subset from a dataset, Glue includes the following concepts:

- **Regions of interest** (ROIs), which are abstract representations of a geometrical region or selection.
- **Subset states**, which are descriptions of the subset selection.
- **Data subsets**, which are the result of applying a subset state/selection to a specific dataset.

When a user makes a selection in a data viewer in the Glue application, the selection is first translated into a ROI, after which the ROI is converted to a subset state, then applied to the data collection to produce subsets in each dataset. These three concepts are described in more detail below.

## Regions of interest

The easiest way to think of regions of interest is as geometrical regions.
Basic classes for common types of ROIs are included in the `glue.core.roi`
sub-module. For example, the `RectangularROI` class describes a rectangular
region using the lower and upper values in two dimensions:

```
>>> from glue.core.roi import RectangularROI
>>> roi = RectangularROI(xmin=1, xmax=3, ymin=2, ymax=5)
```

Note that this is not related to any particular dataset: it is an abstract
representation of a rectangular region. It also does not specify which
components the rectangle is drawn in. All ROIs have a `contains()` method
(for example `glue.core.roi.RectangularROI.contains()`) that can be used to
check whether a point or a set of points lies inside the region:

```
>>> roi.contains(0, 3)
False
>>> roi.contains(2, 3)
True
>>> import numpy as np
>>> x = np.array([0, 2, 4])
>>> y = np.array([3, 3, 2])
>>> roi.contains(x, y)
array([False, True, False], dtype=bool)
```
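To make the idea of an abstract region concrete, here is a minimal pure-Python sketch of a rectangle with a `contains` method. `ToyRectangularROI` is a hypothetical class, not Glue's implementation; it assumes strict inequalities at the edges and returns plain lists instead of NumPy arrays.

```python
from dataclasses import dataclass
from typing import Union

Number = Union[int, float]


@dataclass
class ToyRectangularROI:
    """Hypothetical stand-in for an abstract rectangle (not glue's RectangularROI).

    It stores only bounds: no dataset and no component names.
    """
    xmin: Number
    xmax: Number
    ymin: Number
    ymax: Number

    def _inside(self, x: Number, y: Number) -> bool:
        # Assumption of this sketch: points exactly on an edge count as outside.
        return self.xmin < x < self.xmax and self.ymin < y < self.ymax

    def contains(self, x, y):
        """Return a bool for scalar inputs, or a list of bools for sequences."""
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            return self._inside(x, y)
        return [self._inside(xi, yi) for xi, yi in zip(x, y)]


roi = ToyRectangularROI(xmin=1, xmax=3, ymin=2, ymax=5)
print(roi.contains(0, 3))                  # False
print(roi.contains(2, 3))                  # True
print(roi.contains([0, 2, 4], [3, 3, 2]))  # [False, True, False]
```

Because the region knows nothing about data, the same object can later be attached to any pair of components.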

## Subset states

While regions of interest define geometrical regions, subset states, which are
sub-classes of `SubsetState`, describe a selection as a function of Glue
`ComponentID` objects. Note that this is different from `Subset` instances,
which describe the subset *resulting* from the selection (see Subsets). The
following simple example shows how to easily create a `SubsetState`:

```
>>> from glue.core import Data
>>> data = Data(x=[1,2,3], y=[2,3,4])
>>> state = data.id['x'] > 1.5
>>> state
<InequalitySubsetState: (x > 1.5)>
```

Note that `state` is not the subset of values in `data` that are greater
than 1.5. Instead, it is a representation of the inequality, the *concept* of
selecting all values of x greater than 1.5. This distinction is important,
because if another dataset defines a link between one of its components and the
`x` component of `data`, then the inequality can be used for that other
component too.
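A minimal sketch of the "rule, not values" idea, assuming a dataset is just a dict of column lists. `ToyInequalityState` and its `links` argument are hypothetical, and the rename-based `links` mapping is only a crude stand-in for Glue's linking framework:

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ToyInequalityState:
    """Hypothetical model of a subset state: it stores a rule, not values."""
    component: str                      # component name the rule refers to
    op: Callable[[float, float], bool]  # e.g. operator.gt
    value: float

    def to_mask(self, dataset: Dict[str, List[float]],
                links: Optional[Dict[str, str]] = None) -> List[bool]:
        """Evaluate the rule on one dataset.

        `links` (an assumption of this sketch) maps the state's component
        name to the equivalent column name in `dataset`, mimicking a link.
        """
        name = (links or {}).get(self.component, self.component)
        return [self.op(v, self.value) for v in dataset[name]]


state = ToyInequalityState("x", operator.gt, 1.5)  # the concept "x > 1.5"

data = {"x": [1, 2, 3], "y": [2, 3, 4]}
other = {"position": [0.5, 1.0, 2.0, 4.0]}

print(state.to_mask(data))                            # [False, True, True]
print(state.to_mask(other, links={"x": "position"}))  # [False, False, True, True]
```

The same state object produces a mask for a second dataset it was never built from, which is the point of keeping the selection abstract.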

While the above syntax is convenient for using Glue via the command-line, in
the case of data viewers, we actually want to translate ROIs into subset
states. To do this, the `Component` class includes a `subset_from_roi()`
method that takes an ROI and returns a subset state. At the moment this method
works for 1-d and 2-d ROIs. In the case of 2-d ROIs, the method should be given
a reference to the second `Component`. In more complex cases, you can also
define your own logic for converting ROIs into subset states. See the
documentation of `subset_from_roi()` for more details.
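The following sketch shows why a 2-d ROI needs a second component while a 1-d ROI does not. `toy_state_from_roi` is a hypothetical helper that returns a plain predicate; it does not reproduce the signature or behavior of Glue's `subset_from_roi()`:

```python
from typing import Callable, Dict, List, Optional, Sequence

Dataset = Dict[str, List[float]]
State = Callable[[Dataset], List[bool]]


def toy_state_from_roi(roi: Sequence[float], xname: str,
                       yname: Optional[str] = None) -> State:
    """Hypothetical ROI-to-state converter (not glue's subset_from_roi).

    A 1-d ROI is (lo, hi) and needs one component name; a 2-d ROI is
    (xmin, xmax, ymin, ymax) and also needs the second component name.
    """
    if len(roi) == 2:
        lo, hi = roi
        return lambda d: [lo < x < hi for x in d[xname]]
    if len(roi) == 4:
        if yname is None:
            raise ValueError("a 2-d ROI needs a second component")
        xmin, xmax, ymin, ymax = roi
        return lambda d: [xmin < x < xmax and ymin < y < ymax
                          for x, y in zip(d[xname], d[yname])]
    raise ValueError("expected a 1-d (2 values) or 2-d (4 values) ROI")


data: Dataset = {"x": [1, 2, 3], "y": [2, 3, 4]}
range_state = toy_state_from_roi((1.5, 2.5), "x")
box_state = toy_state_from_roi((0.5, 3.5, 2.5, 4.5), "x", "y")
print(range_state(data))  # [False, True, False]
print(box_state(data))    # [False, True, True]
```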

Subset states can be combined using logical operations:

```
>>> state1 = data.id['x'] > 1.5
>>> state2 = data.id['y'] < 4
>>> state1 & state2
<glue.core.subset.AndState at 0x10ebd0160>
>>> state1 | state2
<glue.core.subset.OrState at 0x10ebd00f0>
>>> ~state1
<glue.core.subset.InvertState at 0x10ebd03c8>
```

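Note that you should use `&`, `|`, and `~` as opposed to `and`, `or`, and
`not`: Python's `and`, `or`, and `not` keywords cannot be overloaded to return
a new object, whereas the `&`, `|`, and `~` operators can.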
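A toy class (hypothetical `ToyState`, not Glue's `SubsetState`) illustrates both how overloaded operators can build combined states and what goes wrong with the `and` keyword. Datasets are assumed to be dicts of equal-length lists:

```python
from typing import Callable, Dict, List

Dataset = Dict[str, List[float]]


class ToyState:
    """Hypothetical state wrapping a row-wise predicate (not glue's SubsetState)."""

    def __init__(self, predicate: Callable[[Dataset], List[bool]]):
        self.predicate = predicate

    def to_mask(self, data: Dataset) -> List[bool]:
        return self.predicate(data)

    # &, | and ~ are overloadable, so each returns a new, combined state.
    def __and__(self, other: "ToyState") -> "ToyState":
        return ToyState(lambda d: [a and b for a, b in
                                   zip(self.to_mask(d), other.to_mask(d))])

    def __or__(self, other: "ToyState") -> "ToyState":
        return ToyState(lambda d: [a or b for a, b in
                                   zip(self.to_mask(d), other.to_mask(d))])

    def __invert__(self) -> "ToyState":
        return ToyState(lambda d: [not a for a in self.to_mask(d)])


def greater(name: str, value: float) -> ToyState:
    return ToyState(lambda d: [v > value for v in d[name]])


def less(name: str, value: float) -> ToyState:
    return ToyState(lambda d: [v < value for v in d[name]])


data = {"x": [1, 2, 3], "y": [2, 3, 4]}
state1 = greater("x", 1.5)
state2 = less("y", 4)

print((state1 & state2).to_mask(data))  # [False, True, False]
print((state1 | state2).to_mask(data))  # [True, True, True]
print((~state1).to_mask(data))          # [True, False, False]

# `and` cannot be overloaded: state1 is simply truthy, so the expression
# evaluates to state2 and the x condition is silently lost.
print((state1 and state2) is state2)    # True
```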

## Subsets

A subset is what we normally think of as a sub-part of a dataset. Subsets are
typically created by first making subset states. There are then different
ways of applying a subset state to a `Data` object to actually create a
subset. The easiest way of doing this is to simply call the `new_subset()`
method with the `SubsetState` and, optionally, a label describing that
subset:

```
>>> subset = data.new_subset(state, label='x > 1.5')
>>> subset
Subset: x > 1.5 (data: )
```

The resulting subset can then be used in a similar way to a `Data` object, but
it will return only the values in the subset:

```
>>> subset['x']
array([2, 3])
>>> subset['y']
array([3, 4])
```

Finally, you can also get the mask from a subset:

```
>>> subset.to_mask()
array([False, True, True], dtype=bool)
```
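Conceptually, a subset pairs a dataset with a state, and both the selected values and the mask follow from evaluating that state. A minimal sketch with a hypothetical `ToySubset` class, assuming the dataset is a dict of equal-length lists and the state is a plain predicate:

```python
from typing import Callable, Dict, List

Dataset = Dict[str, List[float]]


class ToySubset:
    """Hypothetical subset: a dataset paired with a state (a predicate)."""

    def __init__(self, data: Dataset,
                 state: Callable[[Dataset], List[bool]], label: str = ""):
        self.data = data
        self.state = state
        self.label = label

    def to_mask(self) -> List[bool]:
        # In this sketch the mask is recomputed from the state on every call.
        return self.state(self.data)

    def __getitem__(self, name: str) -> List[float]:
        # Return only the values whose row is selected by the mask.
        return [v for v, keep in zip(self.data[name], self.to_mask()) if keep]


data = {"x": [1, 2, 3], "y": [2, 3, 4]}
subset = ToySubset(data, lambda d: [x > 1.5 for x in d["x"]], label="x > 1.5")
print(subset["x"])       # [2, 3]
print(subset["y"])       # [3, 4]
print(subset.to_mask())  # [False, True, True]
```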

One of the benefits of subset states is that they can be applied to multiple
data objects, and if the different data objects have linked components (as
described in *The linking framework*), this may produce several valid subsets
in different datasets. We can apply a `SubsetState` to all datasets in a data
collection by using the `new_subset_group()` method with the `SubsetState` and
a label describing that subset, similarly to `new_subset()`:

```
>>> from glue.core import DataCollection
>>> data_collection = DataCollection([data])
>>> subset_group = data_collection.new_subset_group('x > 1.5', state)
```

This creates a `SubsetGroup`, which represents a group of subsets, with the
individual subsets accessible via the `subsets` attribute:

```
>>> subset = subset_group.subsets[0]
>>> subset
Subset: x > 1.5 (data: )
```
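The group idea can be sketched as applying one rule to every dataset in a collection. `toy_new_subset_group` is hypothetical and simplifies linking to "the dataset has a column with the same name"; recording `None` for datasets that cannot evaluate the rule is a choice of this sketch, not necessarily what Glue does:

```python
from typing import Callable, Dict, List, Optional

Dataset = Dict[str, List[float]]


def toy_new_subset_group(collection: Dict[str, Dataset], component: str,
                         predicate: Callable[[float], bool]
                         ) -> Dict[str, Optional[List[bool]]]:
    """Hypothetical sketch: apply one rule to every dataset in a collection.

    Datasets without `component` cannot evaluate the rule; this sketch
    records None for them instead of a mask.
    """
    group: Dict[str, Optional[List[bool]]] = {}
    for label, dataset in collection.items():
        if component in dataset:
            group[label] = [predicate(v) for v in dataset[component]]
        else:
            group[label] = None
    return group


collection = {
    "data": {"x": [1, 2, 3], "y": [2, 3, 4]},
    "other": {"x": [0, 5]},      # shares the x component (stand-in for a link)
    "unrelated": {"z": [7, 8]},  # no x component at all
}
group = toy_new_subset_group(collection, "x", lambda v: v > 1.5)
print(group)
# {'data': [False, True, True], 'other': [False, True], 'unrelated': None}
```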