# The selection/subset framework

One of the central concepts in Glue is that of subsets, which are typically created as a result of the user selecting data in a viewer or creating the subset from the command-line. In order to go from a selection on the screen to defining a subset from a dataset, Glue includes the following concepts:

- **Regions of interest** (ROIs), which are an abstract representation of a geometrical region or selection.
- **Subset states**, which are descriptions of the subset selection.
- Data **Subsets**, which are the result of applying a subset state/selection to a specific dataset.

When a user makes a selection in a data viewer in the Glue application, the selection is first translated into a ROI, after which the ROI is converted to a subset state, then applied to the data collection to produce subsets in each dataset. These three concepts are described in more detail below.

## Regions of interest

The easiest way to think of regions of interest is as geometrical regions.
Basic classes for common types of ROIs are included in the `glue.core.roi`
sub-module. For example, the `RectangularROI` class
describes a rectangular region using the lower and upper values in two
dimensions:

```
>>> from glue.core.roi import RectangularROI
>>> roi = RectangularROI(xmin=1, xmax=3, ymin=2, ymax=5)
```

Note that this is not related to any particular dataset; it is an abstract
representation of a rectangular region. It also doesn’t specify which
components the rectangle is drawn in. All ROIs have a
`glue.core.roi.RectangularROI.contains()` method that can be used to check
if a point or a set of points lies inside the region:

```
>>> roi.contains(0, 3)
False
>>> roi.contains(2, 3)
True
>>> import numpy as np
>>> x = np.array([0, 2, 4])
>>> y = np.array([3, 3, 2])
>>> roi.contains(x, y)
array([False, True, False], dtype=bool)
```
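To make the idea of a dataset-independent region concrete, here is a minimal plain-Python sketch of a rectangular region with a containment test. The class `SketchRectangle` and its `contains_many` helper are hypothetical names used for illustration only; this is not Glue's implementation, and the choice to treat points exactly on the edge as outside is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchRectangle:
    """Hypothetical stand-in for a rectangular ROI (not Glue's class)."""
    xmin: float
    xmax: float
    ymin: float
    ymax: float

    def contains(self, x, y):
        # Assumption of this sketch: points exactly on an edge are outside.
        return self.xmin < x < self.xmax and self.ymin < y < self.ymax

    def contains_many(self, xs, ys):
        # Element-wise test over paired coordinates.
        return [self.contains(x, y) for x, y in zip(xs, ys)]


# The region knows nothing about datasets or components, only its bounds.
rect = SketchRectangle(xmin=1, xmax=3, ymin=2, ymax=5)
inside = rect.contains_many([0, 2, 4], [3, 3, 2])
print(inside)
```

The region stores only its bounds, so the same object can be tested against coordinates from any source.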

## Subset states

While regions of interest define geometrical regions, subset states, which are
sub-classes of `SubsetState`, describe a selection as
a function of Glue `ComponentID` objects. Note
that this is different from `Subset` instances, which
describe the subset *resulting* from the selection (see Subsets). The
following simple example shows how to easily create a
`SubsetState`:

```
>>> from glue.core import Data
>>> data = Data(x=[1,2,3], y=[2,3,4])
>>> state = data.id['x'] > 1.5
>>> state
<InequalitySubsetState: (x > 1.5)>
```

Note that `state` is not the subset of values in `data` that are greater
than 1.5. Instead, it is a representation of the inequality, the *concept* of
selecting all values of x greater than 1.5. This distinction is important,
because if another dataset defines a link between one of its components and the
`x` component of `data`, then the inequality can be used for that other
component too.
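The difference between a selection rule and its result can be sketched in plain Python. The class `GreaterThan` and the dictionary-based tables below are hypothetical and only mimic the idea; they are not Glue's `SubsetState` or `Data` classes, and the sketch assumes the two tables share an `x` attribute rather than modelling Glue's linking mechanism.

```python
class GreaterThan:
    """Hypothetical description of 'attribute > threshold' (not Glue's class)."""

    def __init__(self, attribute, threshold):
        self.attribute = attribute
        self.threshold = threshold

    def to_mask(self, table):
        # Evaluate the stored rule against whichever table is supplied.
        return [value > self.threshold for value in table[self.attribute]]


# The rule holds no data, only the description of the selection.
rule = GreaterThan("x", 1.5)

table_a = {"x": [1, 2, 3], "y": [2, 3, 4]}
# Assumption: this second table exposes the same 'x' attribute.
table_b = {"x": [0.5, 1.6], "z": [10, 20]}

mask_a = rule.to_mask(table_a)
mask_b = rule.to_mask(table_b)
print(mask_a, mask_b)
```

Because the rule is evaluated only when a table is supplied, one rule object yields a separate mask for each dataset.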

While the above syntax is convenient for using Glue via the command-line, in
the case of data viewers, we actually want to translate ROIs into subset
states. To do this, the `Component` class includes
a `subset_from_roi()` method that takes a
ROI and returns a subset state. At the moment this method works for 1- and 2-d
ROIs. In the case of 2-d ROIs, the method should be given a reference to the
second `Component`. In more complex cases, you can
also define your own logic for converting ROIs into subset states. See the
documentation of `subset_from_roi()` for
more details.
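As a rough illustration of what converting a ROI into a selection involves, the sketch below pairs a geometrical region with attribute names to produce a function that computes a mask. The helpers `state_from_range` and `state_from_rectangle` are hypothetical and do not reproduce the signature or behaviour of `subset_from_roi()`; the strict inequalities at the edges are also an assumption.

```python
def state_from_range(att, lo, hi):
    """1-d case: keep rows whose `att` value lies strictly between lo and hi."""
    def mask(table):
        return [lo < value < hi for value in table[att]]
    return mask


def state_from_rectangle(x_att, y_att, xmin, xmax, ymin, ymax):
    """2-d case: a second attribute is needed to say what the y axis means."""
    def mask(table):
        return [xmin < x < xmax and ymin < y < ymax
                for x, y in zip(table[x_att], table[y_att])]
    return mask


table = {"x": [1, 2, 3], "y": [2, 3, 4]}

# The same geometry means nothing until it is tied to named attributes.
range_mask = state_from_range("x", 1.5, 3.5)(table)
rect_mask = state_from_rectangle("x", "y", 1, 3, 2, 5)(table)
print(range_mask, rect_mask)
```

The 2-d helper takes two attribute names for the same reason the text gives for passing a second `Component`: the region alone does not say which quantities its axes represent.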

Subset states can be combined using logical operations:

```
>>> state1 = data.id['x'] > 1.5
>>> state2 = data.id['y'] < 4
>>> state1 & state2
<glue.core.subset.AndState at 0x10ebd0160>
>>> state1 | state2
<glue.core.subset.OrState at 0x10ebd00f0>
>>> ~state1
<glue.core.subset.InvertState at 0x10ebd03c8>
```

Note that you should use `&`, `|`, and `~` as opposed to `and`, `or`,
and `not`.
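The reason is a general property of Python: `&`, `|`, and `~` can be overloaded by a class through `__and__`, `__or__`, and `__invert__`, whereas `and`, `or`, and `not` cannot be overloaded and work on the truth value of their operands. The sketch below shows this with a hypothetical `Rule` class; it is not Glue's `SubsetState`, and how Glue itself reacts to `and`/`or`/`not` is not shown here.

```python
class Rule:
    """Hypothetical composable selection rule (not Glue's SubsetState)."""

    def __init__(self, func, text):
        self.func = func
        self.text = text

    def __and__(self, other):
        return Rule(lambda row: self.func(row) and other.func(row),
                    f"({self.text} & {other.text})")

    def __or__(self, other):
        return Rule(lambda row: self.func(row) or other.func(row),
                    f"({self.text} | {other.text})")

    def __invert__(self):
        return Rule(lambda row: not self.func(row), f"(~{self.text})")


a = Rule(lambda row: row["x"] > 1.5, "x > 1.5")
b = Rule(lambda row: row["y"] < 4, "y < 4")

combined = a & b   # calls Rule.__and__ and builds a new Rule
mistake = a and b  # plain Python: `a` is truthy, so this is just `b`

rows = [{"x": 1, "y": 2}, {"x": 2, "y": 3}, {"x": 3, "y": 4}]
combined_mask = [combined.func(row) for row in rows]
print(combined.text, combined_mask, mistake is b)
```

In this sketch `a and b` silently discards the first rule, which is the kind of mistake the operator forms avoid.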

## Subsets

A subset is what we normally think of as a sub-part of a dataset. Subsets are
typically created by first making subset states. There are then different
ways of applying a subset state to a `Data` object to actually create a subset. The
easiest way of doing this is to simply call the
`new_subset()` method with the
`SubsetState` and optionally a label describing that
subset:

```
>>> subset = data.new_subset(state, label='x > 1.5')
>>> subset
Subset: x > 1.5 (data: )
```

The resulting subset can then be used in a similar way to a
`Data` object, but it will return only the values in the
subset:

```
>>> subset['x']
array([2, 3])
>>> subset['y']
array([3, 4])
```

Finally, you can also get the mask from a subset:

```
>>> subset.to_mask()
array([False, True, True], dtype=bool)
```

One of the benefits of subset states is that they can be applied to multiple
data objects, and if the different data objects have linked components (as described in The linking framework), this
may produce several valid subsets in different datasets. We can apply a `SubsetState`
to all datasets in a data collection by using the `new_subset_group()` method with
a label describing that subset and the `SubsetState`, similarly to `new_subset()`:

```
>>> from glue.core import DataCollection
>>> data_collection = DataCollection([data])
>>> subset_group = data_collection.new_subset_group('x > 1.5', state)
```

This creates a `SubsetGroup`, which represents a group of subsets, with the individual subsets accessible via the `subsets` attribute:

```
>>> subset = subset_group.subsets[0]
>>> subset
Subset: x > 1.5 (data: )
```