alr.acquisition

Classes

AcquisitionFunction

class alr.acquisition.AcquisitionFunction

Bases: abc.ABC

Abstract base class for all acquisition functions. All subclasses should override the __call__ method.

__call__(X_pool: torch.utils.data.dataset.Dataset, b: int) → numpy.array

Given the unlabelled data pool X_pool, return the indices of the b best points to be labelled by an oracle, where the ranking is determined by this acquisition function and its parameters.

Parameters:
  • X_pool (torch.utils.data.Dataset) – Unlabelled dataset
  • b (int) – number of points to acquire
Returns:

Array of indices into X_pool.

Return type:

np.array
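
The subclassing contract above can be illustrated with a small, self-contained sketch. It uses a local stand-in for the base class (so it runs without alr or torch installed), a hypothetical toy subclass FirstB, and a hypothetical helper split_pool that shows one way the returned indices might be used to move points out of the unlabelled pool. Treating __call__ as an abstract method is an assumption based on the abc.ABC base.

```python
from abc import ABC, abstractmethod
from typing import Sequence, Tuple

import numpy as np


class AcquisitionFunction(ABC):
    """Local stand-in for alr.acquisition.AcquisitionFunction.

    Assumption: __call__ is declared abstract, so the base cannot be instantiated.
    """

    @abstractmethod
    def __call__(self, X_pool: Sequence, b: int) -> np.ndarray:
        """Return an array of b indices into X_pool."""


class FirstB(AcquisitionFunction):
    """Hypothetical toy subclass: always picks the first b pool items."""

    def __call__(self, X_pool: Sequence, b: int) -> np.ndarray:
        return np.arange(min(b, len(X_pool)))


def split_pool(pool_size: int, acquired: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: return (acquired indices, indices still unlabelled)."""
    mask = np.ones(pool_size, dtype=bool)
    mask[acquired] = False  # drop the points sent to the oracle
    return acquired, np.flatnonzero(mask)
```

With a real torch Dataset as X_pool, the remaining indices could be wrapped in torch.utils.data.Subset to form the next, smaller pool.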

RandomAcquisition

class alr.acquisition.RandomAcquisition

Bases: alr.acquisition.AcquisitionFunction

Implements random acquisition: b indices are sampled uniformly at random from X_pool.

__call__(X_pool: torch.utils.data.dataset.Dataset, b: int) → numpy.array

Given the unlabelled data pool X_pool, return the indices of b points to be labelled by an oracle. For RandomAcquisition, these indices are sampled uniformly at random.

Parameters:
  • X_pool (torch.utils.data.Dataset) – Unlabelled dataset
  • b (int) – number of points to acquire
Returns:

Array of indices into X_pool.

Return type:

np.array
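
Uniform sampling of indices can be sketched with numpy's Generator API. The function name random_acquisition and the without-replacement choice are assumptions for illustration, not taken from alr:

```python
import numpy as np


def random_acquisition(pool_size: int, b: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: b pool indices drawn uniformly at random.

    Assumption: sampling is without replacement, so no point is acquired twice.
    """
    return rng.choice(pool_size, size=b, replace=False)
```

Passing an explicitly seeded Generator keeps the baseline reproducible across active-learning runs.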

BALD

class alr.acquisition.BALD(pred_fn: Callable[[torch.Tensor], torch.Tensor], subset: Optional[int] = -1, device: Union[str, torch.device, None] = None, debug: Optional[bool] = False, **data_loader_params)

Bases: alr.acquisition.AcquisitionFunction

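Implements BALD (Bayesian Active Learning by Disagreement), which scores each pool point by the mutual information between its predicted label and the model parameters, estimated from stochastic forward passes as the entropy of the mean prediction minus the mean entropy: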

\[ -\sum_c \left( \frac{1}{T}\sum_t \hat{p}^t_c \right) \log \left( \frac{1}{T}\sum_t \hat{p}^t_c \right) + \frac{1}{T}\sum_{c,t} \hat{p}^t_c \log \hat{p}^t_c \]

where \(\hat{p}^t_c\) is the softmax output for class \(c\) on the \(t^{th}\) stochastic forward pass and \(T\) is the number of passes (\(K\) in the description of pred_fn).
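
The formula can be evaluated directly on an array of probabilities with shape \(T \times N \times C\). The sketch below is a numpy illustration, not alr's implementation: bald_scores and top_b are hypothetical names, \(0 \log 0\) is taken as 0, and selecting the b highest scores is an assumption about how the final indices are chosen.

```python
import numpy as np


def _plogp(p: np.ndarray) -> np.ndarray:
    # Elementwise p * log(p), with the convention 0 * log(0) = 0.
    return p * np.log(p, out=np.zeros_like(p), where=p > 0)


def bald_scores(probs: np.ndarray) -> np.ndarray:
    """probs: (T, N, C) softmax outputs from T stochastic passes over N points."""
    probs = np.asarray(probs, dtype=float)
    mean_p = probs.mean(axis=0)                        # (N, C): (1/T) sum_t p^t_c
    entropy_of_mean = -_plogp(mean_p).sum(axis=-1)     # first term of the formula
    mean_entropy = -_plogp(probs).sum(axis=-1).mean(axis=0)  # minus the second term
    return entropy_of_mean - mean_entropy              # (N,) BALD score per point


def top_b(scores: np.ndarray, b: int) -> np.ndarray:
    # Indices of the b highest-scoring points, best first.
    return np.argsort(-scores, kind="stable")[:b]
```

A point whose stochastic predictions all agree scores 0, while one whose passes confidently disagree approaches the upper bound \(\log C\).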

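```python
from alr.acquisition import BALD

# MCDropout and eval_fwd_exp are used as in the original example; their imports
# and MCDropout's arguments are elided. device and X_pool are assumed defined.
model = MCDropout(...)
bald = BALD(eval_fwd_exp(model), subset=-1, device=device,
            batch_size=512, pin_memory=True,
            num_workers=2)
bald(X_pool, b=10)  # indices into X_pool of the 10 points to label
```

Parameters: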
  • pred_fn (Callable) – A callable that returns a tensor of shape \(K \times N \times C\), where \(K\) is the number of inference samples, \(N\) is the number of instances, and \(C\) is the number of classes. It must return probabilities, not log probabilities.
  • subset (int, optional) – Size of the subset of X_pool to consider. Use -1 to use the entire pool.
  • device (None, str, torch.device) – Device to which input batches are moved before they are passed to pred_fn.
  • debug (bool, optional) – If True, save additional information to recent_score (uses more memory).
  • data_loader_params – Keyword arguments passed to DataLoader when iterating over X_pool.

Warning

Do not set shuffle=True in data_loader_params! The returned indices are only correct if the DataLoader iterates over X_pool in its original order.
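
The subset parameter and the shuffle warning both come down to index bookkeeping: scores are computed for candidates in a fixed order, and the chosen positions must be mapped back to indices into X_pool. The following is a hedged sketch of that bookkeeping; the helper acquire_from_subset and its score_fn argument are hypothetical, and drawing the subset uniformly without replacement is an assumption.

```python
from typing import Callable

import numpy as np


def acquire_from_subset(score_fn: Callable[[np.ndarray], np.ndarray],
                        pool_size: int, b: int, subset: int,
                        rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: score a (possibly random) subset, return X_pool indices."""
    if subset == -1 or subset >= pool_size:
        candidates = np.arange(pool_size)  # score the entire pool
    else:
        # Assumption: the subset is drawn uniformly without replacement.
        candidates = rng.choice(pool_size, size=subset, replace=False)
    # score_fn must return one score per candidate, in the same order as
    # `candidates`; a shuffled DataLoader would break exactly this alignment.
    scores = score_fn(candidates)
    best = np.argsort(-scores, kind="stable")[:b]  # positions within `candidates`
    return candidates[best]                        # mapped back to X_pool indices
```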

__call__(X_pool: torch.utils.data.dataset.Dataset, b: int) → numpy.array

Given the unlabelled data pool X_pool, return the indices of the b best points to be labelled by an oracle, where the ranking is determined by this acquisition function and its parameters.

Parameters:
  • X_pool (torch.utils.data.Dataset) – Unlabelled dataset
  • b (int) – number of points to acquire
Returns:

Array of indices into X_pool.

Return type:

np.array

ICAL

class alr.acquisition.ICAL(pred_fn: Callable[[torch.Tensor], torch.Tensor], kernel_fn: Optional[Callable[[torch.Tensor], torch.Tensor]] = None, subset: Optional[int] = 200, greedy_acquire: Optional[int] = 1, use_one_hot: Optional[bool] = True, sample_softmax: Optional[bool] = True, device: Union[str, torch.device, None] = None, **data_loader_params)

Bases: alr.acquisition.AcquisitionFunction

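Implements ‘normal’ ICAL. A set \(\mathcal{R}\) of points is drawn at random from the pool, and the candidate batch is represented by the average of its points’ kernels rather than by each kernel individually. As a result, the dependency measure (dHSIC) is computed over only \(d = 2\) variables: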

\[\frac{1}{|\mathcal{R}|} d\text{HSIC}(\displaystyle\sum_{x'\in\mathcal{R}} k^{x'}, \frac{1}{B} \displaystyle\sum_{i = 1}^{B} k^{x_i})\]
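
Because dHSIC with \(d = 2\) is ordinary HSIC, the score above can be sketched with the standard biased HSIC estimator \(\frac{1}{n^2}\operatorname{tr}(KHLH)\), where \(H\) is the centering matrix. This is a minimal numpy sketch, not the library's code: the function names are hypothetical, and it assumes each \(k^{x}\) is an already computed \(n \times n\) kernel matrix over the \(n\) stochastic predictions for point \(x\).

```python
import numpy as np


def hsic(K: np.ndarray, L: np.ndarray) -> float:
    """Biased (V-statistic) HSIC estimate between two n x n kernel matrices."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / n**2


def normal_ical_score(kernels_R: np.ndarray, kernels_batch: np.ndarray) -> float:
    """Hypothetical helper mirroring the formula above.

    kernels_R:     (|R|, n, n) kernel matrices k^{x'} for x' in R
    kernels_batch: (B, n, n) kernel matrices k^{x_i} for the candidate batch
    """
    summed_R = kernels_R.sum(axis=0)         # sum over x' in R
    mean_batch = kernels_batch.mean(axis=0)  # (1/B) * sum_i k^{x_i}
    return hsic(summed_R, mean_batch) / len(kernels_R)
```

Because the estimator is linear in each kernel matrix, the \(1/|\mathcal{R}|\) factor outside dHSIC is equivalent to averaging the kernels of \(\mathcal{R}\) inside it.
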
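
```python
from alr.acquisition import ICAL

# MCDropout and eval_fwd_exp are used as in the original example; their imports
# and MCDropout's arguments are elided. device and X_pool are assumed defined.
model = MCDropout(...)
ical = ICAL(eval_fwd_exp(model), device=device,
            batch_size=512,
            pin_memory=True, num_workers=2)
ical(X_pool, b=10)  # indices into X_pool of the 10 points to label
```

Parameters: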
  • pred_fn (Callable) – A callable that returns a tensor of shape \(K \times N \times C\) where \(K\) is the number of inference samples, \(N\) is the number of instances, and \(C\) is the number of classes. This function should return probabilities, not *log* probabilities!
  • kernel_fn (Callable[[torch.Tensor], torch.Tensor], optional) – Kernel function; see the static methods of ICAL. Defaults to a weighted rational quadratic kernel, which is the default kernel in the paper.
  • subset (int, optional) – Normal ICAL uses a random subset of X_pool; subset specifies its size (\(|\mathcal{R}|\) in the paper). Use -1 to use the entire pool.
  • greedy_acquire (int, optional) – Number of points to acquire at once in each greedy acquisition step.
  • use_one_hot (bool, optional) – Use one-hot encoding when computing the kernel matrix. This is the default behaviour in the paper.
  • sample_softmax (bool, optional) – Sample class labels from the softmax probabilities. If True, use_one_hot is automatically overridden to True. This is the default behaviour in the paper.
  • device (None, str, torch.device) – Device to which input batches are moved before they are passed to pred_fn.
  • data_loader_params – Keyword arguments passed to DataLoader when iterating over X_pool.
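
Assuming sample_softmax=True means that each stochastic softmax output is replaced by a one-hot class label drawn from it (consistent with use_one_hot being forced on), that step can be sketched with inverse-CDF sampling over a \(K \times N \times C\) probability array. The name sample_one_hot is hypothetical and not part of alr.

```python
import numpy as np


def sample_one_hot(probs: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """probs: (K, N, C) softmax outputs. Returns a (K, N, C) one-hot array holding,
    for each stochastic pass and point, one class label drawn from that softmax."""
    probs = np.asarray(probs, dtype=float)
    cdf = probs.cumsum(axis=-1)
    cdf[..., -1] = 1.0                         # guard against round-off in the last bin
    u = rng.random(probs.shape[:-1] + (1,))    # one uniform draw per (pass, point)
    labels = (u < cdf).argmax(axis=-1)         # first class whose CDF exceeds u
    return np.eye(probs.shape[-1])[labels]     # one-hot encode the sampled labels
```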

Warning

Do not set shuffle=True in data_loader_params! The returned indices are only correct if the DataLoader iterates over X_pool in its original order.

__call__(X_pool: torch.utils.data.dataset.Dataset, b: int) → numpy.array

Given the unlabelled data pool X_pool, return the indices of the b best points to be labelled by an oracle, where the ranking is determined by this acquisition function and its parameters.

Parameters:
  • X_pool (torch.utils.data.Dataset) – Unlabelled dataset
  • b (int) – number of points to acquire
Returns:

Array of indices into X_pool.

Return type:

np.array

static rational_quadratic(alphas: Optional[Sequence[float]] = (0.2, 0.5, 1, 2, 5), weights: Optional[Sequence[float]] = None) → Callable

Returns a weighted rational quadratic kernel function that can be passed to ICAL as kernel_fn.
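
A rational quadratic kernel with parameter \(\alpha\) is \(k(x, y) = \left(1 + \frac{\lVert x - y \rVert^2}{2\alpha}\right)^{-\alpha}\); a weighted mixture sums such terms over several \(\alpha\) values. The sketch below assumes that is what rational_quadratic builds, with the same default alphas, uniform weights when weights is None, and unit length scale. The name rq_mixture_kernel and these choices are assumptions, and it uses numpy arrays in place of torch tensors.

```python
from typing import Callable, Optional, Sequence

import numpy as np


def rq_mixture_kernel(alphas: Sequence[float] = (0.2, 0.5, 1, 2, 5),
                      weights: Optional[Sequence[float]] = None
                      ) -> Callable[[np.ndarray], np.ndarray]:
    """Hypothetical sketch of a weighted rational quadratic kernel factory."""
    alphas = np.asarray(alphas, dtype=float)
    if weights is None:
        # Assumption: uniform weights when none are given.
        weights = np.full(len(alphas), 1.0 / len(alphas))
    weights = np.asarray(weights, dtype=float)

    def kernel(x: np.ndarray) -> np.ndarray:
        # x: (n, D) rows; returns the (n, n) kernel matrix.
        sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)  # squared distances
        # sum_i w_i * (1 + ||x - y||^2 / (2 alpha_i)) ** (-alpha_i)
        return sum(w * (1.0 + sq / (2.0 * a)) ** (-a) for a, w in zip(alphas, weights))

    return kernel
```

Mixing several \(\alpha\) values lets the kernel respond to differences at multiple scales without tuning a single length scale.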

BatchBALD

class alr.acquisition.BatchBALD(pred_fn: Callable[[torch.Tensor], torch.Tensor], device: Union[str, torch.device, None] = None, num_samples: int = 10000, **data_loader_params)

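Bases: alr.acquisition.AcquisitionFunction

Implements BatchBALD, which selects a batch whose joint predictions have maximal mutual information with the model parameters.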

__call__(X_pool: torch.utils.data.dataset.Dataset, b: int) → numpy.array

Given the unlabelled data pool X_pool, return the indices of the b best points to be labelled by an oracle, where the ranking is determined by this acquisition function and its parameters.

Parameters:
  • X_pool (torch.utils.data.Dataset) – Unlabelled dataset
  • b (int) – number of points to acquire
Returns:

Array of indices into X_pool.

Return type:

np.array
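
BatchBALD scores a whole batch by \(I(y_1,\dots,y_b;\omega) = H(y_1,\dots,y_b) - \sum_i \mathbb{E}_{\omega}[H(y_i \mid \omega)]\) and, as in the original BatchBALD formulation, builds the batch greedily. The sketch below is a minimal numpy illustration under stated assumptions, not alr's implementation: it takes precomputed probabilities of shape \(K \times N \times C\) (as returned by pred_fn), the name batchbald_greedy is hypothetical, and it enumerates all \(C^b\) label configurations exactly, which is only practical for small batches.

```python
import numpy as np


def _plogp(p: np.ndarray) -> np.ndarray:
    # Elementwise p * log(p), with the convention 0 * log(0) = 0.
    return p * np.log(p, out=np.zeros_like(p), where=p > 0)


def batchbald_greedy(probs: np.ndarray, b: int) -> np.ndarray:
    """Hypothetical exact BatchBALD: probs has shape (K, N, C); returns b indices."""
    probs = np.asarray(probs, dtype=float)
    K, N, C = probs.shape
    if b > N:
        raise ValueError("b cannot exceed the pool size")
    # E_w[H(y_i | w)] for every point: mean over the K samples of each entropy.
    cond_entropy = -_plogp(probs).sum(axis=-1).mean(axis=0)
    chosen = []
    joint = np.ones((K, 1))  # per-sample probability of each label configuration
    for _ in range(b):
        best_i, best_score, best_joint = -1, -np.inf, None
        for i in range(N):
            if i in chosen:
                continue
            # Extend every configuration with the C labels of candidate i.
            cand = (joint[:, :, None] * probs[:, i, None, :]).reshape(K, -1)
            joint_entropy = -_plogp(cand.mean(axis=0)).sum()
            score = joint_entropy - cond_entropy[chosen + [i]].sum()
            if score > best_score:
                best_i, best_score, best_joint = i, score, cand
        chosen.append(best_i)
        joint = best_joint
    return np.array(chosen)
```

Since labels are conditionally independent given \(\omega\), a candidate's marginal gain equals its BALD score minus the mutual information between its label and the labels already chosen, so near-duplicates of chosen points are penalised.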

Functions