alr.acquisition¶
Classes¶
AcquisitionFunction¶
-
class
alr.acquisition.AcquisitionFunction[source]¶ Bases:
abc.ABC
A base class for all acquisition functions. All subclasses should override the __call__ method.
-
__call__(X_pool: torch.utils.data.dataset.Dataset, b: int) → numpy.array[source]¶ Given unlabelled data pool X_pool, return the best b points for labelling by an oracle, where the best points are determined by this acquisition function and its parameters.
Parameters: - X_pool (torch.utils.data.Dataset) – Unlabelled dataset
- b (int) – number of points to acquire
Returns: array of indices to X_pool.
Return type: np.array
-
RandomAcquisition¶
-
class
alr.acquisition.RandomAcquisition[source]¶ Bases:
alr.acquisition.AcquisitionFunction
Implements random acquisition: uniformly samples b indices from X_pool.
-
__call__(X_pool: torch.utils.data.dataset.Dataset, b: int) → numpy.array[source]¶ Given unlabelled data pool X_pool, return the best b points for labelling by an oracle, where the best points are determined by this acquisition function and its parameters.
Parameters: - X_pool (torch.utils.data.Dataset) – Unlabelled dataset
- b (int) – number of points to acquire
Returns: array of indices to X_pool.
Return type: np.array
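The calling convention above can be illustrated with a minimal sketch. It does not reproduce the library's source: `AcquisitionSketch` and `RandomSketch` are hypothetical stand-ins, the pool is a plain list rather than a torch Dataset, and sampling without replacement is an assumption (the documentation only says indices are sampled uniformly).

```python
from abc import ABC, abstractmethod
from typing import Optional, Sized

import numpy as np


class AcquisitionSketch(ABC):
    """Hypothetical stand-in for the documented base class."""

    @abstractmethod
    def __call__(self, X_pool: Sized, b: int) -> np.ndarray:
        """Return b indices into X_pool."""


class RandomSketch(AcquisitionSketch):
    """Hypothetical random acquisition: ignores the data, samples indices."""

    def __init__(self, seed: Optional[int] = None):
        self._rng = np.random.default_rng(seed)

    def __call__(self, X_pool: Sized, b: int) -> np.ndarray:
        # Assumption: draw without replacement so no index is acquired twice.
        return self._rng.choice(len(X_pool), size=b, replace=False)


pool = list(range(100))  # stand-in for an unlabelled dataset of 100 points
idxs = RandomSketch(seed=0)(pool, b=10)
```

Any object that supports `len()` works as the pool in this sketch, because only the pool size is needed to draw indices.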
-
BALD¶
-
class
alr.acquisition.BALD(pred_fn: Callable[[torch.Tensor], torch.Tensor], subset: Optional[int] = -1, device: Union[str, torch.device, None] = None, debug: Optional[bool] = False, **data_loader_params)[source]¶ Bases:
alr.acquisition.AcquisitionFunction
Implements BALD.
\[\begin{align} -\sum_c\left(\frac{1}{T}\sum_t\hat{p}^t_c \right) \log \left( \frac{1}{T}\sum_t\hat{p}^t_c \right) + \frac{1}{T}\sum_{c,t}\hat{p}^t_c \log \hat{p}^t_c \end{align}\]
where \(\hat{p}^t_c\) is the softmax output of class \(c\) on the \(t^{th}\) stochastic iteration.
model = MCDropout(...)
bald = BALD(eval_fwd_exp(model), subset=-1, device=device,
            batch_size=512, pin_memory=True, num_workers=2)
bald(X_pool, b=10)
Parameters: - pred_fn (Callable) – A callable that returns a tensor of shape \(K \times N \times C\) where \(K\) is the number of inference samples, \(N\) is the number of instances, and \(C\) is the number of classes. This function should return probabilities, not *log* probabilities!
- subset (int, optional) – Size of the subset of X_pool. Use -1 to denote the entire pool.
- device (None, str, torch.device) – Move data to specified device when passing input data into pred_fn.
- debug (bool, optional) – Save additional information to recent_score (requires more space).
- data_loader_params – params to be passed into DataLoader when iterating over X_pool.
Warning
Do not set shuffle=True in data_loader_params! The indices will be incorrect if the DataLoader object shuffles X_pool!
-
__call__(X_pool: torch.utils.data.dataset.Dataset, b: int) → numpy.array[source]¶ Given unlabelled data pool X_pool, return the best b points for labelling by an oracle, where the best points are determined by this acquisition function and its parameters.
Parameters: - X_pool (torch.utils.data.Dataset) – Unlabelled dataset
- b (int) – number of points to acquire
Returns: array of indices to X_pool.
Return type: np.array
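The BALD score in the formula above can be sketched in NumPy for an array of shape K x N x C, matching the documented output shape of pred_fn. This is an illustration under stated assumptions, not the library's implementation: `bald_scores` and `top_b` are hypothetical names, and the small `eps` inside the logarithm is an added guard against log(0).

```python
import numpy as np


def bald_scores(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Score each of N points from probabilities of shape (K, N, C)."""
    mean_probs = probs.mean(axis=0)  # (N, C): average over the K samples
    # First term: entropy of the averaged prediction.
    entropy_of_mean = -(mean_probs * np.log(mean_probs + eps)).sum(axis=-1)
    # Second term: average entropy of the individual predictions.
    mean_of_entropy = -(probs * np.log(probs + eps)).sum(axis=-1).mean(axis=0)
    return entropy_of_mean - mean_of_entropy


def top_b(scores: np.ndarray, b: int) -> np.ndarray:
    """Indices of the b highest-scoring points, best first."""
    return np.argsort(scores)[::-1][:b]


# Two stochastic samples (K=2), two points (N=2), two classes (C=2).
probs = np.array([
    [[0.5, 0.5], [1.0, 0.0]],  # sample 1
    [[0.5, 0.5], [0.0, 1.0]],  # sample 2
])
scores = bald_scores(probs)
best = top_b(scores, b=1)
```

Point 0 is uncertain but the samples agree, so its score is close to zero. Point 1 has confident samples that disagree, so its score is close to log 2 and it is ranked first.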
ICAL¶
-
class
alr.acquisition.ICAL(pred_fn: Callable[[torch.Tensor], torch.Tensor], kernel_fn: Optional[Callable[[torch.Tensor], torch.Tensor]] = None, subset: Optional[int] = 200, greedy_acquire: Optional[int] = 1, use_one_hot: Optional[bool] = True, sample_softmax: Optional[bool] = True, device: Union[str, torch.device, None] = None, **data_loader_params)[source]¶ Bases:
alr.acquisition.AcquisitionFunction
Implements ‘normal’ ICAL. \(R\) points are randomly drawn from the pool and the average of the candidate batch’s kernels is used instead. Thus, the dependency measure reduces to \(d = 2\).
\[\frac{1}{|\mathcal{R}|} d\text{HSIC}\left(\displaystyle\sum_{x'\in\mathcal{R}} k^{x'}, \frac{1}{B} \displaystyle\sum_{i = 1}^{B} k^{x_i}\right)\]
model = MCDropout(...)
ical = ICAL(eval_fwd_exp(model), device=device,
            batch_size=512, pin_memory=True, num_workers=2)
ical(X_pool, b=10)
Parameters: - pred_fn (Callable) – A callable that returns a tensor of shape \(K \times N \times C\) where \(K\) is the number of inference samples, \(N\) is the number of instances, and \(C\) is the number of classes. This function should return probabilities, not *log* probabilities!
- kernel_fn (Callable[[torch.Tensor], torch.Tensor], optional) – Kernel function, see static methods of ICAL. Defaults to a weighted rational quadratic kernel. This is the default kernel in the paper.
- subset (int, optional) – Normal ICAL uses a subset of X_pool. subset specifies the size of this subset (\(|\mathcal{R}|\) in the paper). Use -1 to denote the entire pool.
- greedy_acquire (int, optional) – how many points to acquire at once in each acquisition step.
- use_one_hot (bool, optional) – use one_hot_encoding when calculating kernel matrix. This is the default behaviour in the paper.
- sample_softmax (bool, optional) – sample the softmax probabilities. If this is True, then use_one_hot is automatically overridden to be True. This is the default behaviour in the paper.
- device (None, str, torch.device) – Move data to specified device when passing input data into pred_fn.
- data_loader_params – params to be passed into DataLoader when iterating over X_pool.
Warning
Do not set shuffle=True in data_loader_params! The indices will be incorrect if the DataLoader object shuffles X_pool!
-
__call__(X_pool: torch.utils.data.dataset.Dataset, b: int) → numpy.array[source]¶ Given unlabelled data pool X_pool, return the best b points for labelling by an oracle, where the best points are determined by this acquisition function and its parameters.
Parameters: - X_pool (torch.utils.data.Dataset) – Unlabelled dataset
- b (int) – number of points to acquire
Returns: array of indices to X_pool.
Return type: np.array
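With \(d = 2\), the dependency measure compares two kernel matrices built over the same samples. The sketch below shows one common biased HSIC estimate in NumPy, as a rough illustration only. It makes several assumptions: `rational_quadratic` is a plain (unweighted) rational quadratic kernel rather than the library's default, `hsic_biased` is a hypothetical name, and the \(1/n^2\) normalisation is one convention among several (\(1/(n-1)^2\) is also used).

```python
import numpy as np


def rational_quadratic(x, alpha: float = 1.0, length_scale: float = 1.0):
    """Plain rational quadratic kernel matrix for n samples (assumed form)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n samples with one feature
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    return (1.0 + sq_dists / (2.0 * alpha * length_scale ** 2)) ** (-alpha)


def hsic_biased(K: np.ndarray, L: np.ndarray) -> float:
    """Biased HSIC estimate trace(K H L H) / n^2 for two (n, n) kernels."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)  # centring matrix
    return float(np.trace(K @ H @ L @ H) / n ** 2)


x = np.array([0.0, 1.0, 2.0, 3.0])
K = rational_quadratic(x)
self_dependence = hsic_biased(K, K)               # a variable with itself
no_dependence = hsic_biased(K, np.ones((4, 4)))   # against a constant kernel
```

A constant kernel matrix carries no information, so centring removes it entirely and the estimate is zero up to floating-point error, while a kernel compared with itself gives a strictly positive value.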
BatchBALD¶
-
class
alr.acquisition.BatchBALD(pred_fn: Callable[[torch.Tensor], torch.Tensor], device: Union[str, torch.device, None] = None, num_samples: int = 10000, **data_loader_params)[source]¶ Bases:
alr.acquisition.AcquisitionFunction
-
__call__(X_pool: torch.utils.data.dataset.Dataset, b: int) → numpy.array[source]¶ Given unlabelled data pool X_pool, return the best b points for labelling by an oracle, where the best points are determined by this acquisition function and its parameters.
Parameters: - X_pool (torch.utils.data.Dataset) – Unlabelled dataset
- b (int) – number of points to acquire
Returns: array of indices to X_pool.
Return type: np.array
-