pyml.neighbors.knn.kNNClassifier

class kNNClassifier(k=3, metric='euclidean')

Bases: object

Classifier model using the nearest neighbor algorithm

K-nearest neighbor (KNN) is a simple and intuitive machine learning algorithm that can be used for classification and regression tasks. For classification, the model predicts the class of a data point as the majority class among its k nearest data points in the feature space. (For regression, the average of the neighbors' values is used instead.)

The following metrics are supported:
  • euclidean

  • manhatten

Parameters:
  • k (int, optional) – Specifies the number of nearest neighbors to consider when predicting on new data. By default 3.

  • metric (str, optional) – Specifies the metric used for calculating the distance. By default ‘euclidean’.

Variables:

metrics (List[str]) – Defines the metrics that are currently supported

Raises:
  • UnknownMetric – Raised when an unknown metric name is used (including misspelled names)

  • ShapeError – Raised when computing the distance for incompatible matrices

Methods

__init__

fit

Fit model on training data

predict

Calculates predictions for given data points

Attributes

metrics

_compute_distance(x1, x2)

Computes the distance between two matrix-like objects using the configured metric.

One of the two inputs must be a single-row matrix or, alternatively, a vector.

Parameters:
  • x1 (numpy.ndarray) – Input matrix

  • x2 (numpy.ndarray) – Input matrix

Returns:

Matrix consisting of the distances

Return type:

numpy.ndarray

Raises:

ShapeError – Raised if the shapes of x1 and x2 are incompatible

See also

pyml.exceptions.ShapeError
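
The row-wise distance computation described above can be sketched as follows. This is a minimal, dependency-free illustration and not the library's implementation: plain Python lists stand in for numpy.ndarray, the helper names euclidean_rows, manhattan_rows and _check_shapes are hypothetical, and the built-in ValueError stands in for pyml.exceptions.ShapeError.

```python
import math


def _check_shapes(matrix, vector):
    # Every row must have as many features as the query vector.
    # ValueError is only a stand-in for pyml.exceptions.ShapeError.
    for row in matrix:
        if len(row) != len(vector):
            raise ValueError("row length does not match vector length")


def euclidean_rows(matrix, vector):
    """Euclidean distance from each row of `matrix` to `vector`."""
    _check_shapes(matrix, vector)
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(row, vector)))
        for row in matrix
    ]


def manhattan_rows(matrix, vector):
    """Manhattan (city-block) distance from each row of `matrix` to `vector`."""
    _check_shapes(matrix, vector)
    return [sum(abs(a - b) for a, b in zip(row, vector)) for row in matrix]


train = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
query = [0.0, 0.0]

euclid = euclidean_rows(train, query)
manhattan = manhattan_rows(train, query)
```

Each function returns one distance per row of the matrix, which is the shape a nearest-neighbor lookup needs in order to rank training points against a single query.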

fit(X, y)

Fit model on training data

Because k-nearest neighbor is a lazy learner, no actual training takes place. The training data is only stored in memory so that it can be searched at prediction time.

Return type:

None

Parameters:
  • X (numpy.ndarray) – Input training data

  • y (numpy.ndarray) – Input training labels
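
To illustrate what "lazy" means here, the sketch below shows a fit method that estimates nothing and only keeps a copy of the data. The class LazyKNN and its attributes _X and _y are hypothetical stand-ins and are not taken from the pyml source; plain lists are used instead of numpy arrays to keep the example dependency-free.

```python
class LazyKNN:
    """Hypothetical stand-in for a lazy learner (not the pyml class)."""

    def __init__(self, k=3):
        self.k = k
        self._X = None  # stored training samples
        self._y = None  # stored training labels

    def fit(self, X, y):
        # No parameters are estimated: the data is copied and kept
        # so that neighbors can be looked up at prediction time.
        self._X = [list(row) for row in X]
        self._y = list(y)


model = LazyKNN(k=3)
result = model.fit([[0.0, 0.0], [1.0, 1.0]], ["a", "b"])
```

A consequence of this design is that fitting is cheap, while memory use and prediction cost grow with the size of the stored training set.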

predict(X, return_class_prob=False)

Calculates predictions for given data points

Parameters:
  • X (numpy.ndarray) – Input matrix; the k nearest neighbors are determined for each row

  • return_class_prob (bool, optional) – If set to True, the probability of each prediction is returned as well, computed as the number of neighbors belonging to the predicted class divided by k. By default False.

Returns:

The predicted labels and, if requested, their respective probabilities.

Return type:

numpy.ndarray
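
The prediction step (majority vote among the k nearest neighbors, with the class probability defined as the vote count divided by k) can be sketched as below. This is an illustration under stated assumptions, not the library's code: the function knn_predict is hypothetical, it uses Euclidean distance only, it works on plain lists rather than numpy arrays, and returning a (labels, probabilities) tuple is an assumed format, since the documentation above does not specify how the two results are combined.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, X_query, k=3, return_class_prob=False):
    """Hypothetical k-NN prediction by majority vote."""
    labels, probs = [], []
    for q in X_query:
        # Euclidean distance from the query row to every training row
        dists = [math.dist(row, q) for row in X_train]
        # Indices of the k smallest distances (stable sort keeps input order on ties)
        nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
        # Count the labels of the k neighbors and take the most frequent one
        votes = Counter(y_train[i] for i in nearest)
        label, count = votes.most_common(1)[0]
        labels.append(label)
        probs.append(count / k)  # share of neighbors voting for the winner
    return (labels, probs) if return_class_prob else labels


X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]]
y_train = ["a", "a", "a", "b", "b"]
X_query = [[0.2, 0.2], [4, 4]]

labels = knn_predict(X_train, y_train, X_query, k=3)
labels_p, probs = knn_predict(X_train, y_train, X_query, k=3, return_class_prob=True)
# labels -> ["a", "b"]; probs -> [1.0, 2/3]
```

In this sketch, a tie between classes is resolved in favor of the label encountered first among the nearest neighbors; how the actual class breaks ties is not stated in the documentation.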