The “K Closest Points to Origin” problem asks us to find the k points nearest to the origin (0,0) from a given list of 2D points. The distance is measured using the Euclidean formula, but since we only care about comparing distances (not actual values), we can safely use the squared distance formula to avoid unnecessary square root calculations.
This type of problem is common in computational geometry, spatial indexing, and real-world systems like nearest-neighbor searches in maps or recommendation engines.
To solve the problem efficiently, we use a max heap to keep track of the k closest points. As we iterate through each point, we calculate its squared distance from the origin. We maintain a heap of size at most k containing the closest points seen so far.
To simulate a max heap using Python’s built-in heapq
module (which is a min heap), we store the negative of the squared distance. If the heap grows beyond size k, we remove the point with the largest distance. At the end, the heap will contain exactly the k points closest to the origin.
The standard Euclidean distance formula is √(x² + y²)
. However, since square roots are monotonic, the relative order of distances is preserved even without computing the square root. That’s why we use the x² + y²
value instead — it's simpler and faster.
The time complexity of this heap-based approach is O(n log k), where n is the number of points and k is the number of closest points we want. Each insertion into the heap takes O(log k), and we do this for all points.
The space complexity is O(k) for the heap that stores the closest points.
An alternate solution is to sort the list of points by their squared distances and return the first k. This takes O(n log n) time due to the sorting step, and O(n) space if not sorting in place. While this approach is simpler, the heap-based solution is more efficient when k is much smaller than n.
The “K Closest Points to Origin” problem demonstrates the power of heaps in reducing the overhead of sorting when only a subset of values is needed. Using a max heap in combination with squared Euclidean distance gives us a clean, efficient solution suitable for large-scale datasets and performance-sensitive applications.