The brute-force solution uses a heap to maintain the k most frequent elements, leading to:
The optimal solution uses bucket sort to achieve O(n) time complexity:
In this problem, we're given an unsorted array of integers and asked to return the k elements that appear most frequently. This problem is a classic example of frequency analysis and appears frequently in scenarios such as log data analysis, recommendation systems, and natural language processing where you need to extract high-occurrence entities.
A common brute-force approach involves counting the frequency of each element and using a min heap to keep track of the top k most frequent numbers. We build a frequency map using a hash table (or Python’s Counter
) and then push each (frequency, number) pair into the heap. When the heap size exceeds k, we remove the smallest frequency element. This ensures the heap always contains the k most frequent elements seen so far.
Once all elements are processed, the numbers in the heap are our result. The heap allows for efficient retrieval of the most frequent items in O(n log k) time.
While the heap-based approach is efficient, we can do better. The maximum frequency any number can have is equal to the length of the input array. So instead of sorting or using a heap, we can use a bucket sort idea. We create an array of n + 1
buckets where index i
represents all numbers that appear exactly i
times.
After populating the buckets, we iterate from the highest possible frequency down to 1, collecting numbers until we have gathered k of them. This approach skips sorting altogether and guarantees a linear time solution in O(n).
The heap solution has a time complexity of O(n log k), where n is the number of unique elements and k is the number of top elements to retrieve. It uses O(k) space for the heap and O(n) for the frequency map.
The bucket sort solution improves the time complexity to O(n) in the average case by eliminating the log factor introduced by the heap. The space complexity remains O(n) due to the use of the frequency map and buckets.
The “Top K Frequent Elements” problem showcases the trade-offs between general-purpose data structures like heaps and specialized techniques like bucket sort. The heap approach is versatile and works well in many scenarios, but if the data’s frequency bounds are known or small, bucket sort offers superior performance. Understanding both strategies enhances your problem-solving toolkit for real-world frequency analysis problems.