https://leetcode.com/problems/kth-largest-element-in-an-array/?envType=company&envId=facebook&favoriteSlug=facebook-thirty-days

 

"To approach this problem, I like to first consider the brute-force method because it helps clarify how the solution works and when it might become inefficient based on the input size.

For instance, let's say the problem is to find the 'k' largest elements in an array. If the input size 'n' is relatively small—let's say around 100 elements or fewer—I might start by sorting the array. Sorting takes O(n log n) time, so with 100 elements, that would be approximately 700 operations, which is manageable. Once sorted, I can simply take the last 'k' elements, which gives the correct answer with minimal complexity.
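In code, the brute-force version is just sort-and-slice (a minimal sketch; the function name `k_largest_sorted` is my own):

```python
def k_largest_sorted(nums, k):
    """Brute force: sort ascending, then take the last k elements."""
    return sorted(nums)[-k:]

print(k_largest_sorted([3, 2, 1, 5, 6, 4], 2))  # [5, 6]
```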

However, if the input size grows significantly—say, 1 million elements—sorting the entire array now takes about 20 million operations (O(n log n) with n = 1,000,000). At this scale, sorting is inefficient, especially if we only need a few elements, like the top 10. In such cases, it’s better to optimize the solution.

That's where heaps come into play. I would use a min-heap that holds only 'k' elements. As I traverse the array, I compare each element to the heap's root, which is the smallest element in the heap (an O(1) peek), and if the new element is larger, I replace the root (an O(log k) operation). By the end of the traversal, the heap contains the 'k' largest elements. The time complexity for this approach is O(n log k), which, for large inputs, drastically reduces the number of operations. For example, with 1 million elements and 'k' equal to 10, that's roughly 1,000,000 × log₂(10) ≈ 3.3 million operations, still far fewer than the ~20 million needed to sort the whole array.

In summary:

  1. If the input size is small, I start with a brute-force approach because sorting is simple and sufficient for manageable input sizes.
  2. But if the input size is large, like in the range of hundreds of thousands or millions, I switch to a heap-based approach, which solves the problem in O(n log k) time, paying only a logarithmic cost in 'k' per element."
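The min-heap traversal described above can be sketched with Python's `heapq` (the function name `k_largest_heap` is my own):

```python
import heapq

def k_largest_heap(nums, k):
    """Maintain a min-heap of the k largest elements seen so far."""
    heap = nums[:k]
    heapq.heapify(heap)                 # O(k) to build the initial heap
    for x in nums[k:]:
        if x > heap[0]:                 # peek at the smallest of the top-k: O(1)
            heapq.heapreplace(heap, x)  # pop the root and push x: O(log k)
    return heap                         # the k largest, in heap order

print(sorted(k_largest_heap([3, 2, 1, 5, 6, 4], 2)))  # [5, 6]
```

Note that `heapreplace` is a single pop-then-push, which is slightly cheaper than calling `heappop` followed by `heappush`.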

You:
"After initially considering a brute-force solution, depending on the input size, I would next evaluate the more optimal approaches. The heap approach is one of the most commonly used because it allows us to efficiently maintain the 'k' largest or smallest elements.

For example, if we’re looking for the 'k' largest elements, I would use a min-heap of size 'k'. As I traverse the array, I compare each element to the root of the heap:

  • If the element is larger than the heap’s root (the smallest element in the heap), I replace the root with the new element and re-heapify. This takes O(log k) time for each operation, resulting in a total time complexity of O(n log k) for 'n' elements. This is efficient when 'k' is small compared to 'n', and avoids the inefficiencies of sorting the entire array.

However, there are also other approaches depending on the problem constraints:

  1. Quickselect Algorithm
    If I need the exact 'k-th' largest element, and I don’t necessarily need the elements in sorted order, I might consider the quickselect algorithm. Quickselect is a partition-based algorithm, similar to quicksort, but it only partially sorts the array around the 'k-th' element. This method works in O(n) on average and is efficient for finding one element (or a range) without fully sorting the array. It’s particularly useful when we’re concerned only with the 'k-th' element, rather than all the 'k' largest.
  2. Bucket Sort
    For specific scenarios where the input range is limited (e.g., when elements are integers within a known small range), I could use bucket sort. This approach involves dividing the array into buckets and distributing the elements into these buckets based on their values. After filling the buckets, we can gather the top 'k' elements by traversing the highest buckets. This method can achieve O(n) time if the range of values is small relative to 'n'. It’s a great option when the problem constraints allow, especially if I know the data distribution is skewed or limited.

To summarize:

  • I start with a brute-force approach for small input sizes or as a baseline.
  • I then move to a heap-based solution for larger input sizes where I need efficient handling of the 'k' largest or smallest elements.
  • If I’m only interested in the 'k-th' element, I might use quickselect for a more direct approach with average O(n) complexity.
  • In specific cases, like when the value range is known and limited, bucket sort might be the best option, offering linear time complexity for certain distributions."

You:
"Quickselect is an efficient algorithm for finding the 'k-th' smallest or largest element in an unsorted list. It's based on the same idea as quicksort, but instead of sorting the entire list, we focus only on the part of the list that contains the element we're looking for. This reduces the average time complexity to O(n), making it much faster than sorting, which takes O(n log n).

Let me walk through the approach step-by-step:

  1. Initial Setup
    We begin by picking a pivot element from the list. This can be any element, but in practice, choosing the middle element or a random element often works well to avoid worst-case scenarios.
  2. Partitioning
    Similar to quicksort, we partition the array around the pivot:
    • Elements smaller than the pivot are placed on the left.
    • Elements greater than the pivot are placed on the right.
    The key here is that, after partitioning, the pivot is in its correct position relative to the rest of the array—everything on the left is smaller, and everything on the right is larger.
  3. Decide Which Half to Search
    After partitioning, we check the position of the pivot:
    • If the pivot is in the 'k-th' position (for 'k-th' smallest), we have found our element, and we can return it.
    • If the pivot’s position is greater than 'k', we know the 'k-th' smallest element lies in the left subarray, so we recursively apply the same logic to that subarray.
    • If the pivot’s position is less than 'k', we search the right subarray for the 'k-th' smallest element.
    This recursive process continues until we find the desired element.
  4. Time Complexity
    In the average case, each partition step discards a constant fraction of the remaining elements, so the total work forms a geometric series that sums to O(n). However, in the worst case (if the pivot is consistently poorly chosen), the algorithm can degrade to O(n^2), just as quicksort can. With a good pivot selection strategy (like choosing a random pivot), we can generally avoid this scenario.

Example:
Let’s say we’re looking for the 3rd smallest element in the array [7, 2, 1, 6, 8, 5, 3, 4]. Using quickselect:

  1. Pick a pivot—let's choose 4.
  2. Partition the array: [2, 1, 3] | 4 | [7, 6, 8, 5]. Now 4 is in its correct position (index 3).
  3. Since we’re looking for the 3rd smallest element, which is still in the left partition [2, 1, 3], we recursively apply quickselect to that subarray.
  4. Eventually, we find that 3 is the 3rd smallest element.

To summarize:

  • Quickselect is highly efficient for finding the 'k-th' smallest or largest element with an average time complexity of O(n).
  • Instead of sorting the entire array, it partitions the array recursively, focusing only on the necessary part.
  • It's a great alternative when we only need a specific element, rather than sorting everything."

import random

def quickselect(arr, k):
    """
    Finds the k-th smallest element in the list `arr`.

    Parameters:
    - arr (list of int): The list of integers.
    - k (int): The index (1-based) of the element to find in sorted order.

    Returns:
    - int: The k-th smallest element.
    """
    # Convert k to 0-based index for simplicity
    k = k - 1

    def partition(left, right, pivot_index):
        pivot_value = arr[pivot_index]
        # Move pivot to the end
        arr[pivot_index], arr[right] = arr[right], arr[pivot_index]
        store_index = left
        # Move all smaller elements to the left
        for i in range(left, right):
            if arr[i] < pivot_value:
                arr[i], arr[store_index] = arr[store_index], arr[i]
                store_index += 1
        # Move pivot to its final place
        arr[right], arr[store_index] = arr[store_index], arr[right]
        return store_index

    def select(left, right):
        # Base case: the list contains only one element
        if left == right:
            return arr[left]

        # Select a random pivot index
        pivot_index = random.randint(left, right)

        # Perform partitioning
        pivot_index = partition(left, right, pivot_index)

        # The pivot is in its final sorted position
        if pivot_index == k:
            return arr[pivot_index]
        elif pivot_index < k:
            # Go right
            return select(pivot_index + 1, right)
        else:
            # Go left
            return select(left, pivot_index - 1)

    return select(0, len(arr) - 1)

Example Usage:

arr = [7, 2, 1, 6, 8, 5, 3, 4]
k = 3
print(f"The {k}-th smallest element is {quickselect(arr, k)}.")

Explanation:

  • partition: This function arranges elements around a pivot so that all smaller elements are to the left and all larger elements are to the right.
  • select: This recursive function narrows down the search based on the position of the pivot.
  • Time Complexity: On average, Quickselect takes O(n) time, because each partition discards a constant fraction of the remaining search space.
