https://leetcode.com/problems/top-k-frequent-elements/description/?envType=company&envId=facebook&favoriteSlug=facebook-thirty-days

 

Here’s an updated structured script that includes the Quickselect approach for solving the Top K Frequent Elements problem, along with sorting, heap, and bucket sort methods.


You:
"To solve the problem of finding the k most frequent elements in a list of integers, there are a few different approaches we can take, depending on the input size and the desired level of efficiency. I’ll walk through each approach, from simplest to most optimized, including their pros and cons.

1. Sorting Solution

The simplest approach is to use sorting:

  • Steps: First, count the frequency of each element using a dictionary or Python's Counter class. Then, convert the frequency dictionary to a list of (element, frequency) tuples and sort this list by frequency in descending order. Finally, select the first k elements from the sorted list.
  • Time Complexity: O(n log n), due to sorting the entire list of elements by frequency.
  • Space Complexity: O(n) for the frequency dictionary and sorted list.
  • Pros:
    • Straightforward and easy to implement.
    • Suitable for small to moderate input sizes.
  • Cons:
    • Sorting is inefficient for large lists when we only need the top k elements; sorting everything does far more work than the partial result requires.

2. Heap Solution (Efficient for Larger Lists with Small k)

A more efficient approach for larger inputs is to use a min-heap:

  • Steps: After creating a frequency dictionary, we use a min-heap to keep track of the k most frequent elements. For each element and its frequency, we push it into the heap. If the heap size exceeds k, we remove the element with the smallest frequency, ensuring that only the k most frequent elements remain.
  • Time Complexity: O(n log k), where we add n elements to a heap of size k.
  • Space Complexity: O(n) for the dictionary and O(k) for the heap.
  • Pros:
    • Efficient for large inputs where k is small relative to n.
    • Uses less space by storing only k elements in the heap.
  • Cons:
    • More complex to implement due to heap operations.

3. Bucket Sort Solution (Optimal for Frequency-Based Grouping)

An even more efficient approach in terms of time complexity is bucket sort:

  • Steps: After building the frequency dictionary, we create an array of buckets where each index represents a frequency count. Each bucket stores elements that appear that many times. Finally, we collect the top k elements by iterating through the buckets from highest to lowest frequency.
  • Time Complexity: O(n), as we only count elements and place them into frequency-based buckets.
  • Space Complexity: O(n) for the dictionary and buckets.
  • Pros:
    • Highly efficient for large inputs and avoids sorting or heap maintenance.
    • Works well for situations where k is close to n.
  • Cons:
    • Bucket sort can be less intuitive to implement, and requires extra space for the buckets.

4. Quickselect Solution (Optimal for Top-k Selection)

Another highly efficient solution, especially for very large lists, is Quickselect:

  • Steps: Quickselect is a partition-based algorithm similar to quicksort. After building the frequency dictionary, we convert it into a list of (element, frequency) pairs and use Quickselect to partially sort the list such that the k most frequent elements are positioned in the first k spots. We partition the list until the k-th most frequent element is in the correct position, and return the first k elements.
  • Time Complexity: O(n) on average, as Quickselect only partially sorts the list to find the top k elements.
  • Space Complexity: O(n) for the dictionary and list.
  • Pros:
    • Very efficient with an average-case complexity of O(n), especially for very large lists.
    • Avoids sorting the entire list, which makes it faster than the sorting approach.
  • Cons:
    • The worst-case complexity is O(n^2), though using random pivot selection helps mitigate this risk.
    • Quickselect is more complex to implement and understand compared to other solutions.

Summary:

  • Sorting: Simple but less efficient for large inputs, with O(n log n) complexity.
  • Heap: Ideal for large lists when k is much smaller than n, with O(n log k) complexity.
  • Bucket Sort: Optimized for large lists and frequency-based grouping, with O(n) complexity, though it requires additional space.
  • Quickselect: Offers the best average-case efficiency with O(n) time complexity, ideal for very large lists and when k is close to n.

Each solution has its trade-offs, so I’d choose the approach based on input size and constraints. For large lists with small k, the heap or Quickselect approach would be optimal, while for lists where k is close to n, bucket sort may be best."


This script provides a structured breakdown of each solution, explaining when each approach is optimal based on the constraints, making it easy to decide the best solution.


Problem Statement Recap:
You are given a list of integers and an integer k. The goal is to return the k most frequent elements in the list.


1. Sorting Solution

Explanation:

  • Start by counting the frequency of each element in the list. We can use Python's Counter from the collections module to achieve this.
  • Once we have the frequency of each element, we convert the frequency dictionary into a list of tuples, where each tuple is (element, frequency).
  • Sort this list of tuples in descending order based on frequency.
  • Finally, select the first k elements from this sorted list.

Implementation:

from collections import Counter
from typing import List

def top_k_frequent_sort(nums: List[int], k: int) -> List[int]:
    # Step 1: Count frequencies
    freq_count = Counter(nums)
    # Step 2: Sort items by frequency in descending order
    sorted_items = sorted(freq_count.items(), key=lambda item: item[1], reverse=True)
    # Step 3: Extract the first k elements
    return [item[0] for item in sorted_items[:k]]

# Example usage
nums = [1, 1, 1, 2, 2, 3]
k = 2
print(top_k_frequent_sort(nums, k))  # Output: [1, 2]

Time Complexity: O(n log n)

  • Counting frequencies takes O(n), while sorting the items by frequency takes O(n log n).

Space Complexity: O(n) for storing the frequency dictionary and sorted list.

Pros:

  • Simple and straightforward to implement.
  • Effective for small to medium inputs.

Cons:

  • Sorting the entire frequency list is unnecessary when we only need the top k elements, making it less efficient for large inputs.

2. Heap Solution (Optimal for Large Lists with Small k)

Explanation:

  • After counting the frequency of each element, we use a min-heap of size k to keep track of the k most frequent elements.
  • We push each element along with its frequency into the heap.
    • If the heap exceeds size k, we remove the element with the smallest frequency (root of the min-heap).
  • By the end, the heap contains only the k most frequent elements.

Implementation:

import heapq
from collections import Counter
from typing import List

def top_k_frequent_heap(nums: List[int], k: int) -> List[int]:
    # Step 1: Count frequencies
    freq_count = Counter(nums)
    # Step 2: Use a min-heap of size k
    min_heap = []
    for num, freq in freq_count.items():
        heapq.heappush(min_heap, (freq, num))
        if len(min_heap) > k:
            heapq.heappop(min_heap)
    # Step 3: Extract elements from the heap
    return [num for (freq, num) in min_heap]

# Example usage
nums = [1, 1, 1, 2, 2, 3]
k = 2
print(top_k_frequent_heap(nums, k))  # Output: [2, 1]

Time Complexity: O(n log k)

  • Counting frequencies takes O(n), and maintaining a heap of size k takes O(log k) time for each of the n elements.

Space Complexity: O(n) for the frequency dictionary and O(k) for the heap.

Pros:

  • Efficient for large inputs when k is much smaller than n.
  • Keeps track of only k elements, optimizing space and time usage.

Cons:

  • Slightly more complex to implement due to heap management.

3. Bucket Sort Solution (Optimal for Frequency-Based Grouping)

Explanation:

  • This approach leverages the fact that the frequency of elements is bounded by the length of the list (n), as the maximum frequency an element can have is n.
  • Create an array of n+1 empty buckets (index 0 to n). Each bucket at index i holds a list of elements that appear i times.
  • Place each element from the frequency dictionary into its corresponding bucket based on frequency.
  • Finally, iterate through the buckets in reverse order, collecting elements until we have k elements.

Implementation:

from collections import Counter
from typing import List

def top_k_frequent_bucket_sort(nums: List[int], k: int) -> List[int]:
    # Step 1: Count frequencies
    freq_count = Counter(nums)
    # Step 2: Initialize buckets where index is frequency
    buckets = [[] for _ in range(len(nums) + 1)]
    for num, freq in freq_count.items():
        buckets[freq].append(num)
    # Step 3: Gather top k elements from the buckets
    result = []
    for i in range(len(buckets) - 1, 0, -1):
        for num in buckets[i]:
            result.append(num)
            if len(result) == k:
                return result
    return result  # safeguard if fewer than k distinct elements exist

# Example usage
nums = [1, 1, 1, 2, 2, 3]
k = 2
print(top_k_frequent_bucket_sort(nums, k))  # Output: [1, 2]

Time Complexity: O(n)

  • Counting frequencies takes O(n), and placing elements into buckets also takes O(n).

Space Complexity: O(n) for the frequency dictionary and buckets.

Pros:

  • Very efficient for problems where n is large and k is close to n.
  • No sorting or heap maintenance required, and it handles frequencies directly.

Cons:

  • Bucket sort can be less intuitive to implement.
  • Requires extra space for buckets, which may not be ideal for space-constrained environments.


4. Quickselect Solution

Explanation:

  • We start by building a frequency dictionary to count the occurrences of each element.
  • Then, we convert this dictionary into a list of tuples (element, frequency).
  • Using Quickselect, we partition the list of tuples so that the k most frequent elements are positioned in the first k spots in the list.
  • After partitioning, we return the elements from the first k positions.

Implementation:

import random
from collections import Counter
from typing import List, Tuple

def top_k_frequent_quickselect(nums: List[int], k: int) -> List[int]:
    # Step 1: Count frequencies
    freq_count = Counter(nums)
    # Convert the dictionary to a list of (element, frequency) pairs
    freq_items = list(freq_count.items())

    def partition(left: int, right: int, pivot_index: int) -> int:
        pivot_frequency = freq_items[pivot_index][1]
        # Move pivot to the end
        freq_items[pivot_index], freq_items[right] = freq_items[right], freq_items[pivot_index]
        store_index = left
        # Move all elements with frequency greater than pivot to the left
        for i in range(left, right):
            if freq_items[i][1] > pivot_frequency:
                freq_items[store_index], freq_items[i] = freq_items[i], freq_items[store_index]
                store_index += 1
        # Move pivot to its final place
        freq_items[right], freq_items[store_index] = freq_items[store_index], freq_items[right]
        return store_index

    def quickselect(left: int, right: int, k_smallest: int):
        if left == right:  # If the list contains only one element
            return
        # Select a random pivot index
        pivot_index = random.randint(left, right)
        # Partition the array around the pivot
        pivot_index = partition(left, right, pivot_index)
        # Recursively apply quickselect on the relevant half
        if k_smallest == pivot_index:
            return
        elif k_smallest < pivot_index:
            quickselect(left, pivot_index - 1, k_smallest)
        else:
            quickselect(pivot_index + 1, right, k_smallest)

    # Perform Quickselect for the k most frequent elements
    n = len(freq_items)
    quickselect(0, n - 1, k - 1)

    # Return the first k elements' values from freq_items
    return [item[0] for item in freq_items[:k]]

# Example usage
nums = [1, 1, 1, 2, 2, 3]
k = 2
print(top_k_frequent_quickselect(nums, k))  # Output: [1, 2] (order may vary due to random pivots)

Explanation of Key Steps:

  1. Partition Function:
    • Selects a pivot and rearranges elements such that all elements with frequency higher than the pivot are on the left, and all elements with frequency lower than the pivot are on the right.
    • This allows us to position elements based on their frequency.
  2. Quickselect Function:
    • Partitions the list around a pivot index until the k-th most frequent element is in the correct position.
    • This process allows us to get the top k frequent elements in average O(n) time without fully sorting the list.

Pros and Cons:

  • Pros: Efficient with an average time complexity of O(n), ideal for large lists.
  • Cons: The worst-case time complexity is O(n^2), though random pivot selection mitigates this in practice.

Summary

  • Sorting Solution: Simple but inefficient for large n, with O(n log n) complexity.
  • Heap Solution: Ideal for large n with small k, with O(n log k) complexity.
  • Bucket Sort Solution: Optimal for large n and frequency-based grouping, with O(n) complexity, but uses more space.
  • Quickselect Solution: Best average-case efficiency at O(n), though its O(n^2) worst case calls for random pivot selection.
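As a closing note for this problem, Python's standard library already packages the heap strategy: Counter.most_common(k) returns the k highest-count pairs directly. A minimal sketch (the wrapper function name is illustrative):

```python
from collections import Counter

def top_k_frequent(nums, k):
    # most_common(k) returns the k (element, count) pairs with the
    # highest counts, using a heap internally when k < number of keys
    return [num for num, _ in Counter(nums).most_common(k)]

print(top_k_frequent([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
```

In an interview, this one-liner is a useful sanity check against the hand-rolled solutions above.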

Here's a structured script for explaining how to solve the K Closest Points to Origin problem with various approaches, including brute-force, heap, and optimized techniques, along with pros and cons:


You:
"To solve the problem of finding the k closest points to the origin in a list of points, we can explore multiple approaches depending on the input size and desired efficiency. The goal is to calculate the Euclidean distance for each point from the origin (0, 0) and return the k points with the smallest distances.

  1. Brute-Force Approach (Sorting)
    The simplest approach is to calculate the distance of each point from the origin, store these distances in a list, and then sort the list based on these distances. We can then select the first k points from this sorted list.
    • Calculate the Euclidean distance squared for each point to avoid using sqrt (not necessary for comparing distances).
    • Sort the list of points based on the calculated distances.
    • Return the first k points from the sorted list.
    Pros:
    • Straightforward and easy to implement.
    Cons:
    • Sorting the list takes O(n log n) time, which can be inefficient for large n if only k smallest distances are required.
    • This approach doesn’t leverage the fact that we only need the closest k points, not the full sorted list.
  2. Heap Approach (Optimized for Efficiency)
    To improve efficiency, we can use a max-heap to store the k closest points while traversing the points list:
    • We first add the first k points to the max-heap based on their distances to the origin.
    • For each additional point, calculate its distance and compare it with the root of the max-heap (the largest distance among the k closest points so far). If the new point’s distance is smaller, we remove the root and insert the new point into the heap.
    • This ensures that we only keep k points in the heap at any time.
    Pros:
    • O(n log k) time complexity, which is more efficient than sorting when k is much smaller than n.
    • The heap helps us efficiently maintain the k closest points without sorting the entire list.
    Cons:
    • Implementing and managing a max-heap might add complexity, but Python’s heapq library can simplify this.
  3. Quickselect Algorithm (Alternative Optimal Solution)
    Another approach is to use the Quickselect algorithm, which is similar to quicksort but only partially sorts the array to find the k smallest elements:
    • Select a pivot point and partition the array around this pivot based on distance.
    • After partitioning, if the pivot position is k, then the first k points in the list are the closest.
    • Otherwise, recursively apply quickselect on the relevant half of the array until the first k points are found.
    Pros:
    • O(n) average time complexity, which is faster than heap-based solutions for large inputs.
    • No additional memory overhead since quickselect works in-place.
    Cons:
    • The worst-case time complexity is O(n^2), though choosing a random pivot or median-of-medians strategy can mitigate this.
    • It’s more complex to implement than the heap approach.
  4. Alternative Approach (Using Sorted Containers for Smaller Inputs)
    For scenarios where the input list of points is relatively small, using Python’s built-in sorted function or even a data structure like a SortedList (from the sortedcontainers library) can simplify implementation.
    • Sort the points based on the distance in O(n log n) time.
    • Return the first k points.
    This is typically less efficient than heap-based solutions for large n but can be simpler for small datasets.

To summarize:

  • For a basic solution, sorting works fine but is not efficient for large n.
  • For larger inputs, the max-heap approach is optimal for maintaining the closest k points in O(n log k) time.
  • For very large inputs, Quickselect provides an efficient in-place solution with average O(n) time complexity.

Given these options, I would choose the max-heap approach for balancing efficiency and ease of implementation, or Quickselect for the absolute best average time complexity."


Here are Python implementations for each solution to the K Closest Points to Origin problem: sorting, heap, and quickselect.

1. Sorting Solution

This approach calculates the distances and sorts the points by distance.

from typing import List

def k_closest_sort(points: List[List[int]], k: int) -> List[List[int]]:
    # Sort points by distance (no need for sqrt as we only need relative distance)
    points.sort(key=lambda point: point[0]**2 + point[1]**2)
    # Return the first k points
    return points[:k]

# Example usage
points = [[1, 3], [-2, 2], [5, 8], [0, 1]]
k = 2
print(k_closest_sort(points, k))  # Output: [[0, 1], [-2, 2]]

2. Heap Solution

This approach uses a max-heap to keep track of the k closest points.

import heapq
from typing import List

def k_closest_heap(points: List[List[int]], k: int) -> List[List[int]]:
    # Initialize a max-heap with the first k points (negative distance for max-heap behavior)
    max_heap = [(-point[0]**2 - point[1]**2, point) for point in points[:k]]
    heapq.heapify(max_heap)

    # Iterate over the remaining points
    for point in points[k:]:
        distance = -point[0]**2 - point[1]**2  # Negative for max-heap
        # If current point is closer than the farthest in the heap
        if distance > max_heap[0][0]:
            heapq.heappop(max_heap)
            heapq.heappush(max_heap, (distance, point))

    # Extract the points from the heap
    return [point for (_, point) in max_heap]

# Example usage
points = [[1, 3], [-2, 2], [5, 8], [0, 1]]
k = 2
print(k_closest_heap(points, k))  # Output: [[-2, 2], [0, 1]] (heap order, not sorted by distance)

3. Quickselect Solution

This approach uses Quickselect to partition the points so that the k closest points are in the first k positions of the list.

import random
from typing import List

def k_closest_quickselect(points: List[List[int]], k: int) -> List[List[int]]:
    def distance(point):
        return point[0]**2 + point[1]**2

    def partition(left, right, pivot_index):
        pivot_distance = distance(points[pivot_index])
        # Move pivot to the end
        points[pivot_index], points[right] = points[right], points[pivot_index]
        store_index = left
        # Move all points closer than pivot to the left
        for i in range(left, right):
            if distance(points[i]) < pivot_distance:
                points[store_index], points[i] = points[i], points[store_index]
                store_index += 1
        # Move pivot to its final place
        points[right], points[store_index] = points[store_index], points[right]
        return store_index

    def quickselect(left, right, k_smallest):
        if left == right:  # If the list contains only one element
            return
        # Select a random pivot index
        pivot_index = random.randint(left, right)
        # Partition the array around the pivot
        pivot_index = partition(left, right, pivot_index)
        # Recursively apply the quickselect
        if k_smallest == pivot_index:
            return
        elif k_smallest < pivot_index:
            quickselect(left, pivot_index - 1, k_smallest)
        else:
            quickselect(pivot_index + 1, right, k_smallest)

    # Perform quickselect so the k closest points occupy the first k positions
    # (k - 1 is the 0-based index of the k-th closest point; using k here
    # would fail when k == len(points))
    quickselect(0, len(points) - 1, k - 1)
    # Return the first k elements
    return points[:k]

# Example usage
points = [[1, 3], [-2, 2], [5, 8], [0, 1]]
k = 2
print(k_closest_quickselect(points, k))  # Output: the 2 closest points, e.g. [[0, 1], [-2, 2]] (order may vary)

Summary:

  • Sorting Solution: Simple but not optimal for large lists (O(n log n)).
  • Heap Solution: Efficient for large lists when k is small (O(n log k)).
  • Quickselect Solution: Most efficient average time complexity for large lists (O(n) on average).
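As a footnote, heapq also ships nsmallest, which wraps the same size-k heap idea from the heap solution above in a single call. A minimal sketch (the wrapper function name is illustrative):

```python
import heapq

def k_closest(points, k):
    # nsmallest maintains an internal heap of k candidates: O(n log k),
    # and returns the results sorted ascending by the key
    return heapq.nsmallest(k, points, key=lambda p: p[0] ** 2 + p[1] ** 2)

print(k_closest([[1, 3], [-2, 2], [5, 8], [0, 1]], 2))  # [[0, 1], [-2, 2]]
```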

Here’s an explanation for each solution to the K Closest Points to Origin problem:


1. Sorting Solution

Explanation:

  • In this approach, we start by calculating the Euclidean distance squared for each point from the origin. We square the distances (point[0]**2 + point[1]**2) instead of taking the square root to avoid unnecessary computations and because the relative order of distances remains unchanged without square roots.
  • We sort the list of points based on these squared distances in ascending order.
  • Finally, we return the first k points from the sorted list as they represent the closest points.

Time Complexity: O(n log n)

  • Sorting the list of n points is the main operation and takes O(n log n) time.

Space Complexity: O(1) (assuming in-place sorting)

Pros:

  • Simple to implement and straightforward to understand.
  • Works well for relatively small input sizes.

Cons:

  • Sorting the entire list of points is inefficient for large inputs when we only need k closest points.
  • Doesn’t take advantage of the fact that we only need a partial result (k points) and could be optimized further for larger inputs.

2. Heap Solution (Optimal for Larger Inputs)

Explanation:

  • This approach uses a max-heap to store the k closest points. A max-heap allows us to efficiently maintain the k smallest distances at any given time.
  • We start by adding the first k points from points into the max-heap. We store each point in the heap with its negative squared distance, as Python’s heapq module only supports a min-heap, and we need a max-heap behavior to keep the farthest element at the root.
  • For each remaining point, we calculate its squared distance. If this new distance is smaller than the largest distance in the max-heap (i.e., the current farthest point), we replace the root with the new point, keeping only the k closest points in the heap.
  • After processing all points, the heap contains exactly the k closest points.

Time Complexity: O(n log k)

  • Constructing and maintaining a max-heap with only k elements takes O(log k) time per insertion.
  • For n points, this results in O(n log k) time, which is much more efficient than sorting for large n when k is small.

Space Complexity: O(k)

  • The heap stores only k points at any time, which is efficient in terms of space.

Pros:

  • Efficient for large inputs where k is much smaller than n, as we don’t need to sort all points.
  • Uses only O(k) space for the heap, making it memory-efficient.

Cons:

  • Slightly more complex to implement, particularly due to the max-heap setup using Python’s min-heap functionality.

3. Quickselect Solution (Optimal Average-Case Time Complexity)

Explanation:

  • This approach uses the Quickselect algorithm, which is similar to quicksort but focuses only on partitioning the list until we have the k smallest elements at the start of the list.
  • We first select a pivot point randomly and partition the points based on their distance from the origin. All points with distances less than the pivot’s distance are moved to the left, and all with greater distances to the right.
  • After partitioning, if the pivot position is k, the first k elements are the closest points, and we return them.
  • If the pivot position is less than k, we apply quickselect recursively to the right side, as we need more points to reach k. If the pivot position is greater than k, we apply quickselect on the left side.
  • This process continues until we find the first k closest points.

Time Complexity: O(n) on average

  • Each partition step reduces the number of elements we’re working with, giving an average-case time complexity of O(n).
  • However, in the worst case (e.g., when the pivot is always poorly chosen), this algorithm can degrade to O(n^2).

Space Complexity: O(1) extra space for in-place partitioning, though the recursion stack adds O(log n) depth on average (O(n) in the worst case).

Pros:

  • The most efficient average-case solution for large inputs, particularly when k is close to n.
  • In-place operations mean no additional space is needed beyond the input list.

Cons:

  • The worst-case time complexity is O(n^2), though using a random pivot helps avoid this in practice.
  • More complex to implement than heap or sorting solutions.

Summary of Approach Suitability

  • Sorting: Best for smaller lists or if you need a simple solution without concern for optimal efficiency.
  • Heap: Ideal for large lists where k is small compared to n, balancing ease of implementation with optimal performance.
  • Quickselect: The most efficient solution for very large lists when you want the lowest average time complexity, but it requires careful handling to avoid worst-case scenarios.

Each approach provides a useful solution depending on the input size and constraints, showing flexibility in problem-solving.


https://leetcode.com/problems/kth-largest-element-in-an-array/?envType=company&envId=facebook&favoriteSlug=facebook-thirty-days

 

"To approach this problem, I like to first consider the brute-force method because it helps clarify how the solution works and when it might become inefficient based on the input size.

For instance, let's say the problem is to find the 'k' largest elements in an array. If the input size 'n' is relatively small—let's say around 100 elements or fewer—I might start by sorting the array. Sorting takes O(n log n) time, so with 100 elements, that would be approximately 700 operations, which is manageable. Once sorted, I can simply take the last 'k' elements, which gives the correct answer with minimal complexity.
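The brute-force idea above fits in a couple of lines (the function name is illustrative):

```python
def k_largest_sorted(nums, k):
    # Sort ascending in O(n log n), then slice off the last k elements
    return sorted(nums)[-k:]

print(k_largest_sorted([3, 2, 1, 5, 6, 4], 2))  # [5, 6]
```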

However, if the input size grows significantly—say, 1 million elements—sorting the entire array now takes about 20 million operations (O(n log n) with n = 1,000,000). At this scale, sorting is inefficient, especially if we only need a few elements, like the top 10. In such cases, it’s better to optimize the solution.

That's where heaps come into play. I would use a min-heap to store only 'k' elements. As I traverse the array, I compare each element to the smallest element in the heap (in O(1) time) and, if it's larger, I replace it (in O(log k) time). By the end of the traversal, the heap contains the 'k' largest elements. The time complexity for this approach is O(n log k), which, for large inputs, drastically reduces the number of operations. For example, with 1 million elements and 'k' equal to 10, this approach would take roughly 3.3 million operations (n × log₂ 10 ≈ 1,000,000 × 3.3), which is much more efficient than 20 million.
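The min-heap traversal described above could be sketched like this (assumes 1 ≤ k ≤ len(nums); the function name is illustrative):

```python
import heapq

def k_largest_heap(nums, k):
    # Seed a min-heap with the first k elements; its root is always
    # the smallest of the k largest elements seen so far
    heap = nums[:k]
    heapq.heapify(heap)
    for num in nums[k:]:
        if num > heap[0]:                 # O(1) peek at the root
            heapq.heapreplace(heap, num)  # O(log k) pop-and-push
    return heap  # the k largest elements, in heap order

print(sorted(k_largest_heap([3, 2, 1, 5, 6, 4], 2)))  # [5, 6]
```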

In summary:

  1. If the input size is small, I start with a brute-force approach because sorting is simple and sufficient for manageable input sizes.
  2. But if the input size is large, like in the range of hundreds of thousands or millions, I switch to a heap-based approach, which efficiently handles the problem in logarithmic time relative to 'k'."

You:
"After initially considering a brute-force solution, depending on the input size, I would next evaluate the more optimal approaches. The heap approach is one of the most commonly used because it allows us to efficiently maintain the 'k' largest or smallest elements.

For example, if we’re looking for the 'k' largest elements, I would use a min-heap of size 'k'. As I traverse the array, I compare each element to the root of the heap:

  • If the element is larger than the heap’s root (the smallest element in the heap), I replace the root with the new element and re-heapify. This takes O(log k) time for each operation, resulting in a total time complexity of O(n log k) for 'n' elements. This is efficient when 'k' is small compared to 'n', and avoids the inefficiencies of sorting the entire array.

However, there are also other approaches depending on the problem constraints:

  1. Quickselect Algorithm
    If I need the exact 'k-th' largest element, and I don’t necessarily need the elements in sorted order, I might consider the quickselect algorithm. Quickselect is a partition-based algorithm, similar to quicksort, but it only partially sorts the array around the 'k-th' element. This method works in O(n) on average and is efficient for finding one element (or a range) without fully sorting the array. It’s particularly useful when we’re concerned only with the 'k-th' element, rather than all the 'k' largest.
  2. Bucket Sort
    For specific scenarios where the input range is limited (e.g., when elements are integers within a known small range), I could use bucket sort. This approach involves dividing the array into buckets and distributing the elements into these buckets based on their values. After filling the buckets, we can gather the top 'k' elements by traversing the highest buckets. This method can achieve O(n) time if the range of values is small relative to 'n'. It’s a great option when the problem constraints allow, especially if I know the data distribution is skewed or limited.

To summarize:

  • I start with a brute-force approach for small input sizes or as a baseline.
  • I then move to a heap-based solution for larger input sizes where I need efficient handling of the 'k' largest or smallest elements.
  • If I’m only interested in the 'k-th' element, I might use quickselect for a more direct approach with average O(n) complexity.
  • In specific cases, like when the value range is known and limited, bucket sort might be the best option, offering linear time complexity for certain distributions."

You:
"Quickselect is an efficient algorithm for finding the 'k-th' smallest or largest element in an unsorted list. It's based on the same idea as quicksort, but instead of sorting the entire list, we focus only on the part of the list that contains the element we're looking for. This reduces the average time complexity to O(n), making it much faster than sorting, which takes O(n log n).

Let me walk through the approach step-by-step:

  1. Initial Setup
    We begin by picking a pivot element from the list. This can be any element, but in practice, choosing the middle element or a random element often works well to avoid worst-case scenarios.
  2. Partitioning
    Similar to quicksort, we partition the array around the pivot:
    • Elements smaller than the pivot are placed on the left.
    • Elements greater than the pivot are placed on the right.
    The key here is that, after partitioning, the pivot is in its correct position relative to the rest of the array—everything on the left is smaller, and everything on the right is larger.
  3. Decide Which Half to Search
    After partitioning, we check the position of the pivot:
    • If the pivot is in the 'k-th' position (for 'k-th' smallest), we have found our element, and we can return it.
    • If the pivot’s position is greater than 'k', we know the 'k-th' smallest element lies in the left subarray, so we recursively apply the same logic to that subarray.
    • If the pivot’s position is less than 'k', we search the right subarray for the 'k-th' smallest element.
    This recursive process continues until we find the desired element.
  4. Time Complexity
    In the average case, each partition step reduces the problem size by half, resulting in a time complexity of O(n). However, in the worst case (if the pivot is poorly chosen), the algorithm can degrade to O(n^2), similar to quicksort. But with a good pivot selection strategy (like choosing a random pivot), we can generally avoid this scenario.
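The O(n) average case in step 4 follows from a geometric series: if each partition roughly halves the remaining range, the total work across all levels is

```latex
T(n) \approx n + \frac{n}{2} + \frac{n}{4} + \cdots \le 2n = O(n)
```

Unlike quicksort, which recurses into both halves and therefore pays O(n log n), quickselect recurses into only one side at each step.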

Example:
Let’s say we’re looking for the 3rd smallest element in the array [7, 2, 1, 6, 8, 5, 3, 4]. Using quickselect:

  1. Pick a pivot—let's choose 4.
  2. Partition the array: [2, 1, 3] | 4 | [7, 6, 8, 5]. Now 4 is in its correct position (index 3).
  3. Since we’re looking for the 3rd smallest element, which is still in the left partition [2, 1, 3], we recursively apply quickselect to that subarray.
  4. Eventually, we find that 3 is the 3rd smallest element.

To summarize:

  • Quickselect is highly efficient for finding the 'k-th' smallest or largest element with an average time complexity of O(n).
  • Instead of sorting the entire array, it partitions the array recursively, focusing only on the necessary part.
  • It's a great alternative when we only need a specific element, rather than sorting everything."

import random

def quickselect(arr, k):
    """
    Finds the k-th smallest element in the list `arr`.

    Parameters:
    - arr (list of int): The list of integers.
    - k (int): The index (1-based) of the element to find in sorted order.

    Returns:
    - int: The k-th smallest element.
    """
    # Convert k to 0-based index for simplicity
    k = k - 1

    def partition(left, right, pivot_index):
        pivot_value = arr[pivot_index]
        # Move pivot to the end
        arr[pivot_index], arr[right] = arr[right], arr[pivot_index]
        store_index = left
        # Move all smaller elements to the left
        for i in range(left, right):
            if arr[i] < pivot_value:
                arr[i], arr[store_index] = arr[store_index], arr[i]
                store_index += 1
        # Move pivot to its final place
        arr[right], arr[store_index] = arr[store_index], arr[right]
        return store_index

    def select(left, right):
        # Base case: the list contains only one element
        if left == right:
            return arr[left]

        # Select a random pivot index
        pivot_index = random.randint(left, right)

        # Perform partitioning
        pivot_index = partition(left, right, pivot_index)

        # The pivot is in its final sorted position
        if pivot_index == k:
            return arr[pivot_index]
        elif pivot_index < k:
            # Go right
            return select(pivot_index + 1, right)
        else:
            # Go left
            return select(left, pivot_index - 1)

    return select(0, len(arr) - 1)

Example Usage:

arr = [7, 2, 1, 6, 8, 5, 3, 4]
k = 3
print(f"The {k}-th smallest element is {quickselect(arr, k)}.")

Explanation:

  • partition: This function arranges elements around a pivot so that all smaller elements are to the left and all larger elements are to the right.
  • select: This recursive function narrows down the search based on the position of the pivot.
  • Time Complexity: On average, Quickselect takes O(n) time due to halving the search space with each partition.


Math problems on platforms like LeetCode often require a combination of mathematical insight, algorithmic thinking, and coding skills. These problems can range from simple arithmetic operations to complex mathematical theories. Here, I will explain some common types of math problems and the techniques used to solve them.

Common Types of Math Problems and Techniques

  1. Random Pick with Weight
  2. Basic Calculator II
  3. Pow(x, n)
  4. K Closest Points to Origin
  5. Continuous Subarray Sum
  6. Random Pick Index
  7. Maximum Swap
  8. Add Strings

528. Random Pick with Weight

Problem: Given an array w of positive integers where w[i] describes the weight of index i, write a function pickIndex which randomly picks an index in proportion to its weight.

Approach:

  • Use prefix sums and binary search.
  • Compute the prefix sum array and use binary search to pick an index based on a random number.

Code:

import random
import bisect

class Solution:
    def __init__(self, w):
        self.prefix_sums = []
        prefix_sum = 0
        for weight in w:
            prefix_sum += weight
            self.prefix_sums.append(prefix_sum)
        self.total_sum = prefix_sum

    def pickIndex(self):
        target = random.randint(1, self.total_sum)
        return bisect.bisect_left(self.prefix_sums, target)

# Example usage:
weights = [1, 3, 2]
obj = Solution(weights)
print(obj.pickIndex())  # Randomly returns 0, 1, or 2 based on weights

Explanation:

  1. Prefix Sums: Compute the prefix sum of weights.
  2. Binary Search: Use binary search to find the index corresponding to a random target within the total weight range.

227. Basic Calculator II

Problem: Implement a basic calculator to evaluate a simple expression string containing non-negative integers, +, -, *, and / operators.

Approach:

  • Use a stack to manage the numbers and operators.
  • Traverse the string and handle operations based on operator precedence.

Code:

def calculate(s):
    if not s:
        return 0

    stack = []
    num = 0
    sign = '+'
    s += '+'

    for c in s:
        if c.isdigit():
            num = num * 10 + int(c)
        elif c in '+-*/':
            if sign == '+':
                stack.append(num)
            elif sign == '-':
                stack.append(-num)
            elif sign == '*':
                stack.append(stack.pop() * num)
            elif sign == '/':
                stack.append(int(stack.pop() / num))
            sign = c
            num = 0

    return sum(stack)

# Example usage:
expression = "3+2*2"
print(calculate(expression))  # Output: 7

Explanation:

  1. Stack for Numbers: Use a stack to handle numbers and intermediate results.
  2. Operator Precedence: Handle * and / immediately, defer + and - until the end.

50. Pow(x, n)

Problem: Implement pow(x, n), which calculates x raised to the power n.

Approach:

  • Use recursion and the concept of exponentiation by squaring.

Code:

def my_pow(x, n):
    def helper(x, n):
        if n == 0:
            return 1
        half = helper(x, n // 2)
        if n % 2 == 0:
            return half * half
        else:
            return half * half * x

    if n < 0:
        x = 1 / x
        n = -n
    return helper(x, n)

# Example usage:
x = 2.0
n = 10
print(my_pow(x, n))  # Output: 1024.0

Explanation:

  1. Recursion: Break down the problem into smaller subproblems using recursion.
  2. Exponentiation by Squaring: Efficiently compute powers by squaring intermediate results.

973. K Closest Points to Origin

Problem: Given an array of points where points[i] = [xi, yi] represents a point on the XY plane, return the k closest points to the origin (0, 0).

Approach:

  • Use a max-heap to keep track of the k closest points.

Code:

import heapq

def k_closest(points, k):
    max_heap = []
    for x, y in points:
        dist = -(x**2 + y**2)
        if len(max_heap) == k:
            heapq.heappushpop(max_heap, (dist, x, y))
        else:
            heapq.heappush(max_heap, (dist, x, y))
    return [[x, y] for (dist, x, y) in max_heap]

# Example usage:
points = [[1, 3], [-2, 2]]
k = 1
print(k_closest(points, k))  # Output: [[-2, 2]]

Explanation:

  1. Max-Heap: Use a max-heap to keep the closest k points by distance.
  2. Distance Calculation: Calculate the Euclidean distance and maintain the closest points.

523. Continuous Subarray Sum

Problem: Given an integer array nums and an integer k, return true if nums has a continuous subarray of size at least two whose elements sum up to a multiple of k.

Approach:

  • Use a hash map to store the running sum modulo k.

Code:

def check_subarray_sum(nums, k):
    sum_map = {0: -1}
    running_sum = 0

    for i, num in enumerate(nums):
        running_sum += num
        if k != 0:
            running_sum %= k
        if running_sum in sum_map:
            if i - sum_map[running_sum] > 1:
                return True
        else:
            sum_map[running_sum] = i

    return False

# Example usage:
nums = [23, 2, 4, 6, 7]
k = 6
print(check_subarray_sum(nums, k))  # Output: True

Explanation:

  1. Running Sum: Calculate the running sum modulo k.
  2. Hash Map: Use a hash map to track the first occurrence of each remainder and check the distance between occurrences.

398. Random Pick Index

Problem: Given an integer array nums with possible duplicates, implement the Solution class:

  • pick(target): Randomly returns the index of the target number.

Approach:

  • Use reservoir sampling to handle random selection efficiently.

Code:

import random

class Solution:
    def __init__(self, nums):
        self.nums = nums

    def pick(self, target):
        count = 0
        result = -1
        for i, num in enumerate(self.nums):
            if num == target:
                count += 1
                if random.randint(1, count) == count:
                    result = i
        return result

# Example usage:
nums = [1, 2, 3, 3, 3]
target = 3
obj = Solution(nums)
print(obj.pick(target))  # Randomly returns one of the indices: 2, 3, or 4

Explanation:

  1. Reservoir Sampling: Ensure each index has an equal probability of being chosen.
  2. Count Occurrences: Traverse the array, counting occurrences of the target and selecting based on random chance.
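The uniformity of reservoir sampling can be verified with a short telescoping calculation: the j-th matching index (out of m total matches) is the final answer only if it is selected when seen (probability 1/j) and never replaced by any later match:

```latex
P(\text{index } j \text{ survives})
  = \frac{1}{j} \cdot \frac{j}{j+1} \cdot \frac{j+1}{j+2} \cdots \frac{m-1}{m}
  = \frac{1}{m}
```

Every matching index therefore has equal probability 1/m, without storing the matches in advance.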

670. Maximum Swap

Problem: Given a non-negative integer, you can swap two digits at most once to get the maximum valued number. Return the maximum valued number.

Approach:

  • Use a greedy algorithm to find the best swap.

Code:

def maximum_swap(num):
    digits = list(str(num))
    last = {int(x): i for i, x in enumerate(digits)}

    for i, x in enumerate(digits):
        for d in range(9, int(x), -1):
            if last.get(d, -1) > i:
                digits[i], digits[last[d]] = digits[last[d]], digits[i]
                return int(''.join(digits))

    return num

# Example usage:
num = 2736
print(maximum_swap(num))  # Output: 7236

Explanation:

  1. Greedy Approach: Find the highest digit that can be swapped to maximize the number.
  2. Track Last Occurrence: Use a dictionary to store the last occurrence of each digit for efficient swaps.

415. Add Strings

Problem: Given two non-negative integers represented as strings, return the sum of the two numbers as a string.

Approach:

  • Use digit-by-digit addition with carry handling.

Code:

def add_strings(num1, num2):
    i, j = len(num1) - 1, len(num2) - 1
    carry = 0
    result = []

    while i >= 0 or j >= 0 or carry:
        x = int(num1[i]) if i >= 0 else 0
        y = int(num2[j]) if j >= 0 else 0
        total = x + y + carry
        carry = total // 10
        result.append(total % 10)
        i -= 1
        j -= 1

    return ''.join(map(str, result[::-1]))

# Example usage:
num1 = "123"
num2 = "456"
print(add_strings(num1, num2))  # Output: 579

Subset Techniques

Subset techniques involve generating and manipulating subsets of a given set. These techniques are widely used in combinatorial problems, where you need to explore all possible combinations of elements. Here, we will explore different methods to generate subsets and their applications.

Key Concepts

  1. Subset: A subset is any combination of elements from a set, including the empty set and the set itself.
  2. Power Set: The power set is the set of all possible subsets of a set, including the empty set and the set itself.

Methods to Generate Subsets

  1. Recursive Backtracking: A common method to generate all subsets by exploring all possibilities recursively.
  2. Iterative Approach: Using iterative techniques to build subsets, often leveraging bit manipulation.
  3. Library Functions: Using built-in functions or libraries in programming languages to generate subsets.

1. Recursive Backtracking

Recursive backtracking explores all possible subsets by including or excluding each element.

Code:

def subsets_backtracking(nums):
    def backtrack(start, path):
        result.append(path)
        for i in range(start, len(nums)):
            backtrack(i + 1, path + [nums[i]])

    result = []
    backtrack(0, [])
    return result

# Example usage:
nums = [1, 2, 3]
print(subsets_backtracking(nums))  # Output: [[], [1], [1, 2], [1, 2, 3], [1, 3], [2], [2, 3], [3]]

Explanation:

  1. Initialize: Start with an empty path and explore all possibilities.
  2. Include/Exclude: For each element, decide to include it in the current path or not.
  3. Recursive Call: Recursively call the function with the next starting index.
  4. Collect Results: Collect all paths (subsets) in the result list.

2. Iterative Approach

The iterative approach builds subsets by iterating over the existing subsets and adding the current element to each of them.

Code:

def subsets_iterative(nums):
    result = [[]]
    for num in nums:
        result += [curr + [num] for curr in result]
    return result

# Example usage:
nums = [1, 2, 3]
print(subsets_iterative(nums))  # Output: [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]

Explanation:

  1. Initialize: Start with the empty subset.
  2. Iterate: For each element, add it to all existing subsets to form new subsets.
  3. Update Result: Append the new subsets to the result list.

3. Bit Manipulation

Using bit manipulation to generate subsets leverages the binary representation of numbers. Each bit can represent the inclusion or exclusion of an element.

Code:

def subsets_bit_manipulation(nums):
    n = len(nums)
    result = []
    for i in range(1 << n):
        subset = []
        for j in range(n):
            if i & (1 << j):
                subset.append(nums[j])
        result.append(subset)
    return result

# Example usage:
nums = [1, 2, 3]
print(subsets_bit_manipulation(nums))  # Output: [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]

Explanation:

  1. Binary Representation: Iterate over the range 0 to 2^n - 1 (all possible binary numbers with n bits).
  2. Include/Exclude: Use each bit to decide whether to include the corresponding element.
  3. Form Subsets: Form subsets based on the binary representation and collect them in the result list.

Applications of Subset Techniques

  1. Combinatorial Problems: Problems that require exploring all possible combinations of elements, such as the knapsack problem, generating power sets, and finding all unique subsets.
  2. Optimization Problems: Problems that involve finding the best subset that meets certain criteria, such as maximizing profit or minimizing cost.
  3. String Manipulation: Problems involving substrings or subsequences where all possible combinations of characters need to be explored.
  4. Subset Sum Problem: Finding subsets that sum to a specific value, used in dynamic programming and algorithmic challenges.
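As an illustration of the subset-sum application above, here is a minimal dynamic-programming sketch (function name `subset_sum` is my own; it answers whether any subset sums exactly to `target`):

```python
def subset_sum(nums, target):
    # reachable[s] is True if some subset of the processed elements sums to s
    reachable = [False] * (target + 1)
    reachable[0] = True  # the empty subset sums to 0
    for num in nums:
        # Iterate downwards so each element is used at most once
        for s in range(target, num - 1, -1):
            if reachable[s - num]:
                reachable[s] = True
    return reachable[target]

# Example usage:
print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # Output: True  (4 + 5)
print(subset_sum([3, 34, 4, 12, 5, 2], 30))  # Output: False
```

This runs in O(n * target) time and O(target) space, avoiding the O(2^n) cost of enumerating every subset explicitly.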

Summary

  • Recursive Backtracking: Explores all subsets by including or excluding each element recursively. It is simple and easy to understand but can be less efficient for larger sets.
  • Iterative Approach: Builds subsets iteratively by adding each element to existing subsets. It is more efficient and avoids the overhead of recursion.
  • Bit Manipulation: Leverages binary representation to generate subsets. It is highly efficient and compact, suitable for problems with fixed-size sets.

Each method has its strengths and is suited to different types of problems. By understanding and applying these techniques, you can efficiently solve a wide range of combinatorial and optimization problems involving subsets.

Top K Elements Technique

The "Top K Elements" technique involves finding the largest (or smallest) k elements from a dataset. This is a common problem in computer science with applications in data analysis, search engines, recommendation systems, and more.

Key Concepts

  1. Priority Queue (Min-Heap or Max-Heap): A heap data structure that allows efficient extraction of the minimum or maximum element.
  2. Quickselect Algorithm: A selection algorithm to find the k-th largest (or smallest) element in an unordered list.
  3. Sorting: Sorting the entire dataset and then selecting the top k elements.
  4. Bucket Sort or Counting Sort: Special techniques for integer data with a limited range.

Methods to Find Top K Elements

  1. Heap (Priority Queue) Method: Efficient for dynamic data and large datasets.
  2. Quickselect Algorithm: Efficient for static data with good average-case performance.
  3. Sorting: Simple but less efficient for large datasets.
  4. Bucket Sort or Counting Sort: Efficient for specific types of data.

1. Heap (Priority Queue) Method

Using a min-heap (for the largest k elements) or max-heap (for the smallest k elements) to maintain the top k elements.

Code:

import heapq

def top_k_elements(nums, k):
    # Use a min-heap for the largest k elements
    if k == 0:
        return []

    min_heap = nums[:k]
    heapq.heapify(min_heap)

    for num in nums[k:]:
        if num > min_heap[0]:
            heapq.heappushpop(min_heap, num)

    return min_heap

# Example usage:
nums = [3, 2, 1, 5, 6, 4]
k = 2
print(top_k_elements(nums, k))  # Output: [5, 6]

Explanation:

  1. Initialize: Create a min-heap with the first k elements.
  2. Process: For each remaining element, if it is larger than the smallest element in the heap, replace the smallest element.
  3. Result: The heap contains the top k largest elements.

2. Quickselect Algorithm

Quickselect is a selection algorithm to find the k-th largest (or smallest) element in an unordered list, which can then be used to find the top k elements.

Code:

import random

def quickselect(nums, k):
    def partition(left, right, pivot_index):
        pivot_value = nums[pivot_index]
        nums[pivot_index], nums[right] = nums[right], nums[pivot_index]
        store_index = left
        for i in range(left, right):
            if nums[i] < pivot_value:
                nums[store_index], nums[i] = nums[i], nums[store_index]
                store_index += 1
        nums[right], nums[store_index] = nums[store_index], nums[right]
        return store_index

    def select(left, right, k_smallest):
        if left == right:
            return nums[left]
        pivot_index = random.randint(left, right)
        pivot_index = partition(left, right, pivot_index)
        if k_smallest == pivot_index:
            return nums[k_smallest]
        elif k_smallest < pivot_index:
            return select(left, pivot_index - 1, k_smallest)
        else:
            return select(pivot_index + 1, right, k_smallest)

    n = len(nums)
    return select(0, n - 1, n - k)

# Example usage:
nums = [3, 2, 1, 5, 6, 4]
k = 2
print(quickselect(nums, k))  # Output: 5

Explanation:

  1. Partition: Use the partitioning step from the QuickSort algorithm to position the pivot element correctly.
  2. Select: Recursively locate the element at sorted position n - k (0-based), which is exactly the k-th largest element.
  3. Result: The algorithm returns the k-th largest element; after partitioning, all larger elements sit to its right in the list.

3. Sorting

Sorting the array and selecting the top k elements.

Code:

def top_k_elements_sort(nums, k):
    nums.sort(reverse=True)
    return nums[:k]

# Example usage:
nums = [3, 2, 1, 5, 6, 4]
k = 2
print(top_k_elements_sort(nums, k))  # Output: [6, 5]

Explanation:

  1. Sort: Sort the array in descending order.
  2. Select: Take the first k elements from the sorted array.
  3. Result: These are the top k largest elements.

4. Bucket Sort or Counting Sort

For integer data with a limited range, bucket sort or counting sort can be efficient.

Code (Counting Sort for a limited range):

def top_k_elements_counting_sort(nums, k):
    # Assumes non-negative integers with a limited maximum value
    max_val = max(nums)
    count = [0] * (max_val + 1)

    for num in nums:
        count[num] += 1

    result = []
    for num in range(max_val, -1, -1):
        while count[num] > 0 and len(result) < k:
            result.append(num)
            count[num] -= 1

    return result

# Example usage:
nums = [3, 2, 1, 5, 6, 4]
k = 2
print(top_k_elements_counting_sort(nums, k))  # Output: [6, 5]

Explanation:

  1. Count Frequency: Count the frequency of each element.
  2. Select: Traverse the count array from the highest value, selecting elements until k elements are selected.
  3. Result: These are the top k largest elements.

If you want to find the largest K numbers and the smallest K numbers in an array simultaneously, you can use a combination of Min-Heap and Max-Heap. Here's a step-by-step approach in English:

Approach:

  1. Initialization:
    • Create a Min-Heap to keep track of the largest K elements.
    • Create a Max-Heap to keep track of the smallest K elements. (Since Python's heapq module only provides a Min-Heap, you can simulate a Max-Heap by pushing negative values.)
  2. Iterate through the array:
    • For each element in the array:
      • Add it to the Min-Heap if it's larger than the smallest element in the heap (the root). If the Min-Heap exceeds size K, remove the smallest element.
      • Add it to the Max-Heap (with a negative sign) if it's smaller than the largest element (the root). If the Max-Heap exceeds size K, remove the largest element.
  3. Results:
    • After processing all elements, the Min-Heap contains the largest K elements, and the Max-Heap (with the negative signs removed) contains the smallest K elements.

This method ensures that you maintain both the largest and smallest K elements as you process the array, and the overall time complexity stays at O(n log k).

Code:

import heapq

def find_largest_and_lowest_k_numbers(nums, k):
    # Min-Heap for largest K numbers
    min_heap = nums[:k]
    heapq.heapify(min_heap)

    # Max-Heap for smallest K numbers (using negative values)
    max_heap = [-num for num in nums[:k]]
    heapq.heapify(max_heap)

    # Process the remaining elements in the array
    for num in nums[k:]:
        # Maintain the largest K numbers
        if num > min_heap[0]:
            heapq.heapreplace(min_heap, num)

        # Maintain the smallest K numbers
        if -num > max_heap[0]:
            heapq.heapreplace(max_heap, -num)

    # Convert the Max-Heap back to positive numbers; sort both results,
    # since a heap's internal list order is not fully sorted
    lowest_k = sorted(-x for x in max_heap)

    return sorted(min_heap), lowest_k

# Example usage
nums = [3, 2, 1, 5, 6, 4, 8, 7]
k = 3
largest_k, lowest_k = find_largest_and_lowest_k_numbers(nums, k)
print("Largest K:", largest_k)  # Output: [6, 7, 8]
print("Lowest K:", lowest_k)    # Output: [1, 2, 3]

215. Kth Largest Element in an Array

Problem: Given an integer array nums and an integer k, return the kth largest element in the array.

Approach:

  • Min-Heap: Use a min-heap to keep track of the largest k elements in the array. The top element of the heap will be the kth largest element.
  • Quickselect: Another approach is to use the Quickselect algorithm, which is based on the partition method used in QuickSort. This method is efficient with an average time complexity of O(n).

Min-Heap Code:

import heapq

def find_kth_largest(nums, k):
    heap = nums[:k]
    heapq.heapify(heap)
    for num in nums[k:]:
        if num > heap[0]:
            heapq.heappushpop(heap, num)
    return heap[0]

# Example usage:
nums = [3, 2, 1, 5, 6, 4]
k = 2
print(find_kth_largest(nums, k))  # Output: 5

Quickselect Code:

def find_kth_largest(nums, k):
    k = len(nums) - k

    def quickselect(l, r):
        pivot, p = nums[r], l
        for i in range(l, r):
            if nums[i] <= pivot:
                nums[p], nums[i] = nums[i], nums[p]
                p += 1
        nums[p], nums[r] = nums[r], nums[p]
        if p > k:
            return quickselect(l, p - 1)
        elif p < k:
            return quickselect(p + 1, r)
        else:
            return nums[p]

    return quickselect(0, len(nums) - 1)

# Example usage:
nums = [3, 2, 1, 5, 6, 4]
k = 2
print(find_kth_largest(nums, k))  # Output: 5


23. Merge k Sorted Lists

Problem: You are given an array of k linked-lists, each linked-list is sorted in ascending order. Merge all the linked-lists into one sorted linked-list and return it.

Approach:

  • Min-Heap: Use a min-heap to efficiently merge the k sorted linked lists. Each time, extract the smallest element from the heap and add it to the merged list.

Code:

import heapq

class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

    def __lt__(self, other):
        return self.val < other.val

def merge_k_lists(lists):
    min_heap = []
    for root in lists:
        if root:
            heapq.heappush(min_heap, root)

    dummy = ListNode()
    curr = dummy

    while min_heap:
        smallest_node = heapq.heappop(min_heap)
        curr.next = smallest_node
        curr = curr.next
        if smallest_node.next:
            heapq.heappush(min_heap, smallest_node.next)

    return dummy.next

# Example usage:
# Assuming ListNode class is defined and linked lists are created.
lists = [list1, list2, list3]  # Example linked-lists
result = merge_k_lists(lists)

Explanation:

  1. Min-Heap: Each list's head is added to the heap. The heap helps efficiently find and add the smallest element across all lists to the final merged list.


703. Kth Largest Element in a Stream

Problem: Design a class to find the kth largest element in a stream. Implement the KthLargest class:

  • KthLargest(int k, int[] nums) Initializes the object with the integer k and the stream of integers nums.
  • int add(int val) Appends the integer val to the stream and returns the element representing the kth largest element in the stream.

Approach:

  • Min-Heap: Maintain a min-heap of size k. The smallest element in the heap is the kth largest element in the stream.

Code:

import heapq

class KthLargest:
    def __init__(self, k, nums):
        self.k = k
        self.min_heap = nums
        heapq.heapify(self.min_heap)
        while len(self.min_heap) > k:
            heapq.heappop(self.min_heap)

    def add(self, val):
        heapq.heappush(self.min_heap, val)
        if len(self.min_heap) > self.k:
            heapq.heappop(self.min_heap)
        return self.min_heap[0]

# Example usage:
k = 3
nums = [4, 5, 8, 2]
kthLargest = KthLargest(k, nums)
print(kthLargest.add(3))  # Output: 4
print(kthLargest.add(5))  # Output: 5
print(kthLargest.add(10))  # Output: 5
print(kthLargest.add(9))  # Output: 8
print(kthLargest.add(4))  # Output: 8

Summary

  • Heap (Priority Queue) Method:
    • Pros: Efficient for dynamic data and large datasets.
    • Cons: Slightly complex to implement.
  • Quickselect Algorithm:
    • Pros: Good average-case performance, efficient for static data.
    • Cons: Worst-case performance can be poor.
  • Sorting:
    • Pros: Simple to implement.
    • Cons: Less efficient for large datasets (O(n log n)).
  • Bucket Sort or Counting Sort:
    • Pros: Very efficient for integer data with a limited range.
    • Cons: Not general-purpose, requires specific data characteristics.

Choosing the right method depends on the characteristics of the dataset and the specific requirements of the problem at hand.


Modified Binary Search Technique

Modified Binary Search techniques are variations of the traditional binary search algorithm that are adapted to solve specific problems or work on specialized data structures. The traditional binary search algorithm is used to find the position of a target value within a sorted array in O(log n) time. Modified versions extend this principle to handle more complex scenarios.

Key Concepts

  1. Binary Search: A divide-and-conquer algorithm that repeatedly divides the search interval in half.
  2. Modification: Adjusting the standard binary search to handle variations in data or specific problem requirements.

Applications

  1. Finding the First or Last Occurrence of an Element.
  2. Searching in a Rotated Sorted Array.
  3. Finding the Peak Element in an Array.
  4. Finding the Square Root of a Number.
  5. Searching in a Nearly Sorted Array.

1. Finding the First or Last Occurrence of an Element

Problem: Given a sorted array with duplicate elements, find the first or last occurrence of a target value.

Code:

def find_first_occurrence(arr, target):
    left, right = 0, len(arr) - 1
    result = -1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            result = mid
            right = mid - 1  # continue to search on the left side
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return result

def find_last_occurrence(arr, target):
    left, right = 0, len(arr) - 1
    result = -1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            result = mid
            left = mid + 1  # continue to search on the right side
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return result

# Example usage:
arr = [1, 2, 2, 2, 3, 4, 5]
target = 2
print(find_first_occurrence(arr, target))  # Output: 1
print(find_last_occurrence(arr, target))   # Output: 3

Explanation:

  1. Initialization: Start with the left and right pointers at the ends of the array.
  2. Midpoint Calculation: Calculate the midpoint of the current search interval.
  3. Adjust Search Range: If the target is found, update the result and adjust the search range to continue looking for the first or last occurrence.
  4. Result: Return the index of the first or last occurrence.

2. Searching in a Rotated Sorted Array

Problem: Given a rotated sorted array, find the index of a target value.

Code:

def search_rotated_sorted_array(nums, target):
    left, right = 0, len(nums) - 1
    while left <= right:
        mid = (left + right) // 2
        if nums[mid] == target:
            return mid
        if nums[left] <= nums[mid]:  # left half is sorted
            if nums[left] <= target < nums[mid]:
                right = mid - 1
            else:
                left = mid + 1
        else:  # right half is sorted
            if nums[mid] < target <= nums[right]:
                left = mid + 1
            else:
                right = mid - 1
    return -1

# Example usage:
nums = [4, 5, 6, 7, 0, 1, 2]
target = 0
print(search_rotated_sorted_array(nums, target))  # Output: 4

Explanation:

  1. Initialization: Start with the left and right pointers at the ends of the array.
  2. Midpoint Calculation: Calculate the midpoint of the current search interval.
  3. Determine Sorted Half: Check which half of the array is sorted.
  4. Adjust Search Range: Narrow down the search range to the half that might contain the target.
  5. Result: Return the index of the target if found, otherwise return -1.

3. Finding the Peak Element in an Array

Problem: Given an array, find the index of a peak element. A peak element is greater than its neighbors.

Code:

def find_peak_element(nums):
    left, right = 0, len(nums) - 1
    while left < right:
        mid = (left + right) // 2
        if nums[mid] > nums[mid + 1]:
            right = mid
        else:
            left = mid + 1
    return left

# Example usage:
nums = [1, 2, 3, 1]
print(find_peak_element(nums))  # Output: 2 (index of peak element 3)

Explanation:

  1. Initialization: Start with the left and right pointers at the ends of the array.
  2. Midpoint Calculation: Calculate the midpoint of the current search interval.
  3. Compare Midpoint with Neighbor: Compare the midpoint with its right neighbor to determine the direction of the peak.
  4. Adjust Search Range: Move the search range towards the peak.
  5. Result: Return the index of the peak element.

4. Finding the Square Root of a Number

Problem: Find the integer square root of a non-negative integer x.

Code:

def my_sqrt(x):
    if x < 2:
        return x
    left, right = 2, x // 2
    while left <= right:
        mid = (left + right) // 2
        num = mid * mid
        if num == x:
            return mid
        elif num < x:
            left = mid + 1
        else:
            right = mid - 1
    return right

# Example usage:
x = 8
print(my_sqrt(x))  # Output: 2

Explanation:

  1. Initialization: Start with the left pointer at 2 and the right pointer at x // 2.
  2. Midpoint Calculation: Calculate the midpoint of the current search interval.
  3. Square Comparison: Compare the square of the midpoint with x.
  4. Adjust Search Range: Narrow down the search range based on the comparison.
  5. Result: Return the integer part of the square root.

5. Searching in a Nearly Sorted Array

Problem: Given a nearly sorted array (where each element may be misplaced by at most one position), find the index of a target value.

Code:

def search_nearly_sorted_array(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        if mid - 1 >= left and arr[mid - 1] == target:
            return mid - 1
        if mid + 1 <= right and arr[mid + 1] == target:
            return mid + 1
        if arr[mid] < target:
            left = mid + 2
        else:
            right = mid - 2
    return -1

# Example usage:
arr = [10, 3, 40, 20, 50, 80, 70]
target = 40
print(search_nearly_sorted_array(arr, target))  # Output: 2

Explanation:

  1. Initialization: Start with the left and right pointers at the ends of the array.
  2. Midpoint Calculation: Calculate the midpoint of the current search interval.
  3. Check Neighboring Elements: Check the midpoint and its immediate neighbors.
  4. Adjust Search Range: Narrow down the search range based on the comparison.
  5. Result: Return the index of the target if found, otherwise return -1.

1004. Max Consecutive Ones III

This problem is typically solved using the sliding window technique, not binary search. However, if you wanted to use binary search to determine the maximum window size, you could explore different window sizes and use a helper function to check the validity, but this would be less efficient than the sliding window method.

Problem:

Given a binary array nums and an integer k, find the maximum number of consecutive 1s in the array if you can flip at most k 0s.

Code:
This problem doesn't naturally lend itself to a binary search solution in the traditional sense. Instead, the optimal approach is the sliding window method:

from typing import List

def longestOnes(nums: List[int], k: int) -> int:
    left = 0
    max_length = 0
    zeros_count = 0

    for right in range(len(nums)):
        if nums[right] == 0:
            zeros_count += 1

        while zeros_count > k:
            if nums[left] == 0:
                zeros_count -= 1
            left += 1

        max_length = max(max_length, right - left + 1)

    return max_length

Explanation:

  • We expand the window size until we exceed the number of 0s that can be flipped, and then we start shrinking the window from the left side.
  • This problem is best solved using the sliding window technique, and binary search is not a natural fit here.
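For completeness, the less efficient binary-search alternative mentioned above can be sketched as follows. This is illustrative only, not the recommended solution: the helper `feasible` (a name chosen here for illustration) checks whether some window of a given length contains at most k zeros, and since any sub-window of a feasible window is also feasible, the predicate is monotone in the window length.

```python
def longest_ones_binary_search(nums, k):
    # Feasibility check: does some window of this length contain at most k zeros?
    # Monotone: any sub-window of a feasible window is also feasible.
    def feasible(length):
        zeros = sum(1 for v in nums[:length] if v == 0)
        if zeros <= k:
            return True
        for i in range(length, len(nums)):
            zeros += (nums[i] == 0) - (nums[i - length] == 0)  # slide the window
            if zeros <= k:
                return True
        return False

    lo, hi = 0, len(nums)
    while lo < hi:
        mid = (lo + hi + 1) // 2  # bias upward when searching for the largest feasible length
        if feasible(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

print(longest_ones_binary_search([1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0], 2))  # Output: 6
```

Each feasibility check is O(n) and the search runs O(log n) times, so this is O(n log n) overall, strictly worse than the O(n) sliding window.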

33. Search in Rotated Sorted Array

Problem:
Given a rotated sorted array and a target value, find the index of the target. If not found, return -1.

Code:

from typing import List

def search(nums: List[int], target: int) -> int:
    left, right = 0, len(nums) - 1

    while left <= right:
        mid = (left + right) // 2

        if nums[mid] == target:
            return mid

        if nums[left] <= nums[mid]:  # Left half is sorted
            if nums[left] <= target < nums[mid]:
                right = mid - 1
            else:
                left = mid + 1
        else:  # Right half is sorted
            if nums[mid] < target <= nums[right]:
                left = mid + 1
            else:
                right = mid - 1

    return -1

Explanation:

  • We use binary search to identify which half of the array is sorted.
  • Depending on whether the target lies within the sorted half, we adjust our search range.

34. Find First and Last Position of Element in Sorted Array

Problem:
Given a sorted array of integers nums and a target value, find the starting and ending position of the target.

Code:

from typing import List

def searchRange(nums: List[int], target: int) -> List[int]:
    def findLeft():
        left, right = 0, len(nums) - 1
        while left <= right:
            mid = (left + right) // 2
            if nums[mid] < target:
                left = mid + 1
            else:
                right = mid - 1
        return left

    def findRight():
        left, right = 0, len(nums) - 1
        while left <= right:
            mid = (left + right) // 2
            if nums[mid] <= target:
                left = mid + 1
            else:
                right = mid - 1
        return right

    left_index = findLeft()
    right_index = findRight()

    if left_index <= right_index:
        return [left_index, right_index]
    return [-1, -1]

Explanation:

  • Two binary searches are conducted: one for the first occurrence of the target (findLeft), and one for the last occurrence (findRight).
  • This ensures both the start and end positions of the target are found efficiently.

69. Sqrt(x)

Problem:
Given a non-negative integer x, compute and return the square root of x. Since the return type is an integer, only the integer part of the result is returned.

Code:

def mySqrt(x: int) -> int:
    left, right = 0, x

    while left <= right:
        mid = (left + right) // 2
        if mid * mid == x:
            return mid
        elif mid * mid < x:
            left = mid + 1
        else:
            right = mid - 1

    return right

Explanation:

  • The binary search finds the integer square root by narrowing down the possible values.
  • If mid * mid is too small, the left boundary is adjusted; if too large, the right boundary is adjusted.

162. Find Peak Element

Problem:
A peak element is an element that is greater than its neighbors. Given an array of integers, find a peak element and return its index.

Code:

from typing import List

def findPeakElement(nums: List[int]) -> int:
    left, right = 0, len(nums) - 1

    while left < right:
        mid = (left + right) // 2

        if nums[mid] > nums[mid + 1]:
            right = mid
        else:
            left = mid + 1

    return left

Explanation:

  • Binary search is used to find the peak by comparing the middle element with its neighbors.
  • If the middle element is larger than the next, the peak is to the left; otherwise, it's to the right.

825. Friends Of Appropriate Ages

Problem:
Given an array representing the ages of a group of people, determine the number of friend requests made according to specific rules.

Code:
This problem does not naturally lend itself to binary search as the primary technique. Instead, a combination of sorting and two-pointer technique (which is sometimes referred to as a variation of binary search) is used to solve it efficiently.

import bisect
from typing import List

def numFriendRequests(ages: List[int]) -> int:
    ages.sort()
    requests = 0

    for i in range(len(ages)):
        if ages[i] <= 14:
            continue

        # First index with age > 0.5 * ages[i] + 7; bisect_right handles the
        # fractional threshold correctly (bisect_left with "+ 1" would skip
        # valid ages when ages[i] is odd)
        left = bisect.bisect_right(ages, 0.5 * ages[i] + 7)
        right = bisect.bisect_right(ages, ages[i])

        requests += right - left - 1  # Exclude the person themselves

    return requests

Explanation:

  • The ages are sorted, and for each person, valid friend requests are counted by using binary search (the bisect module) to find the valid age range.
  • This ensures that the number of friend requests is calculated efficiently.

Binary search is an essential technique for several of these problems, particularly when dealing with sorted arrays or when trying to minimize/maximize a certain condition within a range of values.
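That minimize/maximize pattern can be captured in a small reusable template. The names here (`binary_search_answer`, `feasible`) are placeholders for illustration; `feasible` stands for whatever monotone predicate the problem defines.

```python
def binary_search_answer(lo, hi, feasible):
    # Returns the smallest value in [lo, hi] for which feasible(value) is True,
    # assuming feasible is monotone: False, ..., False, True, ..., True.
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid  # mid works; try to find something smaller
        else:
            lo = mid + 1  # mid fails; the answer is to the right
    return lo

# Example: the smallest integer whose square is at least 40
print(binary_search_answer(0, 100, lambda v: v * v >= 40))  # Output: 7
```

To find the largest feasible value instead, mirror the template: bias the midpoint up with `(lo + hi + 1) // 2` and move `lo = mid` on success.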

Summary

Modified binary search techniques adapt the basic binary search algorithm to solve a variety of problems more efficiently. By leveraging the divide-and-conquer strategy, these techniques can handle different data structures and problem constraints. Understanding and implementing these variations can significantly improve the performance of your algorithms for specific use cases.


Binary Tree BFS Techniques

Breadth-First Search (BFS) is a fundamental algorithm used to traverse or search tree or graph data structures. In the context of binary trees, BFS is often referred to as level-order traversal because it visits all nodes at each level of the tree before moving to the next level.

Key Concepts

  1. Queue: BFS uses a queue to keep track of nodes at the current level before moving to the next level.
  2. Level by Level Traversal: Nodes are processed level by level, starting from the root.
  3. First In, First Out (FIFO): Nodes are added to the queue in the order they are encountered and processed in the same order.

Steps

  1. Initialize: Start with a queue containing the root node.
  2. Process: Dequeue a node, process it, and enqueue its children.
  3. Repeat: Continue until the queue is empty, indicating that all nodes have been processed.

Applications

  1. Level Order Traversal: Printing or collecting nodes level by level.
  2. Finding the Depth/Height: Calculating the maximum depth or height of the tree.
  3. Shortest Path in Unweighted Trees: Finding the shortest path in terms of number of edges from the root to any node.
  4. Checking Completeness: Determining if a binary tree is complete.

Example: Level Order Traversal

Problem: Traverse the binary tree level by level and print each level.

Code:

from collections import deque

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def level_order_traversal(root):
    if not root:
        return []

    result = []
    queue = deque([root])

    while queue:
        level_size = len(queue)
        level = []

        for _ in range(level_size):
            node = queue.popleft()
            level.append(node.val)

            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)

        result.append(level)

    return result

# Example usage:
# Constructing the binary tree
#       3
#      / \
#     9   20
#        /  \
#       15   7
tree = TreeNode(3)
tree.left = TreeNode(9)
tree.right = TreeNode(20, TreeNode(15), TreeNode(7))

print(level_order_traversal(tree))  # Output: [[3], [9, 20], [15, 7]]

Explanation:

  1. Initialize: A queue is initialized with the root node.
  2. Process: Nodes are dequeued one by one. Their values are collected, and their children are enqueued.
  3. Level by Level: The process repeats until the queue is empty, resulting in a list of lists where each sublist contains nodes at the same level.

Example: Finding Maximum Depth

Problem: Calculate the maximum depth (height) of a binary tree.

Code:

from collections import deque

def max_depth_bfs(root):
    if not root:
        return 0

    queue = deque([root])
    depth = 0

    while queue:
        depth += 1
        level_size = len(queue)

        for _ in range(level_size):
            node = queue.popleft()

            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)

    return depth

# Example usage:
print(max_depth_bfs(tree))  # Output: 3

Explanation:

  1. Initialize: A queue is initialized with the root node and depth is set to 0.
  2. Process: Nodes are dequeued level by level, and the depth counter is incremented after processing each level.
  3. Depth Calculation: The process repeats until the queue is empty, resulting in the maximum depth of the tree.

Summary of BFS Techniques in Binary Trees

  • Level Order Traversal: Collecting or printing nodes level by level using a queue.
  • Finding Maximum Depth: Using BFS to determine the maximum depth by processing nodes level by level.
  • Shortest Path in Unweighted Trees: BFS naturally finds the shortest path in terms of edges from the root to any node.
  • Checking Completeness: Ensuring that all levels of the tree are fully filled except possibly the last level, which should be filled from left to right.
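The completeness check listed above can be sketched with the same level-order queue, this time enqueueing None children as well (TreeNode is redefined here only so the snippet stands alone):

```python
from collections import deque

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def is_complete_tree(root):
    # In a complete tree's level order (including None children),
    # no real node may appear after the first None.
    if not root:
        return True
    queue = deque([root])
    seen_none = False
    while queue:
        node = queue.popleft()
        if node is None:
            seen_none = True
        elif seen_none:
            return False  # A real node after a gap: not complete
        else:
            queue.append(node.left)
            queue.append(node.right)
    return True

# Example usage:
complete = TreeNode(1, TreeNode(2, TreeNode(4)), TreeNode(3))
print(is_complete_tree(complete))                        # Output: True
print(is_complete_tree(TreeNode(1, None, TreeNode(3))))  # Output: False
```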

Advantages of BFS

  • Simple and Intuitive: Easy to understand and implement using a queue.
  • Level by Level Processing: Ideal for problems that require processing nodes level by level.
  • Shortest Path: Naturally finds the shortest path in unweighted trees or graphs.

When to Use BFS

  • When you need to process nodes level by level.
  • When finding the shortest path in an unweighted graph or tree.
  • When calculating metrics that depend on levels, such as the maximum depth or checking the completeness of a tree.

1. 200. Number of Islands

Problem:

Given a 2D grid of '1's (land) and '0's (water), count the number of islands. An island is surrounded by water and is formed by connecting adjacent lands horizontally or vertically.

Approach (BFS):

  • Use Breadth-First Search (BFS) to explore each island. Start BFS whenever you encounter '1', mark all connected '1's as visited (by changing them to '0'), and increment the island count.

Python Code:

from collections import deque
from typing import List

def numIslands(grid: List[List[str]]) -> int:
    if not grid:
        return 0

    rows, cols = len(grid), len(grid[0])
    island_count = 0

    # BFS function to traverse the island
    def bfs(r, c):
        queue = deque([(r, c)])
        while queue:
            row, col = queue.popleft()
            # Check boundaries and if the current cell is part of an island
            if 0 <= row < rows and 0 <= col < cols and grid[row][col] == '1':
                grid[row][col] = '0'  # Mark as visited by setting to '0'
                # Add all neighboring cells (up, down, left, right) to the queue
                queue.extend([(row-1, col), (row+1, col), (row, col-1), (row, col+1)])

    # Iterate over each cell in the grid
    for r in range(rows):
        for c in range(cols):
            # If the cell is a '1', start BFS to mark the entire island
            if grid[r][c] == '1':
                bfs(r, c)
                island_count += 1  # Increase island count for each BFS

    return island_count  # Return the total number of islands found

Example:

grid = [
  ["1","1","1","1","0"],
  ["1","1","0","1","0"],
  ["1","1","0","0","0"],
  ["0","0","0","0","0"]
]
print(numIslands(grid))  # Output: 1

Explanation:

  • We start BFS whenever we find an unvisited '1' and mark all connected lands as visited by changing them to '0'.
  • The BFS continues until all parts of the island are explored.
  • This continues for each island found, resulting in the total count.

2. 339. Nested List Weight Sum

Problem:

Given a nested list of integers, return the sum of all integers in the list weighted by their depth. For example, [1,[4,[6]]] would return 1*1 + 4*2 + 6*3 = 27.

Approach (BFS):

  • Use a Breadth-First Search (BFS) approach to traverse the nested list and calculate the weighted sum by tracking the depth level of each element.

Python Code:

from collections import deque
from typing import List

# NestedInteger is the interface provided by LeetCode
# (methods: isInteger, getInteger, getList)
def depthSum(nestedList: List['NestedInteger']) -> int:
    queue = deque([(nestedList, 1)])  # Queue stores (list, depth) pairs
    total_sum = 0

    # BFS loop to process each element in the queue
    while queue:
        current_list, depth = queue.popleft()
        for element in current_list:
            if element.isInteger():
                # If it's an integer, add to the total sum weighted by depth
                total_sum += element.getInteger() * depth
            else:
                # If it's a list, add it to the queue with increased depth
                queue.append((element.getList(), depth + 1))

    return total_sum  # Return the weighted sum of all integers

Example:

nestedList = [NestedInteger(1), NestedInteger([NestedInteger(4), NestedInteger([NestedInteger(6)])])]
print(depthSum(nestedList))  # Output: 27

Explanation:

  • The BFS traverses the nested list level by level.
  • Each integer’s contribution to the sum is calculated based on its depth, which is tracked as we traverse each level.

3. 827. Making A Large Island

Problem:

You are given an n x n binary grid. You can change exactly one 0 to 1. Find the largest island size you can create by making this change.

Approach (BFS):

  • First, identify all islands using BFS, and assign unique island identifiers.
  • Then, for each 0 in the grid, check its neighboring islands and calculate the possible new island size by merging these islands.
  • Track the maximum possible island size.

Python Code:

from collections import deque, defaultdict
from typing import List

def largestIsland(grid: List[List[int]]) -> int:
    n = len(grid)
    island_size = defaultdict(int)  # Dictionary to store island sizes
    island_id = 2  # Start with an island id of 2 to distinguish from 1 and 0

    # BFS function to determine the size of each island
    def bfs(r, c, island_id):
        queue = deque([(r, c)])
        size = 0
        while queue:
            row, col = queue.popleft()
            if 0 <= row < n and 0 <= col < n and grid[row][col] == 1:
                grid[row][col] = island_id  # Mark as part of the current island
                size += 1  # Increment the size of the island
                # Add neighboring cells to the queue
                queue.extend([(row-1, col), (row+1, col), (row, col-1), (row, col+1)])
        return size

    # First pass: assign ids to islands and record sizes
    for r in range(n):
        for c in range(n):
            if grid[r][c] == 1:  # If it's part of an island
                island_size[island_id] = bfs(r, c, island_id)
                island_id += 1

    # Second pass: check every 0 cell and calculate potential island size
    max_island = max(island_size.values(), default=0)  # Start with the largest existing island
    for r in range(n):
        for c in range(n):
            if grid[r][c] == 0:  # Consider flipping this 0 to a 1
                seen_islands = set()  # Track unique neighboring islands
                new_size = 1  # Start with size 1 for the flipped cell
                for nr, nc in [(r-1, c), (r+1, c), (r, c-1), (r, c+1)]:
                    if 0 <= nr < n and 0 <= nc < n and grid[nr][nc] > 1:
                        seen_islands.add(grid[nr][nc])  # Add neighboring island ids
                new_size += sum(island_size[i] for i in seen_islands)  # Add sizes of all unique neighboring islands
                max_island = max(max_island, new_size)  # Update max island size if necessary

    return max_island  # Return the size of the largest possible island

Example:

grid = [
  [1, 0],
  [0, 1]
]
print(largestIsland(grid))  # Output: 3

Explanation:

  • The BFS first identifies all islands and calculates their sizes.
  • We then evaluate the effect of converting each 0 to 1, merging adjacent islands, and keeping track of the maximum island size.

4. 1091. Shortest Path in Binary Matrix

Problem:

Given a binary matrix, return the length of the shortest clear path from the top-left corner to the bottom-right corner. If such a path does not exist, return -1. The path can only move in 8 possible directions.

Approach (BFS):

  • Use BFS to explore all possible paths starting from the top-left corner. The first time you reach the bottom-right corner, the length of the path is the answer.

Python Code:

from collections import deque
from typing import List

def shortestPathBinaryMatrix(grid: List[List[int]]) -> int:
    n = len(grid)
    # The path must start and end on an open (0) cell
    if grid[0][0] != 0 or grid[n - 1][n - 1] != 0:
        return -1

    queue = deque([(0, 0, 1)])  # (row, col, path length so far)
    grid[0][0] = 1  # Mark the start cell as visited

    while queue:
        r, c, length = queue.popleft()
        if r == n - 1 and c == n - 1:
            return length  # First arrival at the corner is the shortest path

        # Explore all 8 directions
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < n and 0 <= nc < n and grid[nr][nc] == 0:
                    grid[nr][nc] = 1  # Mark as visited before enqueueing
                    queue.append((nr, nc, length + 1))

    return -1  # No clear path exists

Example:

grid = [
  [0, 1],
  [1, 0]
]
print(shortestPathBinaryMatrix(grid))  # Output: 2

Explanation:

  • BFS explores all paths level by level. The first time the BFS reaches the bottom-right corner, it returns the path length as the answer.
  • This guarantees the shortest path is found.

5. 199. Binary Tree Right Side View

Problem:

Given a binary tree, return the values of the nodes you can see when looking at the tree from the right side.

Approach (BFS):

  • Use BFS to traverse the tree level by level, and take the last node of each level, as it represents the rightmost node visible from the right side.

Python Code:
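A self-contained version of the level-order approach described above, using the same BFS skeleton as the other examples and keeping only the last node of each level:

```python
from collections import deque

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def rightSideView(root):
    if not root:
        return []

    view = []              # Rightmost node of each level
    queue = deque([root])  # Start BFS with the root

    while queue:
        level_length = len(queue)  # Number of nodes at the current level
        for i in range(level_length):
            node = queue.popleft()
            if i == level_length - 1:  # Last node in the current level
                view.append(node.val)

            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)

    return view

# Example: tree 1 -> (2 with right child 5), (3 with right child 4)
root = TreeNode(1, TreeNode(2, None, TreeNode(5)), TreeNode(3, None, TreeNode(4)))
print(rightSideView(root))  # Output: [1, 3, 4]
```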

6. 133. Clone Graph

from collections import deque

# 'Node' is the graph node class provided by LeetCode
# (fields: val, neighbors)
def cloneGraph(node: 'Node') -> 'Node':
    if not node:
        return None

    clones = {node: Node(node.val)}  # Dictionary to keep track of cloned nodes
    queue = deque([node])  # BFS queue starting with the original node

    # BFS loop to clone the graph
    while queue:
        current = queue.popleft()
        for neighbor in current.neighbors:
            if neighbor not in clones:  # If the neighbor hasn't been cloned yet
                clones[neighbor] = Node(neighbor.val)  # Clone the neighbor
                queue.append(neighbor)  # Add the original neighbor to the queue
            clones[current].neighbors.append(clones[neighbor])  # Link the clone of the current node to the clone of the neighbor

    return clones[node]  # Return the clone of the original node

7. 102. Binary Tree Level Order Traversal

from collections import deque
from typing import List

def levelOrder(root: TreeNode) -> List[List[int]]:
    if not root:
        return []

    result = []  # List to store the level order traversal
    queue = deque([root])  # Start BFS with the root

    # BFS loop to traverse the tree level by level
    while queue:
        level = []
        level_length = len(queue)  # Number of nodes at the current level
        for _ in range(level_length):
            node = queue.popleft()
            level.append(node.val)  # Add the node's value to the current level

            if node.left:
                queue.append(node.left)  # Add left child to the queue
            if node.right:
                queue.append(node.right)  # Add right child to the queue

        result.append(level)  # Record the completed level

    return result  # Return the level order traversal
