Two Pointers Technique

The two pointers technique is a common and efficient method used to solve various problems involving arrays or linked lists. It involves using two pointers that traverse the data structure in a specific way to achieve the desired result.

Key Concepts

  1. Initialization: Typically, the two pointers are initialized at different positions, such as the start and end of the array, or both at the start but with different purposes.
  2. Movement: The pointers move towards each other or in a specific pattern based on the problem's conditions.
  3. Termination: The process continues until the pointers meet or cross each other, or until a specific condition is met.

Explanation:

  1. Initialize pointer left at the beginning of the array.
  2. Traverse the array with pointer right starting from the second element.
  3. If the element at right is different from the element at left, increment left and copy the element at right to left.
  4. Continue until right traverses the entire array.
  5. The array up to index left will contain unique elements.

Advantages of Two Pointers Technique

  • Efficiency: Often reduces the time complexity of problems from (O(n^2)) to (O(n)) by avoiding nested loops.
  • Simplicity: Provides a straightforward approach to solving problems involving arrays or lists.

Applications

  1. Finding Pair with Given Sum:
    • In a sorted array, use two pointers starting at the beginning and end. Adjust pointers based on the sum of elements they point to.
  2. Removing Duplicates from Sorted Array:
    • Use two pointers to track the position of unique elements and overwrite duplicates.
  3. Reversing a String or Array:
    • Use two pointers at the beginning and end, swapping elements and moving pointers towards each other.
  4. Container with Most Water:
    • Use two pointers to find the maximum area by moving towards each other and updating the maximum area found.
  5. Merging Two Sorted Arrays:
    • Use two pointers to traverse each array and merge them into a single sorted array.
  6. Valid Palindrome II
  7. Valid Word Abbreviation
  8. Merge Sorted Array
  9. Lowest Common Ancestor of a Binary Tree III
  10. Valid Palindrome
  11. 3Sum
  12. Next Permutation
  13. Move Zeroes
  14. Dot Product of Two Sparse Vectors

Examples

Example 1: Finding Pair with Given Sum in a Sorted Array

Problem: Given a sorted array and a target sum, find if there exists a pair of elements that add up to the target sum.

Code:

def find_pair_with_sum(arr, target):
    left, right = 0, len(arr) - 1
    while left < right:
        current_sum = arr[left] + arr[right]
        if current_sum == target:
            return (arr[left], arr[right])
        elif current_sum < target:
            left += 1
        else:
            right -= 1
    return None

# Example usage:
arr = [1, 2, 3, 4, 6]
target = 6
print(find_pair_with_sum(arr, target))  # Output: (2, 4)

Explanation:

  1. Initialize pointers left at the beginning and right at the end of the array.
  2. Calculate the sum of elements at these pointers.
  3. If the sum matches the target, return the pair.
  4. If the sum is less than the target, move the left pointer to the right to increase the sum.
  5. If the sum is greater than the target, move the right pointer to the left to decrease the sum.
  6. Continue until the pointers meet or cross each other.

Example 2: Removing Duplicates from Sorted Array

Problem: Given a sorted array, remove the duplicates in-place such that each element appears only once and return the new length.

Code:

def remove_duplicates(arr):
    if not arr:
        return 0
    left = 0
    for right in range(1, len(arr)):
        if arr[right] != arr[left]:
            left += 1
            arr[left] = arr[right]
    return left + 1

# Example usage:
arr = [1, 1, 2, 2, 3, 4, 4]
new_length = remove_duplicates(arr)
print(arr[:new_length])  # Output: [1, 2, 3, 4]


680. Valid Palindrome II

Problem: Given a non-empty string s, you may delete at most one character. Judge whether you can make it a palindrome.

Approach:

  • Use two pointers, left starting from the beginning and right from the end.
  • Compare characters at left and right.
  • If they don't match, skip one character and check if the remaining substring is a palindrome.

Code:

Code:

def valid_palindrome(s):
    def is_palindrome_range(i, j):
        while i < j:
            if s[i] != s[j]:
                return False
            i += 1
            j -= 1
        return True

    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            # If there's a mismatch, check by skipping one character
            return is_palindrome_range(left + 1, right) or is_palindrome_range(left, right - 1)
        left, right = left + 1, right - 1
    return True

Explanation:

  1. is_palindrome_range Function:
    • Instead of using all, this function now uses a while loop to check if the substring from i to j is a palindrome.
    • The loop compares characters at both ends (s[i] and s[j]), incrementing i and decrementing j to move towards the center of the substring.
    • If any characters don't match, it returns False.
    • If all characters match, it returns True.
  2. Main Function:
    • The rest of the logic remains the same. If a mismatch is found between s[left] and s[right], the function checks both options: skipping left or skipping right.

Example Usage:

# Example 1:
s1 = "abca"
print(valid_palindrome(s1))  # Output: True

# Example 2:
s2 = "racecar"
print(valid_palindrome(s2))  # Output: True

# Example 3:
s3 = "abc"
print(valid_palindrome(s3))  # Output: False

Time Complexity:

  • O(n) where n is the length of the string s. The while loop in is_palindrome_range checks each character at most once.

Space Complexity:

  • O(1) since only a few variables are used. The algorithm operates in constant space.


408. Valid Word Abbreviation

Problem: Given a non-empty string word and an abbreviation abbr, check if it is a valid abbreviation of the word.

Approach:

  • Use two pointers, one for word and one for abbr.
  • Traverse both strings, expanding numbers in abbr to check against word.

Python Code:

def valid_word_abbreviation(word, abbr):
    i, j = 0, 0
    while i < len(word) and j < len(abbr):
        # If abbr contains a digit
        if abbr[j].isdigit():
            if abbr[j] == '0':  # No leading zeros allowed
                return False
            num = 0
            # Convert the series of digits into a number
            while j < len(abbr) and abbr[j].isdigit():
                num = num * 10 + int(abbr[j])
                j += 1
            i += num  # Move i forward by the number
        else:
            # Characters must match
            if word[i] != abbr[j]:
                return False
            i += 1
            j += 1

    # Both i and j should reach the end of their respective strings
    return i == len(word) and j == len(abbr)

Explanation of the Approach:

  1. Two Pointers:
    • Use two pointers i and j, where i tracks the position in word and j tracks the position in abbr.
    • The goal is to traverse both word and abbr, expanding numbers in abbr to check if the corresponding characters match in word.
  2. Handling Digits:
    • If the current character in abbr[j] is a digit, we need to expand it to a number representing the number of characters to skip in word.
    • If a digit starts with '0', it's invalid since abbreviations cannot have leading zeros.
  3. Handling Characters:
    • If abbr[j] is a character (not a digit), it must exactly match word[i].
    • If they don't match, return False.
  4. End Condition:
    • After the loop, both i and j must reach the ends of their respective strings for the abbreviation to be valid.

Example Usage:

# Example 1:
word1 = "internationalization"
abbr1 = "i12iz4n"
print(valid_word_abbreviation(word1, abbr1))  # Output: True

# Example 2:
word2 = "apple"
abbr2 = "a2e"
print(valid_word_abbreviation(word2, abbr2))  # Output: False

# Example 3:
word3 = "substitution"
abbr3 = "s10n"
print(valid_word_abbreviation(word3, abbr3))  # Output: True

Explanation of Test Cases:

  1. Test Case 1:
    • Input: word = "internationalization", abbr = "i12iz4n"
    • Output: True
    • Explanation: The abbreviation expands to "i12iz4n" → "internationalization".
  2. Test Case 2:
    • Input: word = "apple", abbr = "a2e"
    • Output: False
    • Explanation: "a2e" would expand to "aple", but this does not match "apple".
  3. Test Case 3:
    • Input: word = "substitution", abbr = "s10n"
    • Output: True
    • Explanation: "s10n" expands to "substitution", which matches.

Time Complexity:

  • O(n + m), where n is the length of word and m is the length of abbr. The algorithm processes each character from both strings exactly once.

Space Complexity:

  • O(1), since we are using only a few variables to track indices and counts, without using additional space proportional to the input size.


88. Merge Sorted Array

Problem: Merge two sorted arrays nums1 and nums2 into nums1 as one sorted array.

Explanation of the Approach:

The problem requires merging two sorted arrays nums1 and nums2 in-place into nums1, where nums1 has enough space to hold both arrays. The goal is to merge them in sorted order.

Key Observations:

  • We are given m as the number of valid elements in nums1 and n as the number of elements in nums2.
  • nums1 has additional space at the end, which can accommodate the elements from nums2.

Two-pointer Approach (starting from the end):

  • Instead of merging the arrays from the front, we can merge them starting from the end. This way, we avoid overwriting the elements in nums1 that are yet to be compared.
  • We use two pointers: one for nums1 (starting at m-1) and one for nums2 (starting at n-1), and a third pointer for the end of nums1 (at m + n - 1).
  • At each step, we compare the elements at the two pointers and place the larger element at the current position in nums1.

Code:

def merge(nums1, m, nums2, n):
    # Start merging from the end
    while m > 0 and n > 0:
        if nums1[m - 1] > nums2[n - 1]:
            nums1[m + n - 1] = nums1[m - 1]
            m -= 1
        else:
            nums1[m + n - 1] = nums2[n - 1]
            n -= 1

    # If there are any remaining elements in nums2, place them in nums1
    nums1[:n] = nums2[:n]

Explanation:

  1. Merging from the End:
    • We start from the last elements of nums1 and nums2.
    • If nums1[m - 1] is larger than nums2[n - 1], we place nums1[m - 1] in the current position (m + n - 1), and move the pointer m to the left.
    • Otherwise, we place nums2[n - 1] in the current position, and move the pointer n to the left.
  2. Handling Remaining Elements in nums2:
    • If nums2 has remaining elements after all comparisons, copy them into the front of nums1. This is needed because if all elements in nums1 are larger, some elements in nums2 will remain unplaced.
  3. No Need to Copy nums1:
    • There's no need to handle any remaining elements from nums1 explicitly, as they are already in their correct positions if not overwritten.

Example Usage:

# Example 1:
nums1 = [1, 2, 3, 0, 0, 0]
m = 3
nums2 = [2, 5, 6]
n = 3
merge(nums1, m, nums2, n)
print(nums1)  # Output: [1, 2, 2, 3, 5, 6]

# Example 2:
nums1 = [4, 5, 6, 0, 0, 0]
m = 3
nums2 = [1, 2, 3]
n = 3
merge(nums1, m, nums2, n)
print(nums1)  # Output: [1, 2, 3, 4, 5, 6]

Time Complexity:

  • O(m + n): We traverse both nums1 and nums2 once, making a total of m + n comparisons and insertions.

Space Complexity:

  • O(1): The algorithm is performed in-place, so no additional space is required besides a few variables for tracking indices.


1650. Lowest Common Ancestor of a Binary Tree III

Problem: Find the lowest common ancestor of two nodes in a binary tree, where each node has a pointer to its parent.

Approach:

  • Use two pointers to traverse upwards from each node until they meet.

Explanation of the Approach:

The problem asks to find the Lowest Common Ancestor (LCA) of two nodes in a binary tree where each node has a pointer to its parent.

Key Idea:

  • Since each node has a pointer to its parent, we can trace back from each node to the root of the tree.
  • The first point where both nodes meet is their lowest common ancestor (LCA).
  • We use two pointers, one for each node (p and q), and move them up to their parent nodes. If a pointer reaches the root (i.e., None), we switch it to the other node. This ensures that both pointers traverse the same number of steps by the time they meet.

Approach:

  1. Two Pointers:
    • We initialize two pointers, a starting at p and b starting at q.
    • Both pointers move upwards to their respective parents at each step.
  2. Switch Pointers:
    • If one pointer reaches None (the root), switch it to the other node.
    • This ensures that both pointers traverse the same number of steps, as one pointer may reach the root earlier than the other.
  3. Meeting Point:
    • Eventually, both pointers will either meet at the lowest common ancestor or at None (in case the nodes don't have a common ancestor, but in a connected tree, they will meet at LCA).
  4. End Condition:
    • The loop continues until both pointers point to the same node (a == b), which will be the LCA.

Code:

class Node:
    def __init__(self, val=0, parent=None):
        self.val = val
        self.parent = parent

def lowest_common_ancestor(p, q):
    a, b = p, q
    while a != b:
        a = a.parent if a else q
        b = b.parent if b else p
    return a

Example Usage:

# Example tree setup:
#       3
#      / \
#     5   1
#    / \
#   6   2
#      / \
#     7   4

root = Node(3)
node5 = Node(5, root)
node1 = Node(1, root)
node6 = Node(6, node5)
node2 = Node(2, node5)
node7 = Node(7, node2)
node4 = Node(4, node2)

# Find LCA of node 6 and node 4
lca = lowest_common_ancestor(node6, node4)
print(lca.val)  # Output: 5

# Find LCA of node 7 and node 4
lca = lowest_common_ancestor(node7, node4)
print(lca.val)  # Output: 2

Explanation of Test Cases:

  1. Test Case 1:
    • Input: node6 and node4
    • Output: 5
    • Explanation: The LCA of node 6 and node 4 is node 5.
  2. Test Case 2:
    • Input: node7 and node4
    • Output: 2
    • Explanation: The LCA of node 7 and node 4 is node 2.

Time Complexity:

  • O(h), where h is the height of the binary tree.
    • In the worst case, we traverse the height of the tree twice (once for each pointer), but switching ensures both pointers meet in a finite number of steps.

Space Complexity:

  • O(1), as the solution uses a constant amount of space, regardless of the input size, except for the space needed for the input nodes.


125. Valid Palindrome

Problem: Given a string, determine if it is a palindrome, considering only alphanumeric characters and ignoring cases.

Explanation of the Approach:

The problem requires determining if a given string is a valid palindrome, considering only alphanumeric characters and ignoring case differences. A palindrome is a string that reads the same forward and backward.

Key Steps:

  1. Two-Pointer Approach:
    • Use two pointers: left starting from the beginning of the string and right starting from the end. These pointers will move towards the center of the string.
  2. Skip Non-Alphanumeric Characters:
    • Since only alphanumeric characters are considered, if a character at left or right is not alphanumeric (checked using isalnum()), move the pointer until it points to a valid alphanumeric character.
  3. Case-Insensitive Comparison:
    • Convert characters to lowercase using lower() before comparing them to ignore case differences.
  4. Continue Until Pointers Meet:
    • If all characters match, the string is a palindrome. If at any point s[left].lower() != s[right].lower(), the string is not a palindrome, and we return False.
  5. End Condition:
    • If the loop completes and no mismatches are found, return True to indicate the string is a palindrome.

Python Code:

def is_palindrome(s):
    left, right = 0, len(s) - 1
    while left < right:
        # Skip non-alphanumeric characters from the left
        while left < right and not s[left].isalnum():
            left += 1
        # Skip non-alphanumeric characters from the right
        while left < right and not s[right].isalnum():
            right -= 1
        # Compare the alphanumeric characters case-insensitively
        if s[left].lower() != s[right].lower():
            return False
        # Move pointers inward
        left, right = left + 1, right - 1
    return True

Example Usage:

# Example 1:
s1 = "A man, a plan, a canal: Panama"
print(is_palindrome(s1))  # Output: True

# Example 2:
s2 = "race a car"
print(is_palindrome(s2))  # Output: False

# Example 3:
s3 = " "
print(is_palindrome(s3))  # Output: True

Explanation of Test Cases:

  1. Test Case 1:
    • Input: "A man, a plan, a canal: Panama"
    • Output: True
    • Explanation: After ignoring non-alphanumeric characters and case differences, the string becomes "amanaplanacanalpanama", which is a palindrome.
  2. Test Case 2:
    • Input: "race a car"
    • Output: False
    • Explanation: After ignoring non-alphanumeric characters and case differences, the string becomes "raceacar", which is not a palindrome.
  3. Test Case 3:
    • Input: " "
    • Output: True
    • Explanation: An empty string or a string with only spaces is trivially a palindrome.

Time Complexity:

  • O(n), where n is the length of the string. We traverse the string with two pointers, each pointer moving once through the string.

Space Complexity:

  • O(1), since we use only a few extra variables for the two pointers and comparisons, without any additional space proportional to the input size.


15. 3Sum

Problem: Given an array nums of n integers, find all unique triplets in the array which gives the sum of zero.

Explanation of the Approach:

The problem requires finding all unique triplets in an array that sum to zero. The triplet (a, b, c) is considered unique if no other triplet contains the same set of numbers. The approach used here is efficient, combining sorting with a two-pointer technique to find such triplets.

Key Steps:

  1. Sort the Array:
    • Sorting the array helps in avoiding duplicates and simplifies the process of finding the required triplets.
  2. Iterate with a Fixed Element:
    • For each element in the array nums[i], we fix this element and then use a two-pointer approach to find the remaining two elements (nums[left] and nums[right]) such that their sum with nums[i] equals zero.
  3. Two-Pointer Technique:
    • After fixing nums[i], the problem reduces to finding two numbers in the sorted array (from i+1 to end) whose sum equals -nums[i]. This can be done using two pointers:
      • left pointer starts right after i (i + 1).
      • right pointer starts at the end of the array.
      • Move left and right pointers inward based on the sum of the three numbers.
  4. Avoid Duplicates:
    • To ensure the uniqueness of triplets, skip duplicates of nums[i], nums[left], and nums[right] by moving the pointers past consecutive duplicate values.
  5. Termination:
    • The loop continues until i reaches the third-to-last element, and the two-pointer search is carried out for each element.

Python Code:

def three_sum(nums):
    nums.sort()  # Step 1: Sort the array
    result = []

    for i in range(len(nums) - 2):  # Step 2: Iterate with fixed element
        if i > 0 and nums[i] == nums[i - 1]:  # Skip duplicates for i
            continue

        left, right = i + 1, len(nums) - 1  # Two-pointer approach
        while left < right:
            total = nums[i] + nums[left] + nums[right]
            if total < 0:
                left += 1  # Move left pointer right if the sum is less than zero
            elif total > 0:
                right -= 1  # Move right pointer left if the sum is greater than zero
            else:
                result.append([nums[i], nums[left], nums[right]])  # Found a triplet

                # Skip duplicates for left
                while left < right and nums[left] == nums[left + 1]:
                    left += 1
                # Skip duplicates for right
                while left < right and nums[right] == nums[right - 1]:
                    right -= 1

                left += 1
                right -= 1  # Move both pointers inward to find new triplets

    return result

Example Usage:

# Example 1:
nums1 = [-1, 0, 1, 2, -1, -4]
print(three_sum(nums1))  
# Output: [[-1, -1, 2], [-1, 0, 1]]

# Example 2:
nums2 = [0, 1, 1]
print(three_sum(nums2))  
# Output: []

# Example 3:
nums3 = [0, 0, 0]
print(three_sum(nums3))  
# Output: [[0, 0, 0]]

Explanation of Test Cases:

  1. Test Case 1:
    • Input: [-1, 0, 1, 2, -1, -4]
    • Output: [[-1, -1, 2], [-1, 0, 1]]
    • Explanation: The two triplets that sum to zero are [-1, -1, 2] and [-1, 0, 1].
  2. Test Case 2:
    • Input: [0, 1, 1]
    • Output: []
    • Explanation: There is no triplet that sums to zero in this array.
  3. Test Case 3:
    • Input: [0, 0, 0]
    • Output: [[0, 0, 0]]
    • Explanation: The triplet [0, 0, 0] sums to zero and is the only valid triplet.

Time Complexity:

  • O(n^2): Sorting the array takes O(n log n), and then for each element, the two-pointer search takes O(n). Thus, the total time complexity is O(n^2).

Space Complexity:

  • O(1) (excluding the output space): We only use a few extra variables for the pointers and result storage, making the space complexity constant apart from the input and output.


31. Next Permutation

Problem: Implement the next permutation, which rearranges numbers into the lexicographically next greater permutation of numbers.

Explanation of the Approach:

The goal of the Next Permutation problem is to rearrange the given numbers into the next lexicographically larger permutation. If no such permutation exists (i.e., the array is in descending order), the array should be rearranged as the smallest possible permutation (i.e., sorted in ascending order).

Key Observations:

  1. Lexicographical Order:
    • The next permutation is the smallest permutation that is lexicographically larger than the current one. If no such permutation exists, we return the smallest permutation by sorting.
  2. Reverse Sorted Suffix:
    • The key insight is that the next permutation must change the smallest possible portion of the array to get the next largest number. This portion will be the longest suffix that is in decreasing order.
  3. Steps to Solve:
    • Find the first decreasing element (from the end): Starting from the end of the array, find the first element nums[i] that is smaller than the element after it (nums[i + 1]). This element marks the point where the next larger permutation can be generated.
    • Find the next largest element: From the right end of the array, find the smallest element nums[j] that is larger than nums[i] and swap them. This ensures that we get the smallest possible number that is larger than nums[i].
    • Reverse the suffix: After swapping, reverse the portion of the array after index i. This ensures that the suffix is sorted in ascending order, which gives us the smallest possible lexicographical order.

Python Code:

def next_permutation(nums):
    # Step 1: Find the first decreasing element from the end
    i = len(nums) - 2
    while i >= 0 and nums[i] >= nums[i + 1]:
        i -= 1

    # Step 2: If we found a decreasing element, swap it with the next largest element
    if i >= 0:
        j = len(nums) - 1
        while nums[j] <= nums[i]:
            j -= 1
        nums[i], nums[j] = nums[j], nums[i]

    # Step 3: Reverse the part of the array after the swapped element
    nums[i + 1:] = reversed(nums[i + 1:])

Step-by-Step Example:

Example 1:

  • Input: nums = [1, 2, 3]
  • Step 1: Find the first decreasing element from the end. Here, nums[1] = 2 is smaller than nums[2] = 3, so i = 1.
  • Step 2: Find the smallest element larger than nums[1]. Here, nums[2] = 3 is the smallest element larger than nums[1], so we swap nums[1] and nums[2].
    • Array after swap: [1, 3, 2]
  • Step 3: Reverse the part of the array after index 1. In this case, only one element remains, so no actual change is made.
    • Final result: [1, 3, 2]

Example 2:

  • Input: nums = [3, 2, 1]
  • Step 1: Find the first decreasing element. There is none, because the array is in descending order.
  • Step 2: Since no such element exists, skip the swap step.
  • Step 3: Reverse the entire array to get the smallest permutation.
    • Final result: [1, 2, 3]

Example 3:

  • Input: nums = [1, 1, 5]
  • Step 1: Find the first decreasing element from the end. Here, nums[1] = 1 is smaller than nums[2] = 5, so i = 1.
  • Step 2: Find the smallest element larger than nums[1]. Here, nums[2] = 5 is the smallest element larger than nums[1], so we swap nums[1] and nums[2].
    • Array after swap: [1, 5, 1]
  • Step 3: Reverse the part of the array after index 1. In this case, only one element remains, so no actual change is made.
    • Final result: [1, 5, 1]

Time Complexity:

  • O(n), where n is the length of the array. The process involves:
    • A single pass to find the first decreasing element (O(n)).
    • A single pass to find the next largest element (O(n)).
    • A reverse operation on the suffix, which is also at most O(n).

Space Complexity:

  • O(1), as the rearrangement is done in place with constant extra space.

This solution efficiently finds the next lexicographically greater permutation by modifying the input array in-place.



283. Move Zeroes

Problem: Given an array nums, move all 0's to the end while maintaining the relative order of the non-zero elements.

Explanation of the Approach:

The problem asks to move all zeros in the array to the end, while maintaining the relative order of the non-zero elements. The idea is to rearrange the array in-place with minimal operations.

Key Observations:

  1. Two Pointers:
    • Use two pointers (left and right) to keep track of the position where the next non-zero element should be placed.
    • left tracks the position where a non-zero element should go, and right iterates over the array.
  2. Swap Non-zero Elements:
    • As right traverses the array, whenever a non-zero element is encountered, swap it with the element at the left index.
    • After the swap, increment the left pointer to point to the next position where a non-zero element should be placed.
  3. Zeros are Naturally Moved to the End:
    • By the end of the process, all non-zero elements will be shifted to the front, and all zeros will be moved to the end.

Python Code:

def move_zeroes(nums):
    left = 0  # Pointer to place the next non-zero element
    for right in range(len(nums)):  # Traverse the array with right pointer
        if nums[right] != 0:  # If current element is non-zero
            # Swap the non-zero element with the left pointer's element
            nums[left], nums[right] = nums[right], nums[left]
            left += 1  # Move the left pointer forward

Explanation of the Code:

  1. Initialize left:
    • The left pointer keeps track of where the next non-zero element should be placed. Initially, it is set to 0.
  2. Iterate with right:
    • The right pointer traverses the entire array. For each non-zero element found at nums[right], we swap it with the element at nums[left].
  3. Swap Non-zero Elements:
    • If nums[right] is non-zero, swap it with nums[left] to bring the non-zero element to the front. After the swap, move left to the next position.
  4. End of Iteration:
    • By the end of the loop, all non-zero elements will be at the front of the array, and the zeros will be pushed to the end, as they were swapped with elements in the front.

Example Usage:

# Example 1:
nums1 = [0, 1, 0, 3, 12]
move_zeroes(nums1)
print(nums1)  # Output: [1, 3, 12, 0, 0]

# Example 2:
nums2 = [0, 0, 1]
move_zeroes(nums2)
print(nums2)  # Output: [1, 0, 0]

Explanation of Test Cases:

  1. Test Case 1:
    • Input: [0, 1, 0, 3, 12]
    • Output: [1, 3, 12, 0, 0]
    • Explanation: The zeros are moved to the end while the relative order of 1, 3, and 12 is maintained.
  2. Test Case 2:
    • Input: [0, 0, 1]
    • Output: [1, 0, 0]
    • Explanation: The single non-zero element 1 is moved to the front, and the zeros are moved to the end.

Time Complexity:

  • O(n), where n is the length of the array. We traverse the array once with the right pointer and perform constant-time swaps.

Space Complexity:

  • O(1), since the rearrangement is done in place and no extra space is used apart from a few variables for tracking indices.

This approach efficiently solves the problem by moving zeros to the end of the array while maintaining the order of the non-zero elements in a single pass.



1570. Dot Product of Two Sparse Vectors

Problem: Given two sparse vectors, compute their dot product. Implement the SparseVector class:

  • SparseVector(nums) initializes the object with the vector nums.
  • dotProduct(vec) computes the dot product with another SparseVector.

Explanation of the Approach:

The problem asks to compute the dot product of two sparse vectors. In a sparse vector, most elements are zero, so iterating through every element in the vector (as we would in a dense vector) would be inefficient. Instead, we can optimize this by storing only the non-zero elements and their indices.

Key Observations:

  1. Sparse Vectors:
    • Sparse vectors contain a large number of zeros, so it’s wasteful to store all elements.
    • We can store only the non-zero elements along with their indices in the form of pairs (index, value).
  2. Dot Product:
    • The dot product of two vectors is the sum of the products of their corresponding elements.
    • For sparse vectors, we only need to consider the indices where both vectors have non-zero values.
  3. Two-Pointer Approach:
    • Since we are storing only the non-zero elements, we can use a two-pointer technique to traverse through both vectors efficiently.
    • Both pointers start from the beginning of the non-zero pairs, and we move the pointers based on the comparison of indices:
      • If the indices are equal, compute the product of the values and move both pointers forward.
      • If one index is smaller, move the corresponding pointer forward.

Python Code:

class SparseVector:
    def __init__(self, nums):
        # Store pairs of (index, value) where value is non-zero
        self.pairs = [(i, num) for i, num in enumerate(nums) if num != 0]

    def dotProduct(self, vec):
        result = 0
        p1, p2 = 0, 0

        # Two-pointer approach
        while p1 < len(self.pairs) and p2 < len(vec.pairs):
            index1, value1 = self.pairs[p1]
            index2, value2 = vec.pairs[p2]

            if index1 == index2:
                result += value1 * value2  # Multiply when indices are the same
                p1 += 1
                p2 += 1
            elif index1 < index2:
                p1 += 1  # Move pointer in the first vector
            else:
                p2 += 1  # Move pointer in the second vector

        return result

Explanation of Code:

  1. Initialization:
    • In the constructor __init__, we store the non-zero elements and their indices in a list called pairs. This is done using a list comprehension that iterates through the input nums and includes only non-zero values with their corresponding indices.
  2. dotProduct Method:
    • In the dotProduct method, we use two pointers p1 and p2 to iterate through the pairs list of both vectors.
    • If the indices at the two pointers match, we calculate the product of the values and add it to the result.
    • If the indices don't match, we move the pointer with the smaller index forward.
    • The loop continues until we exhaust one of the lists of non-zero pairs.

Example Usage:

# Example 1:
nums1 = [1, 0, 0, 2, 3]
nums2 = [0, 3, 0, 4, 0]

v1 = SparseVector(nums1)
v2 = SparseVector(nums2)
print(v1.dotProduct(v2))  # Output: 8 (2*4)

# Example 2:
nums3 = [0, 1, 0, 0, 2, 0, 0]
nums4 = [1, 0, 0, 0, 3, 0, 4]

v3 = SparseVector(nums3)
v4 = SparseVector(nums4)
print(v3.dotProduct(v4))  # Output: 6 (1*0 + 2*3 = 6)

Explanation of Test Cases:

  1. Test Case 1:
    • nums1 = [1, 0, 0, 2, 3], nums2 = [0, 3, 0, 4, 0]
    • The only common index with non-zero values is at index 3, where 2 * 4 = 8. Thus, the dot product is 8.
  2. Test Case 2:
    • nums3 = [0, 1, 0, 0, 2, 0, 0], nums4 = [1, 0, 0, 0, 3, 0, 4]
    • The common non-zero indices are at index 4, where 2 * 3 = 6. Thus, the dot product is 6.

Time Complexity:

  • O(n1 + n2), where n1 and n2 are the number of non-zero elements in nums1 and nums2, respectively. This is because we traverse through only the non-zero elements using two pointers, so the complexity is proportional to the number of non-zero elements, not the length of the input arrays.

Space Complexity:

  • O(n1 + n2) for storing the non-zero elements of both vectors. The storage is proportional to the number of non-zero elements in both vectors, rather than the size of the full arrays.

Explanation:

  1. Initialization:
    • The SparseVector constructor processes the input list nums and stores non-zero elements along with their indices in the pairs list.
  2. Dot Product:
    • Use two pointers, p1 and p2, to iterate through the non-zero elements of both vectors.
    • When indices match, multiply the corresponding values and add to the result.
    • If indices do not match, advance the pointer with the smaller index.
    • Continue until one of the pointers exceeds the length of its list.


Summary of Two Pointers Technique

The Two Pointers technique is highly versatile and effective for solving a wide range of problems, including:

  • String and array manipulation (valid palindromes, word abbreviations, merging arrays).
  • Graph and tree traversal (lowest common ancestor).
  • Combinatorial problems (3Sum).
  • Sequence generation and modification (next permutation).
  • Array partitioning (move zeroes).
  • Efficient computations with sparse data structures (dot product of sparse vectors).

By maintaining two pointers that traverse the data structure in a strategic manner, you can often achieve significant performance improvements over more naive approaches.

'ML Engineering > python' 카테고리의 다른 글

03. Binary Tree DFS Techniques  (0) 2024.08.06
02. Sliding Window Technique  (0) 2024.08.06
Differences between subarrays, substrings, subsequences, and subsets  (0) 2024.08.06
Python 2 vs 3 Difference  (0) 2024.04.12
Python - Data types  (0) 2024.04.05

 

  • Subarray: Contiguous segment of an array.
    • [1, 2, 3] in [0, 1, 2, 3, 4]
  • Substring: Contiguous segment of a string.
    • "ell" in "hello"
  • Subsequence: Sequence derived by deleting some or no elements without changing the order.
    • [1, 3] in [1, 2, 3, 4]
    • "hlo" in "hello"
  • Subset: Any combination of elements from a set, order does not matter.
    • {1, 3} in {1, 2, 3}
    • {}, {1, 2}, {3, 2} in {1, 2, 3}

1. Subarray

Definition: A subarray is a contiguous segment of an array. It includes elements that are consecutive in the original array.

Characteristics:

  • Must be contiguous.
  • Order of elements must be preserved.
  • Length can range from 0 to the length of the array.

Example:

  • Original Array: [1, 2, 3, 4]
  • Possible Subarrays: [1, 2], [2, 3, 4], [3], [1, 2, 3, 4], [] (empty subarray)

2. Substring

Definition: A substring is a contiguous sequence of characters within a string. Like subarrays, substrings consist of consecutive characters.

Characteristics:

  • Must be contiguous.
  • Order of characters must be preserved.
  • Length can range from 0 to the length of the string.

Example:

  • Original String: "hello"
  • Possible Substrings: "he", "ell", "hello", "o", "" (empty substring)

3. Subsequence

Definition: A subsequence is a sequence derived from another sequence (array or string) by deleting some or no elements without changing the order of the remaining elements.

Characteristics:

  • Does not need to be contiguous.
  • Order of elements/characters must be preserved.
  • Length can range from 0 to the length of the sequence.

Example:

  • Original Array: [1, 2, 3, 4]
  • Possible Subsequences: [1, 3], [2, 4], [1, 2, 3, 4], [] (empty subsequence)
  • Original String: "hello"
  • Possible Subsequences: "hlo", "el", "hello", "" (empty subsequence)

4. Subset

Definition: A subset is any combination of elements from a set, where the order does not matter. Subsets include all possible combinations, regardless of order or contiguity.

Characteristics:

  • Does not need to be contiguous.
  • Order of elements/characters does not matter.
  • Length can range from 0 to the length of the set.
  • Includes all possible combinations of the elements.

Example:

  • Original Set: {1, 2, 3}
  • Possible Subsets: {}, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}

 

'ML Engineering > python' 카테고리의 다른 글

03. Binary Tree DFS Techniques  (0) 2024.08.06
02. Sliding Window Technique  (0) 2024.08.06
01. Two Pointers Technique  (0) 2024.08.06
Python 2 vs 3 Difference  (0) 2024.04.12
Python - Data types  (0) 2024.04.05

 

Here are some key differences between Python 2 and Python 3:

 

 

Division: In Python 2, integer division always results in an integer. In Python 3, integer division results in a float if the result is not an integer.

print(7 / 5 )
print(-7 / 5)	 
''' 
Output in Python 2.x 
1 
-2 

Output in Python 3.x : 
1.4 
-1.4 
'''

 

Annotations: Python 3 supports type annotations, which can help with code clarity and maintainability. Python 2 does not support type annotations.

 

Exceptions: In Python 2, exceptions are enclosed in notations. In Python 3, exceptions are enclosed in parentheses.

 

 

Storage of Strings: In Python 2, strings are stored as ASCII by default, while in Python 3, strings are stored as Unicode by default. This means that in Python 3, you can use non-ASCII characters in your strings without having to explicitly encode them. 

print(type('default string ')) 
print(type(u'string with b ')) 
''' 
Output in Python 2.x (Unicode and str are different) 
<type 'str'> 
<type 'unicode'> 

Output in Python 3.x (Unicode and str are same) 
<class 'str'> 
<class 'str'> 
'''

 

 

Print statement: In Python 2, the print statement is used to print output to the console. In Python 3, the print function is used instead. The print function is more flexible than the print statement, and it allows you to print multiple values on the same line, for example. 

 

Range function: In Python 2, the xrange() function is used to generate a sequence of numbers. In Python 3, the range() function is used instead. The range() function is more efficient than the xrange() function, and it can be used to generate infinite sequences. 

 

 

 

'ML Engineering > python' 카테고리의 다른 글

03. Binary Tree DFS Techniques  (0) 2024.08.06
02. Sliding Window Technique  (0) 2024.08.06
01. Two Pointers Technique  (0) 2024.08.06
Differences between subarrays, substrings, subsequences, and subsets  (0) 2024.08.06
Python - Data types  (0) 2024.04.05

String

  • Sequence of characters, enclosed –“ ”/’ ’/””” “””, ordered, immutable, duplicates allowed
  • Accessing Items: Indexing ,slicing and for loop
  • Functions: capitalize(), title(), upper(), lower(), lstrip() ,rstrip() ,strip(), swapcase(), replace(), find(), split(), join() ,endswith(), startswith(), del, index(), count(), isalnum(), isalpha(), isdecimal(), isdigit(), isnumeric(), islower() ,isupper(), istitle(), isspace(), zfill(),center().

 

List

  • collection of heterogeneous datatype, enclosed –[ ], ordered, mutable duplicates allowed
  • Accessing Items: Indexing ,slicing and for loop
  • Functions: append(), extend(), insert(), remove(), pop(), clear(), del, sort(), reverse(), index(), count()

 

Tuple

  • collection of heterogeneous datatype, enclosed –( ), ordered, immutable, duplicates allow
  • Accessing Items: Indexing ,slicing and for loop
  • Functions: Index(), count()

 

Set

  • collection of immutable items, enclosed in –{ }.,unordered, mutable, duplicates not allows
  • Accessing Items:for loop
  • Functions:remove(), discard(), pop(), clear(), del, add(), update(), union(), intersection(), intersection_update, difference() ,difference_update() ,symmetric_difference, symmetric_difference_update(), issubset(), issuperset(), isdisjoint(),

Frozen set

  • collection of immutable items, enclosed in –{()}.,unordered, immutable, duplicates not allow.
  • Accessing Items:for loop
  • Functions: Union(), intersection(), difference(), symmetric_difference.

 

Dictionary

  • collection of key and value pairs, enclosed in –{ }., ordered, mutable, duplicates allows Keys:immutable,Values:mutables
  • Accessing Items: Indexing ,slicing and for loop
  • Functions: Dict_name[keyname], update(), pop(), popitems(), clear(), del, fromkeys(),setdefault().

 

Bool

  • True and False

 

 

 

 

Kubernetes architecture is built to manage the lifecycle and operations of containerized applications across diverse infrastructure. Kubernetes not only simplifies cloud application deployment but also offers a robust framework for operational excellence in the cloud era. 

 

Kubernetes, at its core, orchestrates containerized applications across a cluster of machines. This orchestration allows for high availability, fault tolerance, and scalability in deploying applications. Let’s dissect the key components that make Kubernetes an essential tool for managing distributed systems.

 

This document outlines the various components you need to have for a complete and working Kubernetes cluster.

The Foundation: Clusters and Nodes

A Kubernetes Cluster is a set of nodes that run containerized applications. These nodes are the workhorses of a Kubernetes cluster, where:

  • Worker Nodes host the Pods—the smallest deployable units that can be created and managed in Kubernetes.
  • Control Plane Nodes manage the state of the cluster, including scheduling applications, maintaining applications' desired state, scaling applications, and rolling out new updates.

Understanding these components is crucial for anyone looking to specialize in cloud computing or distributed systems.

Control Plane Components: The Brain Behind the Operation

The control plane is responsible for making global decisions about the cluster and reacting to cluster events. Its components include:

  • kube-apiserver: This acts as the front end to the cluster’s control plane. It exposes the Kubernetes API, which is the primary management interface of Kubernetes. The kube-apiserver is designed to scale horizontally, ensuring Kubernetes can manage and scale applications efficiently.
  • etcd: A highly available key-value store used for all cluster data, ensuring there's a reliable source of truth for the state of the cluster. Proper management of etcd is crucial for cluster health and data integrity.
  • kube-scheduler: Responsible for assigning newly created Pods to Nodes, taking into account the operational requirements of the workloads and the current infrastructure's state.
  • kube-controller-manager: Runs controller processes that monitor the state of the cluster through the apiserver and make changes aiming to move the current state towards the desired state.
  • cloud-controller-manager: Integrates the cluster into the cloud provider's API, allowing Kubernetes to interact with the underlying infrastructure when necessary.

These components ensure the cluster is behaving as intended, handling scheduling, and interacting with underlying infrastructure.

Node Components: The Muscle Doing the Heavy Lifting

Each node in a cluster runs several mandatory components:

  • kubelet: Ensures that containers are running in a Pod. It starts, stops, and maintains application containers organized into Pods as directed by the control plane.
  • kube-proxy: Maintains network rules on nodes, allowing communication to your Pods from network sessions inside or outside of your cluster.
  • Container Runtime: The software responsible for running containers. Kubernetes supports several container runtimes, like containerd and CRI-O.

These components are essential for the actual running of applications on the physical or virtual machines within the cluster.

 

'ML Engineering > Kubernetes' 카테고리의 다른 글

Kubernetes - Overview I  (0) 2024.04.05

 

Kubernetes?

Kubernetes는 컨테이너화된 애플리케이션의 배포, 확장, 관리를 자동화하기 위해 설계된 오픈 소스 플랫폼입니다. declarative configuration and automation를 통해 대규모 애플리케이션 관리를 용이하게 합니다. Kubernetes는 그리스어에서 '키잡이' 또는 '조타수'를 의미하며, Google이 2014년에 오픈 소스로 공개했습니다. 이는 Google의 대규모 운영 경험과 커뮤니티의 아이디어를 결합한 것입니다.


The Evolution of Deployment

Traditional deployment era

초기에는 애플리케이션을 물리 서버에 직접 배포했으나, 리소스 할당 문제가 발생했습니다. 애플리케이션마다 물리 서버를 사용하는 것은 확장성이 떨어지고 비용이 많이 드는 방법이었습니다.

Virtualized deployment era

가상화는 물리 서버의 CPU 위에 여러 가상 머신(VM)을 실행할 수 있게 하여 리소스 활용도를 개선했습니다. 각 VM은 자체 OS를 포함하여 애플리케이션 간 격리를 제공합니다.

Container deployment era

컨테이너는 VM과 유사하지만, OS를 애플리케이션 간에 공유하여 더 가볍게 만들었습니다. 컨테이너는 이동성, 민첩한 개발 및 배포, 환경 일관성 등의 이점으로 인해 인기를 끌고 있습니다.

 


Kubernetes 의 필요성

프로덕션 환경에서 컨테이너를 관리하려면 높은 가용성을 보장하고 수요에 따라 확장해야 합니다. Kubernetes는 분산 시스템을 탄력적으로 운영할 수 있는 프레임워크를 제공합니다.

Kubernetes의 주요 기능

  • Service discovery and load balancing: DNS 이름이나 IP 주소를 사용하여 컨테이너를 노출하고 네트워크 트래픽을 분산시킵니다.
  • Storage orchestration: 자동으로 스토리지 시스템을 마운트합니다.
  • Automated rollouts and rollbacks: 원하는 상태를 선언적으로 관리하며 변경사항을 자동으로 적용합니다.
  • Self-healing: 실패한 컨테이너를 재시작하고, 문제가 있는 컨테이너를 교체합니다.
  • Secret and configuration management: 비밀 정보를 안전하게 저장하고 관리합니다.
  • Horizontal scaling: CPU 사용량에 기반해 애플리케이션을 자동으로 확장하거나 축소합니다.

'ML Engineering > Kubernetes' 카테고리의 다른 글

Kubernetes - Overview II - Components  (0) 2024.04.05

There are generally three ways to perform text-only adaptation:

 

  • Injecting synthesizing speech data to the model
    • generate audio for training texts via TTS and inject it to the model
  • LM fusion
    • Fusion and biasing (shallow fusion):
      • during decoding interpolate posterior word probabilities with text priors from external LMs
      • another recent approach is to extract internal LM probabilities and discount with the ratio of external and internal LM probabilities
    • Rescoring and reranking
      • after decoding, use a powerful external LM to update scores and rerank n-best results or recognition lattice
    • These techniques incur a significant overhead at inference time due to the external LM and also require careful tuning of the interpolation weight used for the external LM.
  • Explicit separation of internal LMs
    • force the E2E decoder/predictor to behave more like a language model (e.g. Hybrid autoregressive transducer (HAT), Modular hybrid autoregressive transducer, and Factorized transducer)

 

Reference

[1] External Language Model Integration for Factorized Neural Transducers

[2] in-situ test-only adaptation of speech models with low-overhead speech imputations

 

 

This paper presents a method for jointly pre-training speech and text in an encoder-decoder framework to improve performance in speech translation and recognition tasks. 

 

 

Key Takeaways:

  1. Architecture: The method utilizes an Attention based Encoder-Decoder (AED) framework to integrate data from different modalities (speech and text) for representation learning.
    • Shared Encoder and Decoder: The STPT framework uses a shared encoder and decoder for both the speech and text modalities, which allows the model to integrate knowledge from both domains.
  2. Acoustic and Linguistic Representation Learning: The STPT framework is designed to learn both acoustic features from speech and linguistic features from text during the pre-training stage. This is crucial for speech translation models, which must understand the sounds of speech as well as the meaning of words.
  3. Joint Pre-Training Phase; Multi-Task Learning Framework: The framework integrates different pre-training tasks to build a robust model capable of handling multiple aspects of speech and language. The proposed Speech and Text joint Pre-Training (STPT) framework incorporates four self-supervised and supervised subtasks designed for cross-modality learning.
    • Text-to-Text (T2T): This self-supervised task helps the model learn linguistic patterns in the text. It's similar to how models like BERT learn by predicting masked words in a sentence.
    • Speech SSL learning (SSL): This is another self-supervised task focused on learning from the speech data alone, likely involving predicting some masked or hidden parts of the speech input.
    • Speech-to-Phoneme (S2P): A supervised task where the model is trained to predict phoneme units from speech data. Phonemes are the smallest units of sound in a language, so this task helps the model learn the sounds that make up speech.
    • Speech-to-Subword (S2T): Also a supervised task, where the model learns to predict subword units from the speech input. Subwords are larger than phonemes and can carry more linguistic information, like syllables or parts of words.
  4. Loss Functions: Pretraining is guided by different loss functions corresponding to the various tasks:
    • LT2T: The loss for the Text-to-Text task.
    • LSSL: The loss for the Speech SSL learning task, which involves masked prediction.
    • LS2P: The loss for the Speech-to-Phoneme task, which involves phoneme-unit sequence classification.
    • LS2T: The loss for the Speech-to-Subword task, involving sequential prediction of subword tokens.
    • Final Loss: The overall objective for the pre-training phase is a combination of these losses, guiding the model to learn both modality-specific and cross-modal representations. 
  5. Improved Performance: The STPT method effectively fuses speech and text information into one model, leading to significant improvements in performance. It achieves 1.7 to 2.3 BLEU score improvements on the MUST-C speech translation dataset and comparable word error rates (WERs) to the wav2vec 2.0 model on the LibriSpeech speech recognition task.

 

 

+ Recent posts