BEST-RQ: SSL with Random-projection Quantizer for Speech Recognition

BEST-RQ introduces a self-supervised training technique that combines a Random Projection Quantizer (RPQ) with Masked Language Modeling (MLM).

The overall BEST-RQ procedure first runs the Random Projection Quantizer (RPQ), a randomly initialized linear layer plus a single codebook that quantizes and discretizes the audio:

  • The Mel filterbanks are projected through the linear layer.
  • The index of the codebook entry nearest to the projection is selected as the target.
  • The nearest entry is found by taking the argmin of the distance between the L2-normalized projection and each L2-normalized codebook entry.

Afterward, a mask is applied to a portion of the Mel filterbanks, and the model's objective is to predict the correct targets for the masked sections. This is framed as a classification task, and cross-entropy is used as the training loss.


1. Random Projection Quantizer (RPQ)

The Random Projection Quantizer is the core component of BEST-RQ, designed to discretize continuous speech features so they can serve as targets for BERT-style pretraining. RPQ consists of two components: the projection matrix and the codebook. Both are randomly initialized and remain fixed throughout training.

1) Projection Matrix

The projection matrix projects the original speech features into a lower-dimensional space. The matrix is of size ( d \times k ), where:

  • d: Dimensionality of the original speech features (typically high, such as hundreds or thousands).
  • k: Target dimensionality after projection (usually much lower than ( d )).

This projection maps the features into the codebook's dimensionality; as noted later, its main role is to allow a codebook dimension that differs from the input dimension rather than to compress the data.

2) Codebook

The Codebook is a collection of n code vectors, each of size ( k ). These code vectors represent the discrete code space into which the speech features are projected.

  • n: The size of the codebook, which can be tuned based on the task at hand.

Given an input vector ( x ) (a ( d )-dimensional vector computed from the speech signal), RPQ maps ( x ) to a discrete label ( y ) through the following operation:

( y = \text{argmin}_i \lVert \text{norm}_{l2}(c_i) - \text{norm}_{l2}(A^\top x) \rVert )

Where:

  • The projection matrix ( A ) is the randomly initialized ( d \times k ) matrix described above.
  • The codebook ( C = {c_1, ..., c_n} ) contains ( n ) randomly initialized ( k )-dimensional vectors.
  • The function ( \text{norm}_{l2} ) denotes L2 normalization.

This transformation enables the speech signals to be quantized into discrete labels, providing a structured learning signal for the downstream tasks.

The projection matrix is initialized using Xavier initialization (Glorot & Bengio, 2010).
The codebook is initialized using a standard normal distribution.
Both are kept frozen during the entire pretraining process, ensuring that the quantization remains consistent.

 

Code

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.linalg import vector_norm

class RandomProjectionQuantizer(nn.Module):
    """
    Vector quantization using a projection and a randomly initialized codebook.
    The output is the indices of the closest code in the codebook for each time step of the input.

    Example
    -------
    >>> quantizer = RandomProjectionQuantizer(16, 16, 8192)
    >>> inputs = torch.rand(10, 12, 16)
    >>> output = quantizer(inputs)
    >>> output.shape
    torch.Size([10, 12])
    """

    def __init__(self, input_dim, codebook_dim, codebook_vocab):
        super().__init__()

        self.input_dim = input_dim
        self.codebook_dim = codebook_dim
        self.codebook_vocab = codebook_vocab

        # Projection matrix (input_dim x codebook_dim), Xavier-initialized.
        # Registered as a buffer so it moves with the module but is never trained.
        self.register_buffer(
            "Prj_A_init", nn.init.xavier_uniform_(torch.empty(input_dim, codebook_dim))
        )

        # Randomly initialized codebook (codebook_vocab x codebook_dim),
        # L2-normalized along the code dimension and likewise kept frozen.
        self.register_buffer(
            "codebook", F.normalize(torch.randn(codebook_vocab, codebook_dim), dim=-1)
        )

    def forward(self, x):
        """
        Project the input and return the index of the closest codebook entry
        for each time step.
        """
        # Project (B, T, input_dim) -> (B, T, codebook_dim) and L2-normalize
        # along the feature dimension
        x = F.normalize(torch.matmul(x, self.Prj_A_init), dim=-1)

        # Distances between every codebook entry and every projected frame:
        # codebook (V, 1, K) broadcast against x (B, 1, T, K) -> (B, V, T)
        distances = vector_norm(self.codebook.unsqueeze(1) - x.unsqueeze(1), dim=-1)

        # Index of the closest code for each frame: (B, T)
        return distances.argmin(dim=1)

2. Masked Language Modeling (MLM)

BEST-RQ applies Masked Language Modeling (MLM), much like BERT does for text, but in this case for speech. During training, certain portions of the speech signal are masked and replaced with noise.

  • Masking Strategy: Each frame of speech is masked with a fixed probability, and the masked portions are replaced with noise sampled from a normal distribution (mean = 0, standard deviation = 0.1).

The model, typically based on a Transformer architecture, is then tasked with predicting the labels (codebook indices) of the masked speech based on the surrounding context. This allows the model to focus on learning robust speech representations.
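The masking strategy and loss can be sketched in a few lines. This is a minimal illustration of the description above (not the paper's code); it assumes a 10 ms frame shift, so a 400 ms span corresponds to 40 frames, and uses the masking probability from the pre-training setup below.

import torch

def apply_best_rq_mask(features, mask_prob=0.01, mask_frames=40, noise_std=0.1):
    """Mask contiguous spans of frames and return (masked_features, mask).

    Each frame starts a mask span with probability `mask_prob`; every span covers
    `mask_frames` consecutive frames (400 ms at a 10 ms frame shift). Masked frames
    are replaced with noise drawn from N(0, 0.1).
    """
    B, T, _ = features.shape
    starts = torch.rand(B, T) < mask_prob                 # span start positions
    mask = torch.zeros(B, T, dtype=torch.bool)
    for b, t in starts.nonzero(as_tuple=False).tolist():  # expand starts into spans
        mask[b, t:t + mask_frames] = True
    noise = torch.randn_like(features) * noise_std
    masked = torch.where(mask.unsqueeze(-1), noise, features)
    return masked, mask

# The pre-training loss is cross-entropy over the masked positions only, with
# encoder logits of shape (B, T, codebook_vocab) and quantizer targets of shape (B, T):
# loss = torch.nn.functional.cross_entropy(logits[mask], targets[mask])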


Note: a distinctive point of BEST-RQ is that the RPQ's projection matrix and codebook are frozen and independent of the ASR encoder. This lets the model focus solely on learning meaningful speech representations, without having to adapt to the intricacies of the quantization process.

 

Code

https://github.com/speechbrain/speechbrain/pull/2309/files#diff-a93bef3df2fb2e56565025e82dbc87ee2293c30872b211a91ea049fd6c3bb49d

Pre-train. The pre-training uses a mask length of 400 ms with a masking probability of 0.01. The learning rate follows the Transformer schedule (Vaswani et al., 2017), using the Adam optimizer with a peak learning rate of 0.004 and 25,000 warmup steps. The batch size is 2048. Since the encoder reduces the temporal dimension by a factor of 4, the random-projection quantizer stacks every 4 frames before projection (see the sketch below). The codebook vocabulary size is 8192 and the codebook dimension is 16.
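Because the quantizer operates on these stacked frames, a small helper makes the step concrete. This is a hypothetical sketch (80-dimensional filterbanks assumed), not code from the paper.

import torch

def stack_frames(features, stride=4):
    """Stack every `stride` consecutive frames so quantizer targets align with the
    4x-downsampled encoder output, e.g. (B, T, 80) -> (B, T // 4, 320)."""
    B, T, D = features.shape
    T = (T // stride) * stride                      # drop any leftover frames
    return features[:, :T, :].reshape(B, T // stride, D * stride)

# Example: 80-dim filterbanks stacked 4x give 320-dim quantizer inputs, which the
# random projection then maps into the 16-dimensional codebook space.
feats = torch.randn(2, 100, 80)
print(stack_frames(feats).shape)  # torch.Size([2, 25, 320])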

The pre-training quality is not very sensitive to the codebook vocab size or the codebook dimension; it is more sensitive to the masking probability and the mask length. The role of the projection layer in the random-projection quantizer is to allow different codebook dimensions; one can achieve similar results without the projection by setting the codebook dimension equal to the input dimension. Due to the variance coming from random initialization, verifying the impact of a hyperparameter usually requires multiple runs of the experiment.

 

Codebook utilization. One of the most critical factors for pre-training quality is the percentage of the codebook that is used during training: a higher percentage of the codebook being used in each batch correlates strongly with good pre-training quality. When the distribution of codebook utilization is skewed toward a small subset of codes, the pre-training task becomes easier and provides less effective pre-training. The L2 normalizations on the projected vectors and the codebook are critical for more uniform codebook utilization. On the other hand, a randomly initialized codebook and projection matrix can yield different codebook utilizations under different random seeds, which impacts pre-training quality across runs with the same experiment configuration. This variance affects quality more when training with smaller pre-training and fine-tuning datasets. Reducing this reproducibility issue caused by random initialization is an important next step for improving random-projection quantization.
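One way to keep an eye on this during pre-training is to log, for each batch, the fraction of codebook entries the quantizer actually hits. A small illustrative helper (not from the paper):

import torch

def codebook_utilization(codes, codebook_vocab):
    """Fraction of codebook entries used in one batch of quantizer outputs.

    codes: LongTensor of shape (B, T) holding codebook indices.
    """
    return codes.unique().numel() / codebook_vocab

# Example with the quantizer defined earlier (vocab size 8192):
# codes = quantizer(stacked_features)   # (B, T)
# print(f"codebook utilization: {codebook_utilization(codes, 8192):.2%}")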

 

Initialization. The quantizer uses random initialization and never updates these parameters, so the initialization algorithm can play an important role in the results. The paper reports results with Xavier initialization for the projection matrix and a standard normal distribution for the codebook; comparing other initialization algorithms is left to future work.

 

[1] https://arxiv.org/pdf/2202.01855

[2] https://arxiv.org/pdf/2405.04296

[3] Speechbrain

WebDataset and Lhotse are both tools designed to facilitate working with large-scale datasets, particularly in the context of machine learning with PyTorch.

In summary:

  • WebDataset is for general-purpose, scalable data handling across multiple modalities.
  • Lhotse is for specialized speech and audio processing tasks where detailed data preparation is critical.

WebDataset

Overview:

  • Purpose: Primarily designed for streaming large datasets stored in tar archives in distributed training environments.
  • Data Format: Works with tar files containing various types of data, such as images, audio, and text.
  • Integration: Integrates directly with PyTorch’s DataLoader, making it easy to use in deep learning pipelines.
  • Key Features:
    • Streaming: Enables on-the-fly data loading from tar archives, reducing memory overhead.
    • Sharding: Supports data sharding across multiple GPUs or nodes, optimizing for distributed training.
    • Flexibility: Can handle multiple data types (images, audio, etc.) in a single archive.
    • Compression: Supports compression, which can save storage space and bandwidth during data loading.

Best For:

  • Large-scale, distributed training where data needs to be streamed from disk or cloud storage.
  • Projects requiring efficient handling of large datasets stored in tar archives.
  • Use cases where different types of data (e.g., images, audio, text) are stored together.

Lhotse

Overview:

  • Purpose: A toolkit specifically designed for preparing and managing large-scale speech and audio datasets, particularly for speech processing tasks.
  • Data Format: Works with various audio formats and annotations, supporting efficient data storage and access.
  • Integration: Also integrates with PyTorch, providing ready-to-use Dataset classes for speech recognition, speaker verification, and other audio tasks.
  • Key Features:
    • Data Preparation: Provides tools for preparing and managing datasets, including feature extraction, data augmentation, and metadata handling.
    • Rich Metadata Handling: Lhotse is highly optimized for working with audio datasets that include rich metadata, such as transcriptions, speaker labels, and more.
    • Feature Extraction: Includes utilities for extracting features like MFCCs, spectrograms, and more, commonly used in speech processing tasks.
    • Interoperability: Can work with existing datasets and tools, making it easy to integrate into existing workflows.

Best For:

  • Speech processing tasks, such as speech recognition, speaker verification, or speech synthesis.
  • Projects that require detailed handling of audio data and associated metadata.
  • Use cases where preprocessing (e.g., feature extraction) and dataset preparation are crucial components of the workflow.

Comparison Summary:

  • Focus:
    • WebDataset is more general-purpose, suitable for handling a variety of data types (e.g., images, audio, text) in large-scale, distributed training environments.
    • Lhotse is specialized for speech and audio processing, with extensive support for audio-specific data preparation, feature extraction, and metadata management.
  • Use Cases:
    • Use WebDataset if your project involves diverse types of large-scale data that need to be streamed efficiently during training, particularly in distributed setups.
    • Use Lhotse if your focus is on speech processing tasks, and you need robust tools for managing and preparing large audio datasets with detailed annotations.
  • Integration:
    • Both integrate well with PyTorch, but WebDataset focuses on data loading efficiency and scalability, while Lhotse provides a comprehensive toolkit for the entire data preparation process in speech tasks.

Lhotse is a Python toolkit designed to facilitate the preparation, processing, and management of large-scale speech and audio datasets, particularly for tasks in speech processing. Its comprehensive features for dataset preparation, feature extraction, and metadata management make it an invaluable tool for anyone working with large-scale speech and audio data. Whether you're developing ASR systems, speaker verification models, or other speech-related technologies, Lhotse provides the necessary tools to streamline and enhance your data processing workflows. It is named after Lhotse, the fourth highest mountain in the world, reflecting its goal to handle large and complex audio data efficiently.

 

Key Features:

  • Dataset Preparation:
    • Lhotse provides a comprehensive set of tools for preparing speech datasets, including downloading, organizing, and processing audio data (a short preparation sketch follows this list).
    • It supports various audio formats (e.g., WAV, MP3, FLAC) and can handle different sampling rates and channel configurations.
  • Feature Extraction:
    • The toolkit includes utilities for extracting common audio features used in speech processing, such as Mel-frequency cepstral coefficients (MFCCs), filter banks, and spectrograms.
    • These features are crucial for tasks like ASR and are compatible with machine learning models.
  • Rich Metadata Handling:
    • Lhotse allows for the detailed management of metadata associated with audio files, such as transcriptions, speaker labels, and timing information (e.g., start and end times of utterances).
    • This capability is particularly important for tasks requiring alignment between audio and text, such as speech recognition.
  • Data Augmentation:
    • The toolkit includes built-in support for data augmentation techniques, such as speed perturbation and noise injection, which are commonly used to improve the robustness of speech models.
  • Interoperability:
    • Lhotse is designed to be compatible with existing datasets and tools in the speech processing ecosystem. It can work with popular datasets like LibriSpeech, VoxCeleb, and others.
    • It also integrates smoothly with PyTorch, providing ready-to-use Dataset classes that can be directly employed in training pipelines.
  • Scalability and Efficiency:
    • Lhotse is optimized for efficiency, handling large datasets and extensive metadata without becoming a bottleneck in the data processing pipeline.
    • It supports lazy loading and caching, which helps in managing memory usage and speeding up data access during training.
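To make the workflow above concrete, here is a minimal Lhotse preparation sketch. The LibriSpeech recipe is used purely as an example, and function and key names may vary between Lhotse versions, so treat it as an outline rather than copy-paste code.

from lhotse import CutSet, Fbank
from lhotse.recipes import prepare_librispeech  # one of many built-in corpus recipes

# Scan the corpus and build recording/supervision manifests
# (audio paths, transcripts, speaker labels, timing).
manifests = prepare_librispeech("/data/LibriSpeech", output_dir="manifests")

# Bind audio and annotations together as "cuts", then extract filterbank features.
part = manifests["train-clean-100"]
cuts = CutSet.from_manifests(
    recordings=part["recordings"],
    supervisions=part["supervisions"],
)
cuts = cuts.compute_and_store_features(extractor=Fbank(), storage_path="feats", num_jobs=4)
cuts.to_file("cuts_train-clean-100.jsonl.gz")

The resulting CutSet can then be fed into Lhotse's ready-made PyTorch Dataset classes for ASR training.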

WebDataset is a PyTorch-compatible library designed to streamline the process of working with large-scale datasets stored in archive formats, such as tar files. It is particularly useful for training deep learning models in distributed environments, where efficient data loading and processing are critical.

 

Key Features:

  • Streaming and Sharding: WebDataset allows you to stream data directly from tar archives, making it ideal for large datasets that don't fit into memory. It also supports sharding, which helps in distributing the data across multiple GPUs or nodes, facilitating parallel processing.
  • Flexible Data Formats: You can store various types of data (e.g., images, audio, text) within the same tar archive, and the library can handle these different formats seamlessly. This flexibility makes it suitable for complex machine learning tasks that involve multi-modal data.
  • Integration with PyTorch DataLoader: WebDataset integrates smoothly with PyTorch's DataLoader, enabling efficient and scalable data pipelines (see the loading sketch after this section). You can easily create custom datasets that load and preprocess data on-the-fly during training.
  • Performance Optimization: By leveraging streaming, compression, and parallel processing, WebDataset helps minimize I/O bottlenecks and maximizes training throughput, which is especially beneficial in large-scale, distributed training scenarios.

Use Cases:

  • Distributed Training: WebDataset is often used in scenarios where training needs to be distributed across multiple GPUs or machines, making it easier to manage large datasets efficiently.
  • Large-Scale Image or Audio Processing: It’s particularly useful for projects that involve massive collections of images or audio files, where data needs to be processed quickly and efficiently.
  • Data Pipelines in the Cloud: The streaming capability of WebDataset also makes it suitable for cloud-based environments, where data can be streamed directly from cloud storage services without needing to download everything first.
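To tie these pieces together, here is a minimal loading sketch. The shard naming and the .flac/.txt pairing are assumptions about how the archives were packed; WebDataset itself only requires that files sharing a key (e.g. 00001.flac and 00001.txt) appear next to each other in the tar.

import webdataset as wds
from torch.utils.data import DataLoader

urls = "shards/shard-{000000..000099}.tar"   # brace expansion selects 100 shards

dataset = (
    wds.WebDataset(urls)
    .shuffle(1000)               # shuffle with an in-memory buffer of 1000 samples
    .to_tuple("flac", "txt")     # yield (raw audio bytes, transcript bytes) pairs
)

# batch_size=None lets the iterable dataset yield one sample at a time; decode the
# audio bytes (e.g. with torchaudio or soundfile) inside the loop or via a map step.
loader = DataLoader(dataset, batch_size=None, num_workers=4)
for audio_bytes, transcript in loader:
    break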

 

Data Format for Large-Scale Audio and Text Data

  1. Audio-Specific Formats (WAV, MP3)
    • Best For: Raw audio data storage.
    • Pros: Widely supported, easy to process with torchaudio.
    • Cons: Not efficient for large-scale direct training without preprocessing.
    • Usage: Raw audio data storage, paired with metadata for ML training.
  2. WebDataset
    • Best For: Streaming data in distributed environments.
    • Pros: Ideal for large-scale, distributed training.
    • Cons: Requires understanding of sharding and streaming.
    • Usage: Distributed machine learning with large datasets stored in tar archives.
  3. TFRecords
    • Best For: Sequential data access, TensorFlow compatibility.
    • Pros: Efficient for large datasets, shuffling, and streaming.
    • Cons: Primarily TensorFlow-focused, additional work needed for PyTorch integration.
    • Usage: Large-scale text or audio datasets in TensorFlow; possible but less seamless in PyTorch.
  4. Tar Files
    • Best For: Archival, bundling files.
    • Pros: Simple, supports various file types.
    • Cons: Inefficient for direct ML workflows; requires extraction.
    • Usage: Storing and transporting collections of audio/text files (see the packing sketch after this list).
  5. Parquet
    • Best For: Columnar data, big data integration.
    • Pros: High compression, efficient for structured data, big data tools compatible.
    • Cons: Less intuitive for raw audio/text.
    • Usage: Tabular data or feature-rich datasets, especially when working with big data frameworks.
  6. HDF5
    • Best For: Hierarchical, complex datasets.
    • Pros: Efficient storage, supports mixed data types.
    • Cons: Overhead of learning HDF5 API; large file sizes can be cumbersome.
    • Usage: Large, complex datasets with multiple data types (audio, text, metadata).
  7. Zarr
    • Best For: Cloud-based, parallel processing.
    • Pros: Cloud-native, efficient for massive datasets.
    • Cons: Requires specialized libraries for access.
    • Usage: Scientific computing, cloud-based storage and access.
  8. LMDB
    • Best For: Fast random access to large datasets.
    • Pros: Low overhead, fast read times.
    • Cons: Primarily key-value storage; less intuitive for non-tabular data.
    • Usage: Datasets requiring rapid access, such as image or audio datasets.
  9. NPZ (Numpy ZIP)
    • Best For: Small to medium datasets.
    • Pros: Simple, integrates easily with NumPy and PyTorch.
    • Cons: Limited scalability for very large datasets.
    • Usage: Prototyping, research, smaller projects.
  10. Apache Arrow
    • Best For: In-memory data processing.
    • Pros: Fast data interchange, zero-copy reads.
    • Cons: Primarily in-memory; not optimized for large-scale disk storage.
    • Usage: Data interchange between processing frameworks; efficient in-memory operations.
  11. Petastorm
    • Best For: Distributed big data processing.
    • Pros: Supports sharding, Parquet integration.
    • Cons: Requires big data infrastructure.
    • Usage: Accessing large datasets stored in Parquet on distributed file systems.
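As a concrete example of the tar-based formats above (items 2 and 4), the snippet below packs paired audio/transcript files into a shard using only the Python standard library. The key.extension naming follows the WebDataset convention so the shard can later be streamed; the paths are placeholders.

import io
import tarfile

def write_shard(pairs, shard_path):
    """Pack (key, wav_path, transcript) triples into one tar shard.

    Files that share a key (e.g. utt0001.wav and utt0001.txt) end up adjacent
    in the archive, which is the layout streaming loaders expect.
    """
    with tarfile.open(shard_path, "w") as tar:
        for key, wav_path, text in pairs:
            tar.add(wav_path, arcname=f"{key}.wav")
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=f"{key}.txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

# Example usage (hypothetical files):
# write_shard([("utt0001", "audio/utt0001.wav", "hello world")], "shards/shard-000000.tar")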

As Docker containers have become a staple in the development and deployment of machine learning applications, it's crucial to optimize Docker images to reduce their size and build time. This not only speeds up development cycles but also makes deployment more efficient. In this blog, we'll explore practical techniques to optimize Docker images using a Python PyTorch application as an example.


1. Choose Minimal Base Images

The base image you select can have a huge impact on your final Docker image size. For Python applications, especially when working with PyTorch, choosing a minimal base image can drastically reduce the size of your Docker image.

Example: Switching from python to python-slim or alpine

Before:

FROM python:3.9

This base image is comprehensive but quite large, typically several hundred megabytes.

After:

FROM python:3.9-slim

The slim variant of the Python image is much smaller (roughly 50 MB compressed) while still containing enough to run most Python applications. Alpine is smaller still, but compiled packages such as PyTorch generally do not ship musl wheels, so slim is usually the safer choice.

Impact:

Switching to a minimal base image like python:3.9-slim can reduce the base image size by half or more, leading to smaller Docker images and faster builds.

 


2. Use Multi-Stage Builds

Multi-stage builds are a powerful feature in Docker that allows you to build your application in one stage and then copy only the necessary parts to a final, smaller image. This helps to keep your Docker images lean and efficient by removing unnecessary files and dependencies.

Example: Building a PyTorch Application

Before:

FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

CMD ["python", "train.py"]

In this example, all the dependencies and application files are installed and copied into a single final image, together with any build-time leftovers such as the pip cache, which makes the image larger than it needs to be.

After:

# First stage: install dependencies into an isolated virtual environment
FROM python:3.9-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN python -m venv /opt/venv && \
    /opt/venv/bin/pip install --no-cache-dir -r requirements.txt

# Second stage: create the final image
FROM python:3.9-slim
WORKDIR /app
# Copy only the installed dependencies and the application code
COPY --from=builder /opt/venv /opt/venv
COPY . .
ENV PATH="/opt/venv/bin:$PATH"

CMD ["python", "train.py"]

In this improved version, the builder stage installs the dependencies into a virtual environment, and the final image copies only that environment plus the application code. Build caches and any tools needed only at build time never make it into the final image.

Impact:

Using multi-stage builds helps you create a much smaller Docker image by excluding unnecessary files and dependencies from the final image. This leads to faster downloads, quicker deployments, and more efficient storage use.


3. Minimize Layers in Dockerfile

Each command in a Dockerfile creates a new layer in the final image. Reducing the number of layers by combining commands can help decrease the image size.

Example: Combining Commands

Before:

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN python setup.py install

After:

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
COPY . .
RUN pip install --no-cache-dir -r requirements.txt && \
    python setup.py install

Here, the pip install and python setup.py install commands are combined into a single RUN instruction. Note the trade-off: because COPY . . now precedes the install, any source change invalidates the cached dependency layer.

Impact:

Fewer layers make the image somewhat leaner and the build history easier to reason about. The size benefit is largest when temporary files are created and removed within the same RUN instruction, so they never persist in any layer.


4. Leverage .dockerignore

A .dockerignore file can be used to exclude unnecessary files and directories from being copied into the Docker image, which reduces the size of the build context and the final image.

Example: Creating a .dockerignore File

Example .dockerignore:

__pycache__
*.pyc
.git
Dockerfile
README.md

Impact:

By excluding files like __pycache__, .git, and other unnecessary files, you can reduce the size of the build context, which speeds up the build process and results in a smaller Docker image.

5. Clean Up After Yourself

Temporary files and caches left over after installing dependencies can unnecessarily bloat your Docker image. Cleaning up these files can make a big difference in the final image size.

Example: Cleaning Up in a PyTorch Dockerfile

Before:

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

After:

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt && \
    rm -rf /root/.cache/pip

In this optimized Dockerfile, we clean up the pip cache after installing dependencies to reduce the image size.

Impact:

Removing unnecessary files and caches reduces the Docker image size, leading to faster builds, quicker downloads, and more efficient use of storage.


Conclusion

Optimizing Docker images involves:

  1. selecting minimal base images
  2. using multi-stage builds
  3. minimizing Dockerfile layers
  4. leveraging .dockerignore
  5. cleaning up after installations

Together, these techniques can significantly reduce image size and build times. They not only improve the efficiency of your Docker workflow but also lead to faster deployments, reduced storage costs, and a more streamlined development process.

1. Pulling the Image from Docker Registry

Start by pulling the image from your Amazon ECR registry with the tag v1.0.0.

Command:

docker pull 12345689.dkr.ecr.us-east-1.amazonaws.com/asr-docker:v1.0.0

 

2. Attaching to the Image

Create and start a container from the image and attach to it with an interactive terminal.

Command:

docker run -it 12345689.dkr.ecr.us-east-1.amazonaws.com/asr-docker:v1.0.0 /bin/bash

 

3. Making Changes Inside the Container

After attaching to the container, make any necessary changes (for example, installing extra packages as shown below). When finished, leave the container by typing exit.

Command:

apt update && apt upgrade -yq && \
    apt install -yq \
        gcc \
        wget \
        htop \
        python3-pip

exit

 

4. Committing Changes

After making changes inside the container, commit those changes to a new Docker image. Use docker ps -a to find the container ID.

Command:

docker commit <container_id> 12345689.dkr.ecr.us-east-1.amazonaws.com/asr-docker:v1.0.1

 

5. Tagging the New Image

Since the new image is already correctly tagged in this example, this step can be considered complete with the previous commit command. However, if you needed to tag the image differently, you would use:

Command:

docker tag 12345689.dkr.ecr.us-east-1.amazonaws.com/asr-docker:v1.0.1 12345689.dkr.ecr.us-east-1.amazonaws.com/asr-docker:another-tag

 

6. Pushing to Registry

Push your newly tagged image to the Amazon ECR registry.

Command:

docker push 12345689.dkr.ecr.us-east-1.amazonaws.com/asr-docker:v1.0.1

 

7. Cleanup

Finally, clean up unused Docker resources to free up space.

Commands:

docker system prune
docker image prune

Math problems on platforms like LeetCode often require a combination of mathematical insight, algorithmic thinking, and coding skills. These problems can range from simple arithmetic operations to complex mathematical theories. Here, I will explain some common types of math problems and the techniques used to solve them.

Common Types of Math Problems and Techniques

  1. Random Pick with Weight
  2. Basic Calculator II
  3. Pow(x, n)
  4. K Closest Points to Origin
  5. Continuous Subarray Sum
  6. Random Pick Index
  7. Maximum Swap
  8. Add Strings

528. Random Pick with Weight

Problem: Given an array w of positive integers where w[i] describes the weight of index i, write a function pickIndex which randomly picks an index in proportion to its weight.

Approach:

  • Use prefix sums and binary search.
  • Compute the prefix sum array and use binary search to pick an index based on a random number.

Code:

import random
import bisect

class Solution:
    def __init__(self, w):
        self.prefix_sums = []
        prefix_sum = 0
        for weight in w:
            prefix_sum += weight
            self.prefix_sums.append(prefix_sum)
        self.total_sum = prefix_sum

    def pickIndex(self):
        target = random.randint(1, self.total_sum)
        return bisect.bisect_left(self.prefix_sums, target)

# Example usage:
weights = [1, 3, 2]
obj = Solution(weights)
print(obj.pickIndex())  # Randomly returns 0, 1, or 2 based on weights

Explanation:

  1. Prefix Sums: Compute the prefix sum of weights.
  2. Binary Search: Use binary search to find the index corresponding to a random target within the total weight range.

227. Basic Calculator II

Problem: Implement a basic calculator to evaluate a simple expression string containing non-negative integers, +, -, *, and / operators.

Approach:

  • Use a stack to manage the numbers and operators.
  • Traverse the string and handle operations based on operator precedence.

Code:

def calculate(s):
    if not s:
        return 0

    stack = []
    num = 0
    sign = '+'
    s += '+'

    for c in s:
        if c.isdigit():
            num = num * 10 + int(c)
        elif c in '+-*/':
            if sign == '+':
                stack.append(num)
            elif sign == '-':
                stack.append(-num)
            elif sign == '*':
                stack.append(stack.pop() * num)
            elif sign == '/':
                stack.append(int(stack.pop() / num))
            sign = c
            num = 0

    return sum(stack)

# Example usage:
expression = "3+2*2"
print(calculate(expression))  # Output: 7

Explanation:

  1. Stack for Numbers: Use a stack to handle numbers and intermediate results.
  2. Operator Precedence: Handle * and / immediately, defer + and - until the end.

50. Pow(x, n)

Problem: Implement pow(x, n), which calculates x raised to the power n.

Approach:

  • Use recursion and the concept of exponentiation by squaring.

Code:

def my_pow(x, n):
    def helper(x, n):
        if n == 0:
            return 1
        half = helper(x, n // 2)
        if n % 2 == 0:
            return half * half
        else:
            return half * half * x

    if n < 0:
        x = 1 / x
        n = -n
    return helper(x, n)

# Example usage:
x = 2.0
n = 10
print(my_pow(x, n))  # Output: 1024.0

Explanation:

  1. Recursion: Break down the problem into smaller subproblems using recursion.
  2. Exponentiation by Squaring: Efficiently compute powers by squaring intermediate results.

973. K Closest Points to Origin

Problem: Given an array of points where points[i] = [xi, yi] represents a point on the XY plane, return the k closest points to the origin (0, 0).

Approach:

  • Use a max-heap to keep track of the k closest points.

Code:

import heapq

def k_closest(points, k):
    max_heap = []
    for x, y in points:
        dist = -(x**2 + y**2)
        if len(max_heap) == k:
            heapq.heappushpop(max_heap, (dist, x, y))
        else:
            heapq.heappush(max_heap, (dist, x, y))
    return [[x, y] for (_, x, y) in max_heap]

# Example usage:
points = [[1, 3], [-2, 2]]
k = 1
print(k_closest(points, k))  # Output: [[-2, 2]]

Explanation:

  1. Max-Heap: Use a max-heap to keep the closest k points by distance.
  2. Distance Calculation: Calculate the Euclidean distance and maintain the closest points.

523. Continuous Subarray Sum

Problem: Given an integer array nums and an integer k, return true if nums has a continuous subarray of size at least two whose elements sum up to a multiple of k.

Approach:

  • Use a hash map to store the running sum modulo k.

Code:

def check_subarray_sum(nums, k):
    sum_map = {0: -1}
    running_sum = 0

    for i, num in enumerate(nums):
        running_sum += num
        if k != 0:
            running_sum %= k
        if running_sum in sum_map:
            if i - sum_map[running_sum] > 1:
                return True
        else:
            sum_map[running_sum] = i

    return False

# Example usage:
nums = [23, 2, 4, 6, 7]
k = 6
print(check_subarray_sum(nums, k))  # Output: True

Explanation:

  1. Running Sum: Calculate the running sum modulo k.
  2. Hash Map: Use a hash map to track the first occurrence of each remainder and check the distance between occurrences.

398. Random Pick Index

Problem: Given an integer array nums with possible duplicates, implement the Solution class:

  • pick(target): Randomly returns the index of the target number.

Approach:

  • Use reservoir sampling to handle random selection efficiently.

Code:

import random

class Solution:
    def __init__(self, nums):
        self.nums = nums

    def pick(self, target):
        count = 0
        result = -1
        for i, num in enumerate(self.nums):
            if num == target:
                count += 1
                if random.randint(1, count) == count:
                    result = i
        return result

# Example usage:
nums = [1, 2, 3, 3, 3]
target = 3
obj = Solution(nums)
print(obj.pick(target))  # Randomly returns one of the indices: 2, 3, or 4

Explanation:

  1. Reservoir Sampling: Ensure each index has an equal probability of being chosen.
  2. Count Occurrences: Traverse the array, counting occurrences of the target and selecting based on random chance.

670. Maximum Swap

Problem: Given a non-negative integer, you can swap two digits at most once to get the maximum valued number. Return the maximum valued number.

Approach:

  • Use a greedy algorithm to find the best swap.

Code:

def maximum_swap(num):
    digits = list(str(num))
    last = {int(x): i for i, x in enumerate(digits)}

    for i, x in enumerate(digits):
        for d in range(9, int(x), -1):
            if last.get(d, -1) > i:
                digits[i], digits[last[d]] = digits[last[d]], digits[i]
                return int(''.join(digits))

    return num

# Example usage:
num = 2736
print(maximum_swap(num))  # Output: 7236

Explanation:

  1. Greedy Approach: Find the highest digit that can be swapped to maximize the number.
  2. Track Last Occurrence: Use a dictionary to store the last occurrence of each digit for efficient swaps.

415. Add Strings

Problem: Given two non-negative integers represented as strings, return the sum of the two numbers as a string.

Approach:

  • Use digit-by-digit addition with carry handling.

Code:

def add_strings(num1, num2):
    i, j = len(num1) - 1, len(num2) - 1
    carry = 0
    result = []

    while i >= 0 or j >= 0 or carry:
        x = int(num1[i]) if i >= 0 else 0
        y = int(num2[j]) if j >= 0 else 0
        total = x + y + carry
        carry = total // 10
        result.append(total % 10)
        i -= 1
        j -= 1

    return ''.join(map(str, result[::-1]))

# Example usage:
num1 = "123"
num2 = "456"
print(add_strings(num1, num2))  # Output: "579"
