A stream of numbers arrives one at a time, and after every arrival someone asks for the median. Sorting on every question costs O(n log n) per ask, and keeping a sorted array makes each insert O(n). The two-heaps pattern drops both: it keeps the lower half of the data in a max-heap and the upper half in a min-heap, so the boundary between the halves is always sitting on the two heap tops. Inserts cost O(log n), the median read costs O(1), and nothing is ever fully sorted.
The trick generalizes past medians: any time you track a running boundary between two halves of a changing set, two heaps facing each other hold exactly the two elements the boundary touches.
The arrangement matters more than the heaps. The lower half uses a max-heap, so its top is the largest small value. The upper half uses a min-heap, so its top is the smallest large value. Two invariants hold at all times: everything in the lower heap is at most everything in the upper heap, and the sizes never differ by more than one. Picture a seesaw: two teams lean against a pivot, at most one player crosses over after each arrival, and the two players pressed against the pivot are the only ones you ever need to look at.
Trace the stream 5, 15, 1, 3. The 5 lands in the lower heap, median 5. The 15 belongs above, heaps balanced at one each, median (5 + 15) / 2 = 10. The 1 joins the lower side, which now holds the extra element, median back to 5. Now predict the last step before reading on: the 3 also belongs below, which would put three elements under the pivot and one above.
The seesaw tips, so the rebalance ships the lower heap's top, the 5, across to the upper side. Lower holds {1, 3}, upper holds {5, 15}, and the median is (3 + 5) / 2 = 4. The seesaw picture has one lie in it worth flagging: inside each heap, players are not lined up in order. A heap guarantees its top and nothing else, which is also why you can't cheaply pull someone out of the middle of a team.
The implementation has one idiom worth stealing: route every insert through the lower heap and immediately migrate its top to the upper heap, then rebalance sizes if needed. That detour looks wasteful, but it makes the ordering invariant self-enforcing, because the value that crosses over is by construction the largest of the lower side. Deciding placement by comparing against a top by hand works too, until the comparison is against the wrong top.
import heapq
class MedianFinder:
def __init__(self):
# Max-heap (negated) for the lower half, min-heap for the upper
self.lower = []
self.upper = []
def add_num(self, num: int) -> None:
# Route through the lower half, migrating its top across
heapq.heappush(self.upper, -heapq.heappushpop(self.lower, -num))
# Keep the lower half holding the extra element
if len(self.upper) > len(self.lower):
heapq.heappush(self.lower, -heapq.heappop(self.upper))
def find_median(self) -> float:
# Odd count: the extra element is the lower half's top
if len(self.lower) > len(self.upper):
return -self.lower[0]
return (-self.lower[0] + self.upper[0]) / 2One convention runs through this code: the lower heap holds the extra element on odd counts, so odd medians read from its top. Other write-ups give the extra to the upper heap, which is equally valid and changes which top answers. What breaks is mixing the two halves of different conventions.
The pattern's second life uses two heaps with different orderings. IPO gives you starting capital, projects each requiring some capital and returning some profit, and k rounds to maximize money. A min-heap keyed by required capital surfaces projects as they become affordable, and a max-heap keyed by profit ranks the unlocked ones. Each round moves the newly affordable projects from the first heap to the second, then greedily banks the most profitable, and the greed is safe because finishing a project only ever increases capital.
import heapq
def maximize_capital(k: int, capital: int, capitals: list[int], profits: list[int]) -> int:
# Projects sorted by the capital they require
projects = sorted(zip(capitals, profits))
affordable = [] # max-heap of profits, negated
i = 0
for _ in range(k):
# Unlock every project the current capital can start
while i < len(projects) and projects[i][0] <= capital:
heapq.heappush(affordable, -projects[i][1])
i += 1
# Nothing affordable now means nothing ever will be
if not affordable:
break
# Greedy: bank the most profitable unlocked project
capital += -heapq.heappop(affordable)
return capitalO(1) reads can see them.O(1), which is the whole point of the structure.Running medians are the headline act (Find Median from Data Stream), and Sliding Window Median adds departures, handled with lazy deletion: mark elements as logically dead in a counter and only discard them when they reach a heap top. The unlock-then-rank family (IPO, deadline scheduling) reuses the geometry with two different sort keys. The shared recognition signal is needing cheap access to both sides of a moving boundary, the largest of one group and the smallest of another, at the same time.
You might think a problem this ordered demands a sorted container, a balanced tree with order statistics. The pattern's insight is that the question never asks about the whole order, only about the boundary, and maintaining two heap tops is far cheaper than maintaining a full ranking.
The pattern is small enough that most failures are convention slips. Five guards:
heapq only speaks min-heap, so the lower half stores -num and negates again on the way out. Java flips with Collections.reverseOrder(). Pick the idiom once and apply it consistently, since half-negated heaps produce medians that are wrong by sign.Two checks before the set: which two elements decide the median at any moment, and why does the insert cost O(log n) while the median read costs O(1)? The canon here is short and dense, so each problem earns full attention:
Completed | Title | Status | Notes | Solution | |
|---|---|---|---|---|---|
Easy | |||||
Medium | |||||
Hard | |||||
Hard |