트라이 — 문자열 특화 자료구조

이 토픽을 마치면

트라이의 구조와 동작 원리를 설명할 수 있고, 삽입, 탐색, 접두사 검색을 구현하여 자동완성 같은 기능을 만들 수 있습니다.

문제 — 수만 개의 단어에서 검색하기

사전에 10만 개의 단어가 있습니다. 사용자가 "pro"를 입력하면 "program", "project", "process" 등을 보여주고 싶습니다.

리스트에서 하나씩 비교하면? 10만 개 × 문자열 비교 = 느립니다. 해시 테이블은 정확한 키만 찾을 수 있어서 접두사 검색에 적합하지 않습니다.

트라이(Trie, "retrieval"에서 유래)는 이 문제를 위해 만들어진 자료구조입니다.

트라이의 구조

트라이는 문자 하나당 노드 하나입니다. 같은 접두사를 공유하는 단어들은 같은 경로를 탑니다:

text

root
├── a
│   ├── p
│   │   └── p ★ ("app")
│   │       └── l
│   │           └── e ★ ("apple")
│   └── c
│       └── e ★ ("ace")
└── b
    ├── a
    │   └── t ★ ("bat")
    └── e ★ ("be")

★ 표시는 "여기서 단어가 끝난다"는 의미입니다. "app"과 "apple"은 같은 경로를 공유하다가 "app"에서 한 번, "apple"에서 한 번 끝납니다.

구현 — 노드

python

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

각 노드는 자식 노드의 딕셔너리(children)와 단어 종료 표시(is_end)를 갖습니다.

삽입

python

class Trie:
    def __init__(self):
        self.root = TrieNode()
    
    def insert(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True

python

trie = Trie()
for word in ["app", "apple", "ace", "bat", "be"]:
    trie.insert(word)

문자를 하나씩 따라가면서, 경로가 없으면 새 노드를 만듭니다. 마지막 문자에서 is_end = True로 표시합니다.

탐색

python

def search(self, word):
    node = self.root
    for char in word:
        if char not in node.children:
            return False
        node = node.children[char]
    return node.is_end

python

print(trie.search("app"))    # True
print(trie.search("ap"))     # False — "ap"으로 끝나는 단어 없음
print(trie.search("apple"))  # True
print(trie.search("bat"))    # True
print(trie.search("bad"))    # False

경로를 따라가다 중간에 없으면 False. 끝까지 왔는데 is_end가 아니면 False(접두사일 뿐 완전한 단어가 아님).

접두사 검색 — 자동완성의 핵심

python

def starts_with(self, prefix):
    node = self.root
    for char in prefix:
        if char not in node.children:
            return []
        node = node.children[char]
    
    results = []
    self._collect(node, prefix, results)
    return results

def _collect(self, node, prefix, results):
    if node.is_end:
        results.append(prefix)
    for char, child in node.children.items():
        self._collect(child, prefix + char, results)

python

print(trie.starts_with("a"))    # ["app", "apple", "ace"]
print(trie.starts_with("ap"))   # ["app", "apple"]
print(trie.starts_with("b"))    # ["bat", "be"]
print(trie.starts_with("z"))    # []

접두사까지 따라간 다음, 그 아래의 모든 완전한 단어를 수집합니다. 검색 엔진의 자동완성, IDE의 코드 자동완성이 이 원리입니다.

시간 복잡도

연산	시간 복잡도	비교 (해시 테이블)
삽입	O(m)	O(m)
정확 탐색	O(m)	O(m)
접두사 탐색	O(m + k)	❌ 불가능

m = 문자열 길이, k = 접두사 아래 결과 수.

정확 탐색 성능은 해시 테이블과 비슷하지만, 접두사 기반 검색은 트라이만 가능합니다. 해시 테이블은 "pro"로 시작하는 모든 키를 찾으려면 전수 조사를 해야 합니다.

메모리 트레이드오프

트라이의 단점은 메모리 사용량입니다. 각 노드마다 딕셔너리를 갖고, 영문 소문자만 해도 최대 26개의 자식 포인터가 필요합니다.

이를 개선한 변형:

Compressed Trie (Radix Tree): 자식이 하나뿐인 노드를 합쳐서 경로 압축
Ternary Search Tree: 이진 탐색 트리와 혼합하여 메모리 절약

실무에서는 대부분 라이브러리가 최적화를 처리해주므로, 원리를 이해하고 있으면 충분합니다.

실무 활용

용도	설명
검색 자동완성	Google, IDE 코드 완성
사전 검사	맞춤법 검사 (입력 단어가 사전에 있는지)
IP 라우팅	네트워크 접두사 매칭 (longest prefix match)
T9 키패드	숫자→문자 변환 후 후보 단어 탐색

삭제 구현

python

def delete(self, word):
    def _delete(node, word, depth):
        if depth == len(word):
            if not node.is_end:
                return False
            node.is_end = False
            return len(node.children) == 0
        
        char = word[depth]
        if char not in node.children:
            return False
        
        should_remove = _delete(node.children[char], word, depth + 1)
        if should_remove:
            del node.children[char]
            return not node.is_end and len(node.children) == 0
        return False
    
    _delete(self.root, word, 0)

단어를 삭제할 때, 해당 단어만의 고유 노드(다른 단어가 공유하지 않는)만 제거합니다. "apple"을 삭제해도 "app"에 필요한 노드는 그대로 남습니다.

카운팅 트라이

삽입 횟수를 기록하면 단어 빈도 사전이 됩니다:

python

class CountTrie:
    def __init__(self):
        self.root = TrieNode()
    
    def insert(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True
        node.count = getattr(node, 'count', 0) + 1
    
    def count(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                return 0
            node = node.children[char]
        return getattr(node, 'count', 0) if node.is_end else 0

검색어 빈도 집계, 자동완성 순위 결정 등에 활용됩니다.

Python에서 간단한 트라이 대안

트라이를 직접 구현하지 않아도, dict의 중첩으로 비슷한 효과를 낼 수 있습니다:

python

from collections import defaultdict

def make_trie(words):
    trie = lambda: defaultdict(trie)
    root = trie()
    for word in words:
        node = root
        for c in word:
            node = node[c]
        node['$'] = True
    return root

t = make_trie(["app", "apple", "bat"])
print("app" in str(t))  # True

하지만 starts_with 같은 접두사 기능이 필요하면 클래스 기반 구현이 깔끔합니다. 코딩 테스트에서는 클래스 구현이 표준입니다.

트라이는 "접두사를 공유하는 문자열 집합"을 다루는 모든 곳에 적합합니다.