OI-Knowledge：Trie树

在计算机科学中，字典树（Trie树），是一种用于快速检索字符串的数据结构。字典树特别适用于词典、前缀匹配和自动补全等场景，可以在 O(m) 时间内完成插入和查询操作（其中 m 为字符串的长度）。

基本概念

字典树的核心思想是通过多叉树结构来存储字符串，每个节点代表一个字符。其主要功能包括：

插入操作：将一个字符串插入到字典树中。
查询操作：判断某个字符串是否存在于字典树中。
前缀匹配：查找具有共同前缀的所有字符串。

字典树的结构

字典树由一系列节点组成，每个节点包含以下信息：

一个长度为 26 的子节点数组（假设处理小写英文字母）。
一个布尔值标记，表示是否是一个完整字符串的结尾。

以下是字典树节点的定义：

struct TrieNode {
    TrieNode* children[26];
    bool isEndOfWord;

    TrieNode() {
        for (int i = 0; i < 26; i++) {
            children[i] = nullptr;
        }
        isEndOfWord = false;
    }
};

字典树的操作

插入操作

插入操作用于将一个字符串插入到字典树中，逐字符插入，并在字符串结尾处标记：

void insert(TrieNode* root, const string& key) {
    TrieNode* node = root;
    for (char c : key) {
        int index = c - 'a';
        if (node->children[index] == nullptr) {
            node->children[index] = new TrieNode();
        }
        node = node->children[index];
    }
    node->isEndOfWord = true;
}

查询操作

查询操作用于判断某个字符串是否存在于字典树中：

bool search(TrieNode* root, const string& key) {
    TrieNode* node = root;
    for (char c : key) {
        int index = c - 'a';
        if (node->children[index] == nullptr) {
            return false;
        }
        node = node->children[index];
    }
    return node != nullptr && node->isEndOfWord;
}

前缀匹配

前缀匹配用于查找具有共同前缀的所有字符串：

bool startsWith(TrieNode* root, const string& prefix) {
    TrieNode* node = root;
    for (char c : prefix) {
        int index = c - 'a';
        if (node->children[index] == nullptr) {
            return false;
        }
        node = node->children[index];
    }
    return true;
}

示例

假设我们有一个单词列表 ["apple", "app", "application", "bat", "ball", "cat"]，我们可以使用字典树来进行快速的插入和查询操作。

以下是示例代码：

int main() {
    TrieNode* root = new TrieNode();

    vector<string> words = {"apple", "app", "application", "bat", "ball", "cat"};
    for (const string& word : words) {
        insert(root, word);
    }

    cout << search(root, "app") << endl;        // 输出 1（true）
    cout << search(root, "application") << endl; // 输出 1（true）
    cout << search(root, "appl") << endl;        // 输出 0（false）
    cout << startsWith(root, "app") << endl;    // 输出 1（true）
    cout << startsWith(root, "bat") << endl;    // 输出 1（true）
    cout << startsWith(root, "baller") << endl; // 输出 0（false）

    return 0;
}

例题1：3045. 统计前后缀下标对 II

题目传送门：3045. 统计前后缀下标对 II - 力扣（LeetCode）

LeetCode第 385 场周赛的最后一题，比较板子，下面我们来分析一下，相比字典树传统解决的前缀匹配问题，这里题目还要求我们对后缀进行匹配。这里运用一个小trick就可以将这个问题转化为前缀匹配问题。就是我们把字符串的第n位变化成第n位和倒数第n位的组合，这样我们就可以只进行前缀匹配了（仔细思考一下就能明白为啥）

最后实现代码如下：

struct TrieNode{
    unordered_map<int, TrieNode*> son;
    int cnt = 0;
};
class Solution {
public:
    long long countPrefixSuffixPairs(vector<string> &words) {
        long long ans = 0;
        TrieNode *root = new TrieNode();
        for (string &s : words) {
            int n =s.size();
            TrieNode *cur = root;
            for(int i=0;i<n;++i){ // 创建树链
                int p = (int)(s[i]-'a')<<5 | (s[n - 1 - i] - 'a');
                if (cur->son[p]==nullptr) {
                    cur->son[p] = new TrieNode();
                }
                cur = cur->son[p];
                ans += cur->cnt; // 创建的过程中更新答案
            }
            cur->cnt++;
        }
        return ans;
    }
};

总结

字典树是一种高效且易于实现的数据结构，特别适用于字符串的插入和查询操作。通过多叉树结构，字典树能够在 O(m) 的时间复杂度内完成操作，极大地提高了性能。