NGワード・コンテンツフィルタ詳細設計 / NG Word & Content Filter Design

概要 / Overview

配信コンテンツの安全性を担保するため、視聴者コメント（入力）とLLM応答（出力）の両方にフィルタリングパイプラインを設ける。全フィルタは日本語・英語の両方に対応する。

To ensure content safety during live streaming, filtering pipelines are applied to both viewer comments (input) and LLM responses (output). All filters support both Japanese and English.

アーキテクチャ全体図 / Architecture Overview

Viewer Comment (YouTube / TikTok)
       │
       ▼
┌──────────────────────────────────────┐
│         INPUT FILTER PIPELINE        │
│                                      │
│  1. Normalize (Unicode正規化)        │
│  2. Rate Limiter (連投制限)          │
│  3. NG Word Check (NGワード判定)     │
│  4. Semantic Filter (意味的フィルタ) │
│  5. Language Detection (言語判定)    │
└────────────┬─────────────────────────┘
             │
             │  PASS → Comment Queue (10-min cycle / Tip immediate)
             │  BLOCK → Silently discard + log
             │
             ▼
┌──────────────────────────────────────┐
│           LLM Processing             │
│  (Claude API with safety settings)   │
└────────────┬─────────────────────────┘
             │
             ▼
┌──────────────────────────────────────┐
│        OUTPUT FILTER PIPELINE        │
│                                      │
│  1. NG Word Check (NGワード判定)     │
│  2. Content Policy Check (ポリシー)  │
│  3. Length / Format Validation       │
│  4. Fallback Handler (代替行動)      │
└────────────┬─────────────────────────┘
             │
             │  PASS → Render to screen
             │  BLOCK → Fallback action triggered
             │
             ▼
       Display on Stream

入力フィルタパイプライン / Input Filter Pipeline

視聴者コメントがLLMに渡される前に適用される5段階のフィルタ。

Five-stage filter applied to viewer comments before they reach the LLM.

Stage 1: Unicode 正規化 / Unicode Normalization

フィルタ回避のための文字置換を無効化する。

Neutralize character substitution tricks used to bypass filters.

Processing	Example	Detail
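NFKC 正規化 / NFKC normalization	`ｓｈｉｔ` → `shit`	全角→半角、互換文字の統一 / Full-width to half-width; unify compatibility characters
ゼロ幅文字除去 / Zero-width removal	`sh\u200Bit` → `shit`	Remove zero-width space (U+200B), non-joiner (U+200C), and joiner (U+200D)
ホモグリフ変換 / Homoglyph conversion	`ѕhіt` (Cyrillic) → `shit`	Confusable character normalization
繰り返し圧縮 / Repeat compression	`shiiiit` → `shiit`	3文字以上の連続を2文字に圧縮 / Compress runs of 3+ identical characters to 2
Leet speak 変換 / Leet speak conversion	`sh1t`, `$h!t` → `shit`	数字・記号を対応文字に変換 / Map digits and symbols to the letters they stand for
カタカナ→ひらがな / Katakana to hiragana	`シネ` → `しね`	カタカナをひらがなに統一して判定 / Unify katakana into hiragana before matching
スペース除去判定 / Space-removal check	`s h i t` → `shit`	スペース挿入による回避の検出 / Detect evasion by inserted spaces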

typescript

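function normalize(text: string): string {
  // NFKC folds full-width and compatibility characters; lowercase for English matching
  let normalized = text.normalize('NFKC').toLowerCase();
  normalized = removeZeroWidthChars(normalized);
  normalized = convertHomoglyphs(normalized);
  // Compress runs of 3 or more identical characters
  normalized = compressRepeats(normalized, 3);
  normalized = convertLeetSpeak(normalized);
  normalized = katakanaToHiragana(normalized);
  // Space-insertion evasion is not removed here; the regex patterns cover it
  return normalized;
}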
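
The helpers called by `normalize` are not defined in this document. The following is a minimal sketch of three of them, written to follow the table above. The leet table here is a tiny illustrative assumption; the real mapping would presumably come from `leet-speak.json`.

```typescript
// Remove zero-width space, non-joiner, joiner, word joiner, and BOM.
function removeZeroWidthChars(text: string): string {
  return text.replace(/[\u200B-\u200D\u2060\uFEFF]/g, '');
}

// Compress runs of `threshold` or more identical characters down to two.
function compressRepeats(text: string, threshold: number): string {
  const pattern = new RegExp(`(.)\\1{${threshold - 1},}`, 'gu');
  return text.replace(pattern, '$1$1');
}

// Illustrative leet table (assumption, not the project's actual table).
const LEET_TABLE: Record<string, string> = {
  '1': 'i', '!': 'i', '3': 'e', '4': 'a', '0': 'o', '$': 's', '@': 'a',
};

// Map leet digits and symbols to letters. The result is used for matching only.
function convertLeetSpeak(text: string): string {
  return text.replace(/[1!340$@]/g, (ch) => LEET_TABLE[ch] ?? ch);
}
```

Because the leet conversion also rewrites ordinary digits, the normalized string should only be used for matching, never for display.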

Stage 2: レートリミッター / Rate Limiter

連投・スパム行為を制限する。判定は userId 単位で行う。

Throttle spam and rapid-fire comments. Evaluated per userId.

Rule	Threshold	Action
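連投制限 / Rapid fire	5+ comments in 30 seconds	30秒間のコメントを全て無視 / Ignore all comments for 30 seconds
重複コメント / Duplicate	Same text 3+ times in 5 min	3回目以降を無視 / Ignore the 3rd and later copies
長文制限 / Length limit	> 200 characters	200文字で切り詰め / Truncate to 200 characters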
NG累積 / NG accumulation	3+ NG hits in 10 min	Temporary mute (10 min)
常習犯 / Repeat offender	10+ NG hits in 24 hours	Temporary mute (60 min)

typescript

interface RateLimitState {
  userId: string;
  platform: 'youtube' | 'tiktok';
  recentComments: { text: string; timestamp: number }[];
  ngHitCount24h: number;
  muteUntil: number | null;
}
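
The rapid-fire, duplicate, and length rules can be sketched as a sliding-window check over `recentComments`. This is an illustration under assumptions: `checkRate` and `RateDecision` are hypothetical names, and the 30-second penalty is simplified to "blocked while the window still holds 5 or more comments".

```typescript
interface RecentComment { text: string; timestamp: number }

const RAPID_WINDOW_MS = 30 * 1000;    // 30 seconds
const RAPID_LIMIT = 5;                // 5+ comments
const DUP_WINDOW_MS = 5 * 60 * 1000;  // 5 minutes
const DUP_LIMIT = 3;                  // same text 3+ times
const MAX_LENGTH = 200;               // characters

interface RateDecision {
  allow: boolean;
  text: string; // possibly truncated
  reason?: 'rapid_fire' | 'duplicate';
}

function checkRate(recent: RecentComment[], text: string, now: number): RateDecision {
  // Truncate by code point so surrogate pairs are not split.
  const truncated = Array.from(text).slice(0, MAX_LENGTH).join('');

  // Drop entries older than the longest window, then record this comment.
  const kept = recent.filter((c) => now - c.timestamp < DUP_WINDOW_MS);
  kept.push({ text: truncated, timestamp: now });
  recent.splice(0, recent.length, ...kept);

  const rapid = kept.filter((c) => now - c.timestamp < RAPID_WINDOW_MS).length;
  if (rapid >= RAPID_LIMIT) return { allow: false, text: truncated, reason: 'rapid_fire' };

  const dupes = kept.filter((c) => c.text === truncated).length;
  if (dupes >= DUP_LIMIT) return { allow: false, text: truncated, reason: 'duplicate' };

  return { allow: true, text: truncated };
}
```

The caller is expected to keep one `recent` array per `userId` and platform.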

Stage 3: NGワード判定 / NG Word Check

正規化済みテキストに対してNGワードリストを照合する。

Match normalized text against the NG word list.

判定パターンの詳細は後述「NGワード判定パターン」を参照。

See "Matching Patterns" section below for details.

Check	Method	Example
完全一致 / Exact match	`word === ngWord`	「死ね」→ NG
部分一致 / Partial match	`text.includes(ngWord)`	「お前死ねよ」→ NG（「死ね」部分一致）
正規表現 / Regex	`regex.test(text)`	`/殺\s*す/` → 「殺す」→ NG

Stage 4: 意味的フィルタ / Semantic Filter

NGワードリストでは捕捉できない巧妙な表現を検出する軽量チェック。

Lightweight check to catch expressions that bypass keyword lists.

Pattern	Detection Method
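隠語・暗喩 / Coded language	パターンマッチ（既知の隠語辞書） / Pattern match against a dictionary of known coded terms
個人情報要求 / PII solicitation	「住所」「電話番号」「本名」等のキーワード検出 / Keyword detection: 「住所」 (address), 「電話番号」 (phone number), 「本名」 (real name), etc.
外部誘導 / External links	URL・SNSハンドル検出（`https?://`, `@username`） / URL and social-media handle detection (`https?://`, `@username`)
他配信者への誹謗 / Streamer attacks	配信者名リスト × ネガティブ表現パターン / Streamer-name list combined with negative-expression patterns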
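
Two of the rows above (external links and PII solicitation) reduce to simple pattern checks. A rough sketch follows; `semanticCheck` and `SemanticHit` are hypothetical names, and the keyword list contains only the three words quoted in the table.

```typescript
type SemanticHit = 'external_link' | 'pii_request';

const URL_PATTERN = /https?:\/\/\S+/i;
// "@username" at the start of the text or after whitespace.
const HANDLE_PATTERN = /(^|\s)@[A-Za-z0-9_]{2,}/;
// Keywords quoted in the table; a real list would be longer.
const PII_KEYWORDS = ['住所', '電話番号', '本名'];

function semanticCheck(text: string): SemanticHit[] {
  const hits: SemanticHit[] = [];
  if (URL_PATTERN.test(text) || HANDLE_PATTERN.test(text)) hits.push('external_link');
  if (PII_KEYWORDS.some((k) => text.includes(k))) hits.push('pii_request');
  return hits;
}
```

An empty result means the comment proceeds to language detection; any hit means silent discard plus a log entry.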

将来拡張 / Future Enhancement

Phase 2以降で、Claude Haiku による軽量な意味分類（safe / suspicious / block）を導入可能。ただしMVPではコスト・レイテンシの観点からルールベースのみとする。

In Phase 2+, a lightweight Claude Haiku classification (safe / suspicious / block) can be introduced. For MVP, only rule-based filtering is used due to cost and latency concerns.

Stage 5: 言語判定 / Language Detection

LLMに渡す際に応答言語を指定するため、コメントの言語を判定する。

Detect comment language to specify response language when passing to the LLM.

Rule	Detail
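日本語判定 / Japanese	ひらがな・カタカナ・漢字を含み、ラテン文字を含まない → `ja` / Contains hiragana, katakana, or kanji and no Latin letters → `ja`
英語判定 / English	ASCII のみ → `en` / ASCII only → `en`
混合 / Mixed	日本語文字の割合 > 30% → `ja`、それ以外 → `en` / Share of Japanese characters > 30% → `ja`, otherwise `en`
判定不能 / Undetermined	デフォルト `ja` / Default `ja`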
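
The rules above can be sketched as a single character-ratio function. One assumption is made here: the 30% share is measured over non-whitespace characters, which the table does not specify.

```typescript
type Lang = 'ja' | 'en';

// Hiragana, katakana, and the main CJK ideograph block.
const JA_CHAR = /[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]/;

function detectLanguage(text: string): Lang {
  const chars = Array.from(text).filter((ch) => !/\s/.test(ch));
  if (chars.length === 0) return 'ja'; // undetermined: default

  const jaCount = chars.filter((ch) => JA_CHAR.test(ch)).length;
  if (jaCount === 0) {
    // ASCII only is English; anything else (e.g. emoji only) is undetermined.
    return chars.every((ch) => ch.charCodeAt(0) < 128) ? 'en' : 'ja';
  }
  // Pure Japanese has a share of 1.0, so it also takes this branch.
  return jaCount / chars.length > 0.3 ? 'ja' : 'en';
}
```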

出力フィルタパイプライン / Output Filter Pipeline

LLMの応答が配信画面に表示される前に適用されるフィルタ。

Filters applied to LLM responses before they are rendered on stream.

Stage 1: NGワード判定 / NG Word Check

入力フィルタと同一のNGワードリストを使用してLLM出力を検査する。

Inspect LLM output using the same NG word list as the input filter.

入力フィルタと同じ正規化処理を適用
出力でのNG検出は重大度が高いため、全マッチで即ブロック
Apply the same normalization as the input filter
NG detection in output is high severity — any match triggers an immediate block

Stage 2: コンテンツポリシーチェック / Content Policy Check

キーワード以外のコンテンツポリシー違反を検出する。

Detect content policy violations beyond keyword matching.

Policy	Rule	Example
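暴力的表現 / Violence	身体的暴力の描写を含む行動・台詞 / Actions or lines that depict physical violence	「殴る」「蹴る」等の攻撃行動 / Attacks such as 「殴る」 (punch) or 「蹴る」 (kick)
性的表現 / Sexual	配信不適切な性的描写 / Sexual depictions unsuitable for streaming	露骨な表現、過度な身体描写 / Explicit language, excessive physical description
差別表現 / Discrimination	人種・性別・宗教等に基づく差別 / Discrimination based on race, gender, religion, etc.	ステレオタイプの強化、蔑称 / Reinforcing stereotypes, slurs
実在固有名詞 / Real names	実在の製品名・作品名・人名の言及 / Mentions of real products, works, or people	著作権・商標リスク / Copyright and trademark risk
違法行為 / Illegal acts	違法行為の推奨・描写 / Encouraging or depicting illegal acts	薬物使用、窃盗等 / Drug use, theft, etc.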
自傷行為 / Self-harm	自傷行為の描写・推奨	—

typescript

interface PolicyCheckResult {
  passed: boolean;
  violations: {
    policy: string;
    severity: 'low' | 'medium' | 'high';
    matchedText: string;
  }[];
}

Stage 3: 長さ・フォーマット検証 / Length & Format Validation

Rule	Threshold	Action
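最大文字数 / Max length	台詞: 100文字 / Dialogue: 100 chars	超過分を切り詰め / Truncate the excess
空応答 / Empty response	0文字 / 0 characters	フォールバック発動 / Trigger the fallback
禁止フォーマット / Forbidden format	Markdown、HTML、コードブロック / Markdown, HTML, code blocks	プレーンテキストに変換 / Convert to plain text
改行数 / Line breaks	最大3行 / Max 3 lines	超過分を削除 / Remove the excess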
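
A minimal sketch of these four rules follows. `validateOutput` is a hypothetical name, and "convert to plain text" is interpreted here as dropping fenced code blocks and stripping HTML tags and common Markdown markers, which is an assumption.

```typescript
const MAX_DIALOGUE_CHARS = 100;
const MAX_LINES = 3;

type Validated = { ok: true; text: string } | { ok: false; reason: 'empty' };

function validateOutput(raw: string): Validated {
  let text = raw
    .replace(/`{3}[\s\S]*?`{3}/g, '') // drop fenced code blocks
    .replace(/<[^>]+>/g, '')          // strip HTML tags
    .replace(/[*_`#>]/g, '');         // strip common Markdown markers

  // Keep at most MAX_LINES non-empty lines.
  const lines = text.split('\n').map((l) => l.trim()).filter((l) => l.length > 0);
  text = lines.slice(0, MAX_LINES).join('\n');

  // Truncate by code point to the dialogue limit.
  text = Array.from(text).slice(0, MAX_DIALOGUE_CHARS).join('');

  return text.length === 0 ? { ok: false, reason: 'empty' } : { ok: true, text };
}
```

An `ok: false` result corresponds to the empty-response row and would hand control to the fallback handler.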

Stage 4: フォールバックハンドラ / Fallback Handler

出力フィルタでブロックされた場合の代替行動を決定する。

Determine alternative actions when the output filter blocks a response.

詳細は後述「フォールバック行動」を参照。

See "Fallback Behaviors" section below.

NGワードリストの管理 / NG Word List Management

カテゴリ分類 / Categories

NGワードをカテゴリ別に管理し、カテゴリごとに重大度を設定する。

NG words are managed by category, each with an assigned severity level.

Category	Severity	JP Examples	EN Examples
violence / 暴力	high	殺す、死ね、ころす	kill, murder, die
sexual / 性的	high	（性的表現）	（sexual expressions）
discrimination / 差別	high	差別用語全般	Slurs, derogatory terms
self_harm / 自傷	high	自殺、リスカ	suicide, self-harm
profanity / 罵倒	medium	クソ、バカ、アホ	fuck, shit, asshole
harassment / ハラスメント	medium	キモい、ウザい、消えろ	creep, disgusting, go away
spam / スパム	low	宣伝URL、連絡先	Ad URLs, contact info
competitor / 競合	low	他配信者名	Other streamer names
pii_request / 個人情報	medium	住所教えて、本名は？	where do you live, real name?

データ構造 / Data Structure

json

{
  "version": "1.0.0",
  "lastUpdated": "2026-03-11T00:00:00Z",
  "categories": {
    "violence": {
      "severity": "high",
      "words": [
        {
          "pattern": "殺す",
          "type": "partial",
          "lang": "ja",
          "note": "殺すの活用形はregexで別途定義"
        },
        {
          "pattern": "kill\\s*(you|him|her|them|myself)",
          "type": "regex",
          "lang": "en"
        },
        {
          "pattern": "死ね",
          "type": "partial",
          "lang": "ja"
        }
      ]
    },
    "profanity": {
      "severity": "medium",
      "words": [
        {
          "pattern": "fuck",
          "type": "partial",
          "lang": "en"
        },
        {
          "pattern": "クソ",
          "type": "exact",
          "lang": "ja"
        }
      ]
    }
  }
}
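
For reference, the JSON above could be typed and flattened into matchable rules roughly as follows. The type and function names (`NgWordList`, `CompiledRule`, `compileNgWords`) are hypothetical; only the field names come from the JSON.

```typescript
type MatchType = 'exact' | 'partial' | 'regex';
type Severity = 'low' | 'medium' | 'high';

interface NgWordEntry { pattern: string; type: MatchType; lang: 'ja' | 'en'; note?: string }

interface NgWordList {
  version: string;
  lastUpdated: string;
  categories: Record<string, { severity: Severity; words: NgWordEntry[] }>;
}

interface CompiledRule {
  category: string;
  severity: Severity;
  type: MatchType;
  pattern: string;
  regex?: RegExp; // present only for type "regex"
}

function compileNgWords(list: NgWordList): CompiledRule[] {
  const rules: CompiledRule[] = [];
  for (const [category, { severity, words }] of Object.entries(list.categories)) {
    for (const w of words) {
      const rule: CompiledRule = { category, severity, type: w.type, pattern: w.pattern };
      // Compile regex entries once at load time instead of once per comment.
      if (w.type === 'regex') rule.regex = new RegExp(w.pattern, 'i');
      rules.push(rule);
    }
  }
  return rules;
}
```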

ファイル構成 / File Structure

config/
└── content-filter/
    ├── ng-words.json          # Main NG word list
    ├── ng-words.local.json    # Local overrides (gitignored)
    ├── homoglyphs.json        # Confusable character mapping
    ├── leet-speak.json        # Leet speak conversion table
    └── allowlist.json         # False positive exceptions

管理方針 / Management Policy

Item	Policy
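更新頻度 / Update frequency	随時追加。重大な新パターン発見時は即時対応 / Added as needed; serious new patterns are handled immediately
更新方法 / Update method	`ng-words.json` を編集し、サーバー再起動なしでホットリロード / Edit `ng-words.json`; changes are hot-reloaded without a server restart
バージョン管理 / Versioning	`ng-words.json` に `version` フィールドを含め、変更履歴をgit管理 / `ng-words.json` carries a `version` field; change history is tracked in git
ローカル上書き / Local overrides	`ng-words.local.json` で配信者固有のNGワードを追加可能（gitignored） / Streamer-specific NG words can be added in `ng-words.local.json` (gitignored)
許可リスト / Allowlist	誤検出のあるワード（例：「殺風景」の「殺」）を `allowlist.json` で除外 / Words that cause false positives (e.g. the 「殺」 in 「殺風景」) are excluded via `allowlist.json`
ホットリロード / Hot reload	ファイル変更を `fs.watch` で監視し、サーバー再起動なしで反映 / File changes are watched with `fs.watch` and applied without a server restart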

NGワード判定パターン / Matching Patterns

3つの判定方式 / Three Matching Methods

1. 完全一致 / Exact Match

正規化後のテキスト全体がNGワードと完全に一致する場合にブロック。単体で使われると有害だが、他の文脈では無害な単語に使用する。

Block when the entire normalized text exactly matches the NG word. Used for words that are harmful alone but harmless in other contexts.

type: "exact"
Input: "クソ" → Match ✓
Input: "クソゲー" → Match ✗ (部分一致で別途対応)

2. 部分一致 / Partial Match

正規化後のテキストにNGワードが部分文字列として含まれる場合にブロック。最も一般的な判定方式。

Block when the NG word appears as a substring in the normalized text. The most common matching method.

type: "partial"
Input: "お前死ねよ" → "死ね" partial match ✓
Input: "死ぬほど美味い" → "死ぬ" partial match... → allowlist check

誤検出対策 / False Positive Handling

部分一致は誤検出が発生しやすい。allowlist.json に例外パターンを登録して対応する。

Partial matching is prone to false positives. Register exception patterns in allowlist.json.

Allowlist examples:

「殺風景」（さっぷうけい / dreary）→ 「殺」を含むが無害
「必死」（ひっし / desperate）→ 「死」を含むが無害
"assassin" → contains "ass" but contextually fine in gaming
"scunthorpe" → contains profanity substring but is a place name

3. 正規表現 / Regex Match

活用形、スペース挿入、文字装飾などの回避パターンを捕捉する。

Catch evasion patterns including conjugations, space insertion, and character decoration.

type: "regex"

# 日本語活用形 / Japanese conjugations
/殺[すさしせそ]/ → 殺す、殺さない、殺した、etc.
/死[ねにな]/ → 死ね、死に、死な

# スペース挿入回避 / Space insertion evasion
/f\s*u\s*c\s*k/ → "f u c k", "f  uck", etc.

# 伏せ字回避 / Censored text evasion
/[死し][＊*○●][ねネ]/ → 死○ね、し*ね

# 記号装飾 / Symbol decoration
/k+[i!1]+l+[l!1]*/ → "kiill", "k1ll", "ki!l"

判定フロー / Matching Flow

Normalized Text
      │
      ▼
┌─────────────────┐
│ Allowlist Check  │──── Match → PASS (skip NG check)
└────────┬────────┘
         │ No match
         ▼
┌─────────────────┐
│ Exact Match      │──── Match → BLOCK
└────────┬────────┘
         │ No match
         ▼
┌─────────────────┐
│ Partial Match    │──── Match → Allowlist re-check → BLOCK or PASS
└────────┬────────┘
         │ No match
         ▼
┌─────────────────┐
│ Regex Match      │──── Match → BLOCK
└────────┬────────┘
         │ No match
         ▼
       PASS
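
The flow above can be sketched as one function. Two readings of the diagram are assumed here and are not confirmed by it: the first allowlist check passes only when the whole comment equals an allowlisted entry, and the "allowlist re-check" is modelled by masking allowlisted phrases before the partial and regex steps. All names are hypothetical.

```typescript
type MatchType = 'exact' | 'partial' | 'regex';

interface Rule { type: MatchType; pattern: string; regex?: RegExp }

interface MatchResult { blocked: boolean; matchType?: MatchType; matchedPattern?: string }

function checkNgWords(text: string, rules: Rule[], allowlist: string[]): MatchResult {
  // 1. Allowlist check: a comment that is exactly an allowlisted entry passes.
  if (allowlist.includes(text)) return { blocked: false };

  // 2. Exact match against the whole comment.
  const exact = rules.find((r) => r.type === 'exact' && r.pattern === text);
  if (exact) return { blocked: true, matchType: 'exact', matchedPattern: exact.pattern };

  // Allowlist re-check: blank out allowlisted phrases so they cannot match.
  const masked = allowlist
    .filter((phrase) => phrase.length > 0)
    .reduce((t, phrase) => t.split(phrase).join(' '), text);

  // 3. Partial (substring) match.
  const partial = rules.find((r) => r.type === 'partial' && masked.includes(r.pattern));
  if (partial) return { blocked: true, matchType: 'partial', matchedPattern: partial.pattern };

  // 4. Regex match. Regexes must not use the `g` flag, which makes test() stateful.
  const regex = rules.find(
    (r) => r.type === 'regex' && r.regex !== undefined && r.regex.test(masked),
  );
  if (regex) return { blocked: true, matchType: 'regex', matchedPattern: regex.pattern };

  return { blocked: false };
}
```

Masking means a comment that contains both an allowlisted phrase and a genuine NG word is still blocked.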

パフォーマンス考慮 / Performance Considerations

Concern	Solution
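正規表現の実行コスト / Regex execution cost	正規表現は事前コンパイルしてキャッシュ。ホットリロード時に再コンパイル / Precompile and cache regexes; recompile on hot reload
大量のNGワード / Large NG word list	Aho-Corasick アルゴリズムで部分一致を高速化（テキスト長 n、マッチ数 z に対し O(n + z)） / Speed up partial matching with the Aho-Corasick algorithm (O(n + z) for text length n and z matches)
判定のタイムアウト / Check timeout	1コメントあたり最大 10ms。超過時は PASS して警告ログ出力 / At most 10 ms per comment; on overrun, PASS and emit a warning log
メモリ使用量 / Memory usage	NGワードリストは起動時に1回読み込み、メモリ上に保持 / Load the NG word list once at startup and keep it in memory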
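
As an illustration of the Aho-Corasick row, here is a compact sketch of the automaton: a trie of NG words plus failure links, so that one pass over the text reports every partial match. It is a textbook construction, not the project's implementation, and the names are hypothetical.

```typescript
interface AcNode { next: Map<string, number>; fail: number; out: string[] }

function buildAutomaton(words: string[]): AcNode[] {
  const nodes: AcNode[] = [{ next: new Map(), fail: 0, out: [] }];

  // Phase 1: insert every word into the trie.
  for (const word of words) {
    let cur = 0;
    for (const ch of word) {
      let nxt = nodes[cur].next.get(ch);
      if (nxt === undefined) {
        nxt = nodes.length;
        nodes.push({ next: new Map(), fail: 0, out: [] });
        nodes[cur].next.set(ch, nxt);
      }
      cur = nxt;
    }
    nodes[cur].out.push(word);
  }

  // Phase 2: breadth-first pass to set failure links (depth-1 nodes fail to the root).
  const queue: number[] = [...nodes[0].next.values()];
  while (queue.length > 0) {
    const cur = queue.shift() as number;
    for (const [ch, nxt] of nodes[cur].next) {
      let f = nodes[cur].fail;
      while (f !== 0 && !nodes[f].next.has(ch)) f = nodes[f].fail;
      nodes[nxt].fail = nodes[f].next.get(ch) ?? 0;
      // Inherit matches that end at the failure target.
      nodes[nxt].out.push(...nodes[nodes[nxt].fail].out);
      queue.push(nxt);
    }
  }
  return nodes;
}

// Return every dictionary word found in `text`, in order of the position where it ends.
function findAll(nodes: AcNode[], text: string): string[] {
  const found: string[] = [];
  let cur = 0;
  for (const ch of text) {
    while (cur !== 0 && !nodes[cur].next.has(ch)) cur = nodes[cur].fail;
    cur = nodes[cur].next.get(ch) ?? 0;
    found.push(...nodes[cur].out);
  }
  return found;
}
```

The automaton would be rebuilt whenever the word list is hot-reloaded, alongside the regex recompilation.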

多言語対応 / Multi-Language Support (JP + EN)

言語別処理 / Language-Specific Processing

Processing	Japanese (ja)	English (en)
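正規化 / Normalization	NFKC + カタカナ→ひらがな + 全角→半角 / NFKC + katakana to hiragana + full-width to half-width	NFKC + lowercase + leet speak conversion
トークン化 / Tokenization	文字単位（形態素解析なし、MVP） / Per character (no morphological analysis in the MVP)	単語単位（スペース区切り） / Per word (space-delimited)
部分一致 / Partial match	文字列レベルの部分一致 / Substring match	単語境界考慮 (`\b` word boundary) / Word-boundary aware (`\b`)
活用形 / Conjugation	正規表現で主要な活用形をカバー / Main conjugations covered by regexes	語幹マッチ（例: `kill` → `killing`, `killed`） / Stem match (e.g. `kill` → `killing`, `killed`)
誤検出対策 / False positives	許可リストで日本語特有の誤検出を管理 / Japanese-specific false positives are managed via the allowlist	Word boundary `\b` で大部分を防止 / Mostly prevented by the `\b` word boundary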

日本語特有の処理 / Japanese-Specific Processing

typescript

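// Katakana → hiragana conversion
// "シネ" → "しね", so it is treated the same as 「死ね」
function katakanaToHiragana(text: string): string {
  // Katakana U+30A1..U+30F6 sit exactly 0x60 above their hiragana counterparts
  return text.replace(/[\u30A1-\u30F6]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - 0x60)
  );
}

// Japanese-specific evasion patterns
// - Censored characters (伏せ字): 「し◯ね」「◯ね」
// - Phonetic substitution (当て字): 「氏ね」 (し→氏, ね→ね)
// - Romaji: "shine" (死ね)
const jaEvasionPatterns = [
  /[しシ][○◯〇＊*][ねネ]/,     // censored characters
  /氏[ねネ]/,                    // phonetic substitution
  /\bshine\b/i,                  // romaji; \b excludes "shines", "shined", "shiner", "sunshine"
];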

英語特有の処理 / English-Specific Processing

typescript

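// Escape regex metacharacters so an NG word is matched literally
function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Word boundary matching to reduce false positives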
// "class" should not match "ass"
function wordBoundaryMatch(text: string, word: string): boolean {
  const regex = new RegExp(`\\b${escapeRegex(word)}\\b`, 'i');
  return regex.test(text);
}

// Stem-based matching for conjugations
// "kill" → matches "killing", "killed", "kills", "killer"
function stemMatch(text: string, stem: string): boolean {
  const regex = new RegExp(`\\b${escapeRegex(stem)}(s|ed|ing|er|ers)?\\b`, 'i');
  return regex.test(text);
}

言語混合コメントの処理 / Mixed-Language Comment Handling

Scenario	Processing
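日英混合 / Mixed Japanese and English	両方の言語のNGワードリストで判定 / Check against the NG word lists of both languages
ローマ字日本語 / Romanized Japanese	既知の有害ローマ字パターンを正規表現で検出（例: `shine` = 死ね） / Detect known harmful romaji patterns by regex (e.g. `shine` = 死ね)
絵文字 / Emoji	有害な絵文字の組み合わせパターンは Phase 2 で対応 / Harmful emoji combinations are deferred to Phase 2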

フォールバック行動 / Fallback Behaviors

入力フィルタでのブロック時 / When Input Filter Blocks

視聴者コメントがブロックされた場合、キャラクターはそのコメントを認識しない（何も起こらない）。

When a viewer comment is blocked, characters do not acknowledge it (nothing happens).

Scenario	Behavior
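無料コメントがブロック / Free comment blocked	サイレント破棄。キャラクターは反応しない。ログにのみ記録 / Silently discarded. Characters do not react. Recorded in the log only
投げ銭コメントがブロック / Tip comment blocked	投げ銭自体は受領・アイテム効果は発動するが、コメント内容はLLMに渡さない。キャラクターは汎用お礼で反応 / The tip is accepted and item effects fire, but the comment text is not passed to the LLM. Characters respond with a generic thank-you
ユーザーがミュート中 / User is muted	全コメントをサイレント破棄。投げ銭の金銭処理のみ実行 / All comments are silently discarded. Only the tip's payment processing runs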

投げ銭コメントの汎用お礼 / Generic Tip Thank-You Messages

投げ銭のコメント部分がNGだった場合に使用する定型お礼メッセージ。

Pre-defined thank-you messages used when the comment portion of a tip is flagged as NG.

John:

Language	Messages
ja	「おっ、ありがとう！嬉しいな〜」「○○さん、ありがとー！」「おお、サンキュー！」
en	"Oh, thanks! That's awesome!" "Thank you, ○○!" "Hey, appreciate it!"

Sara:

Language	Messages
ja	「わー、○○さんありがとう！」「えへへ、ありがとうございます！」「嬉しい〜ありがとう！」
en	"Wow, thank you ○○!" "Hehe, thank you so much!" "Yay, thanks!"

Note

○○ は投げ銭者のユーザー名に置換される。名前自体がNGワードに該当する場合は「みなさん」/「everyone」に置換する。

○○ is replaced with the tipper's username. If the username itself matches an NG word, replace with "みなさん" / "everyone".
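
The substitution rule in the note can be sketched as below. `fillThankYou` is a hypothetical helper, and the NG check on the username is passed in as a callback because its exact form is not specified here.

```typescript
type Lang = 'ja' | 'en';

// Replacement used when the username itself is NG.
const NAME_FALLBACK: Record<Lang, string> = { ja: 'みなさん', en: 'everyone' };

function fillThankYou(
  template: string,
  username: string,
  lang: Lang,
  isNgName: (name: string) => boolean,
): string {
  const safeName = isNgName(username) ? NAME_FALLBACK[lang] : username;
  // Replace every ○○ placeholder in the template.
  return template.split('○○').join(safeName);
}
```

Templates that append 「さん」 to the placeholder would produce 「みなさんさん」 with this naive substitution, so they may need a separate variant for the fallback name.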

出力フィルタでのブロック時 / When Output Filter Blocks

LLM応答がブロックされた場合、キャラクターの性格に合った代替行動を実行する。

When an LLM response is blocked, execute an alternative action matching the character's personality.

代替行動テーブル / Fallback Action Table

Character	Fallback Actions (ja)	Fallback Actions (en)
John	「ん〜、何だっけ。まあいっか」（頭を掻く）	"Hmm, what was I saying... Anyway." (scratches head)
	「あ、イブ見て。かわいいな〜」（話題転換）	"Oh look, Eve. She's so cute~" (topic change)
	「ちょっとぼーっとしてた」（照れ笑い）	"Spaced out for a sec there." (awkward smile)
Sara	「あ、そうだ！ご飯のこと考えなきゃ」（話題転換）	"Oh right! I need to think about dinner." (topic change)
	「ふふ、なんでもない〜」（笑顔）	"Hehe, it's nothing~" (smile)
	「イブちゃん、おいで〜♪」（Eveを呼ぶ）	"Eve, come here~!" (calls Eve)
Eve	（しっぽを振る）/ Wags tail	（しっぽを振る）/ Wags tail
	（あくびをする）/ Yawns	（あくびをする）/ Yawns

フォールバック選択ロジック / Fallback Selection Logic

typescript

interface FallbackConfig {
  character: 'john' | 'sara' | 'eve';
  language: 'ja' | 'en';
  context: 'conversation' | 'comment_reply' | 'tip_reaction';
}

function selectFallback(config: FallbackConfig): FallbackAction {
  const pool = FALLBACK_ACTIONS[config.character][config.language];
  // Avoid repeating the same fallback within 30 minutes
  const available = pool.filter(a => !recentlyUsed(a, 30 * 60 * 1000));
  return available[Math.floor(Math.random() * available.length)] || pool[0];
}

リトライポリシー / Retry Policy

Scenario	Retry	Detail
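出力NGワード検出 / NG word in output	1回リトライ / Retry once	プロンプトに「以下の表現を避けて再生成してください」を追加してリトライ / Retry with 「以下の表現を避けて再生成してください」 ("regenerate while avoiding the following expressions") appended to the prompt
出力ポリシー違反 / Output policy violation	リトライなし / No retry	即座にフォールバック行動を実行 / Run the fallback action immediately
リトライ後も再度NG / Still NG after retry	リトライなし / No retry	フォールバック行動を実行 / Run the fallback action
LLM APIタイムアウト / LLM API timeout	フォールバック / Fallback	5秒以内に応答がなければフォールバック（既存のリスク対策と連携） / Fall back if there is no response within 5 seconds (coordinated with the existing risk countermeasures)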

LLM Response
    │
    ▼
Output Filter → PASS → Render
    │
    BLOCK (NG word)
    │
    ▼
Retry with modified prompt (1 time only)
    │
    ▼
Output Filter → PASS → Render
    │
    BLOCK (again)
    │
    ▼
Fallback Action → Render
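
The retry policy and the flow above can be sketched as follows. The LLM call, the output filter, and the fallback are injected as functions, and all of their names are hypothetical. As a simplification, the retry passes only the avoidance instruction, not the list of matched expressions.

```typescript
type Verdict = { ok: true } | { ok: false; reason: 'ng_word' | 'policy' };

interface RetryDeps {
  generate: (extraInstruction?: string) => Promise<string>; // hypothetical LLM call
  filter: (text: string) => Verdict;                        // output filter pipeline
  fallback: () => string;                                   // fallback action text
  timeoutMs?: number;                                       // defaults to 5 seconds
}

const AVOID_INSTRUCTION = '以下の表現を避けて再生成してください';

// Resolve to null if `p` does not settle within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T | null> {
  return new Promise<T | null>((resolve, reject) => {
    const timer = setTimeout(() => resolve(null), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

async function respondWithRetry(deps: RetryDeps): Promise<string> {
  const timeoutMs = deps.timeoutMs ?? 5000;
  for (let attempt = 0; attempt < 2; attempt++) {
    const extra = attempt === 0 ? undefined : AVOID_INSTRUCTION;
    const text = await withTimeout(deps.generate(extra), timeoutMs);
    if (text === null) return deps.fallback();               // API timeout
    const verdict = deps.filter(text);
    if (verdict.ok) return text;
    if (verdict.reason === 'policy') return deps.fallback(); // policy violation: no retry
  }
  return deps.fallback(); // still NG after the single retry
}
```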

レートリミッティング（常習犯対策） / Rate Limiting for Repeat Offenders

段階的制裁 / Graduated Penalties

Level	Condition	Action	Duration
Level 0	Normal	通常処理	—
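Level 1 Warning	3 NG hits in 10 min	コメント無視（ミュート） / Ignore comments (mute)	10 min
Level 2 Temp Mute	10 NG hits in 24 hours	コメント全無視 / Ignore all comments	60 min
Level 3 Extended Mute	Level 2 を24時間以内に2回 / Level 2 reached twice within 24 hours	コメント全無視 / Ignore all comments	24 hours
Level 4 Permanent Mute	Level 3 を3回以上 / Level 3 reached 3+ times	コメント永久無視 / Ignore comments permanently	Until manual unblock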
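
The table can be read as a function from a user's history to a penalty level. A sketch under assumptions: `level2Times` and `level3Count` are hypothetical bookkeeping fields that do not appear in `UserModerationState`, and a permanent mute is represented as an infinite duration.

```typescript
type Level = 0 | 1 | 2 | 3 | 4;

const MINUTE = 60 * 1000;
const HOUR = 60 * MINUTE;

interface PenaltyHistory {
  ngHits: number[];      // timestamps (ms) of NG hits
  level2Times: number[]; // timestamps (ms) at which Level 2 was applied (hypothetical)
  level3Count: number;   // number of times Level 3 was applied (hypothetical)
}

function computePenalty(h: PenaltyHistory, now: number): { level: Level; muteMs: number } {
  const within = (times: number[], windowMs: number): number =>
    times.filter((t) => now - t <= windowMs).length;

  // Check the most severe level first.
  if (h.level3Count >= 3) return { level: 4, muteMs: Infinity };
  if (within(h.level2Times, 24 * HOUR) >= 2) return { level: 3, muteMs: 24 * HOUR };
  if (within(h.ngHits, 24 * HOUR) >= 10) return { level: 2, muteMs: 60 * MINUTE };
  if (within(h.ngHits, 10 * MINUTE) >= 3) return { level: 1, muteMs: 10 * MINUTE };
  return { level: 0, muteMs: 0 };
}
```

The caller would set `muteUntil = now + muteMs` and record the level change for later escalation checks.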

ユーザー状態管理 / User State Management

typescript

interface UserModerationState {
  userId: string;
  platform: 'youtube' | 'tiktok';
  displayName: string;
  ngHits: {
    timestamp: number;
    category: string;
    matchedWord: string;
  }[];
  currentLevel: 0 | 1 | 2 | 3 | 4;
  muteUntil: number | null;
  totalNgHitsAllTime: number;
}

保存先 / Storage

Data	Storage	Retention
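リアルタイム判定状態 / Real-time moderation state	In-memory (Map)	プロセス生存中 / Lifetime of the process
24時間NG履歴 / 24-hour NG history	SQLite `user_moderation` table	7日間 / 7 days
永久ミュートリスト / Permanent mute list	SQLite `permanent_mutes` table	手動解除まで / Until manually lifted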

sql

CREATE TABLE user_moderation (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  user_id TEXT NOT NULL,
  platform TEXT NOT NULL,
  display_name TEXT,
  ng_category TEXT NOT NULL,
  matched_word TEXT NOT NULL,
  original_text TEXT NOT NULL,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE permanent_mutes (
  user_id TEXT NOT NULL,
  platform TEXT NOT NULL,
  display_name TEXT,
  reason TEXT,
  muted_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (user_id, platform)
);

ログ・モニタリング / Logging & Monitoring

ログ出力 / Log Output

全てのフィルタ判定結果をログに記録する。NG検出時は詳細情報を含める。

Log all filter decisions. Include detailed information when NG is detected.

typescript

interface FilterLog {
  timestamp: string;
  direction: 'input' | 'output';
  userId?: string;
  platform?: 'youtube' | 'tiktok';
  originalText: string;
  normalizedText: string;
  result: 'pass' | 'block';
  blockReason?: {
    stage: string;          // e.g., "ng_word_check", "rate_limit"
    category?: string;      // e.g., "violence", "profanity"
    matchedPattern?: string;
    matchType?: 'exact' | 'partial' | 'regex';
    severity?: 'low' | 'medium' | 'high';
  };
  processingTimeMs: number;
}

モニタリングダッシュボード指標 / Monitoring Metrics

Metric	Purpose
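NG検出率 / NG detection rate	全コメントに対するNG比率。異常な急上昇は攻撃の可能性 / Share of all comments flagged NG. An abnormal spike may indicate an attack
カテゴリ別NG件数 / NG count by category	どのカテゴリのNGが多いかを把握 / Shows which categories produce the most NG hits
誤検出報告数 / False positive reports	許可リスト更新の判断材料 / Input for deciding allowlist updates
出力ブロック率 / Output block rate	LLMの応答品質の指標。高い場合はシステムプロンプト改善を検討 / Indicator of LLM response quality. If high, consider improving the system prompt
ミュートユーザー数 / Muted user count	コミュニティ健全性の指標 / Indicator of community health
フィルタ処理時間 / Filter latency	パフォーマンスの監視。p95 < 10ms を目標 / Performance monitoring. Target p95 < 10 ms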

管理者機能 / Admin Functions

配信者・管理者が利用可能なコマンド。

Commands available to the streamer / admin.

Command	Description
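`!ng-add <word> [category]`	NGワードを追加（デフォルトカテゴリ: profanity） / Add an NG word (default category: profanity)
`!ng-remove <word>`	NGワードを削除 / Remove an NG word
`!ng-list [category]`	NGワード一覧表示（カテゴリ指定可） / List NG words (optionally by category)
`!mute <userId>`	ユーザーを手動ミュート / Manually mute a user
`!unmute <userId>`	ユーザーのミュートを解除 / Lift a user's mute
`!filter-stats`	フィルタ統計情報の表示 / Show filter statistics
`!allow-add <word>`	許可リストにワードを追加 / Add a word to the allowlist
`!allow-remove <word>`	許可リストからワードを削除 / Remove a word from the allowlist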

セキュリティ / Security

管理者コマンドは、事前設定された管理者ユーザーIDからのみ実行可能とする。コマンド自体は配信画面には表示しない。

Admin commands can only be executed from pre-configured admin user IDs. Commands themselves are not displayed on the stream screen.
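
A sketch of how the admin check and command parsing might fit together. `parseAdminCommand` and its return shape are hypothetical; only the command names and the default category come from the table above.

```typescript
interface ParsedCommand { name: string; args: string[] }

const KNOWN_COMMANDS = new Set([
  'ng-add', 'ng-remove', 'ng-list', 'mute', 'unmute',
  'filter-stats', 'allow-add', 'allow-remove',
]);

function parseAdminCommand(
  text: string,
  userId: string,
  adminIds: ReadonlySet<string>,
): ParsedCommand | null {
  if (!text.startsWith('!')) return null;
  // Commands from non-admin users are ignored and treated as ordinary comments.
  if (!adminIds.has(userId)) return null;

  const [name, ...args] = text.slice(1).trim().split(/\s+/);
  if (!KNOWN_COMMANDS.has(name)) return null;

  // `!ng-add <word>` without a category falls back to "profanity".
  if (name === 'ng-add' && args.length === 1) args.push('profanity');
  return { name, args };
}
```

A non-null result should be handled and then dropped before the comment queue, so that the command never reaches the stream screen.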

処理フロー全体まとめ / End-to-End Processing Summary

Viewer Comment
      │
      ├─ Normalize (Unicode, homoglyphs, leet)
      ├─ Rate Limit Check (per user)
      │     └─ MUTED → silent discard
      ├─ NG Word Check (exact → partial → regex)
      │     └─ MATCH → silent discard + log + increment NG count
      ├─ Semantic Filter (PII, links, coded language)
      │     └─ MATCH → silent discard + log
      ├─ Language Detection → tag as "ja" or "en"
      │
      ▼
  Comment Queue ──────────────────────────────────┐
      │                                            │
      ├─ [FREE] → 10-min cycle pickup              │
      └─ [TIP]  → Immediate processing             │
                                                    │
      ┌─ LLM Prompt (with filtered comments) ◄─────┘
      │
      ▼
  LLM Response (Claude API)
      │
      ├─ NG Word Check (same list)
      │     └─ MATCH → retry once with avoidance instruction
      │               └─ Still MATCH → fallback action
      ├─ Content Policy Check
      │     └─ VIOLATION → fallback action (no retry)
      ├─ Length / Format Validation
      │     └─ Over limit → truncate / clean
      │
      ▼
  Render to Stream

既存設計との関連 / Relationship to Existing Design

Document	Relevance
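risks.md	Content Risk (B) で定義されたNGフィルタ・トロールコメント対策の詳細実装 / Detailed implementation of the NG filter and troll-comment countermeasures defined in Content Risk (B)
live-comment.md	Comment Processing Flow の「Filter & Classifier」ステージの詳細設計 / Detailed design of the "Filter & Classifier" stage in Comment Processing Flow
Agent System Prompts	各キャラクターの「Important Rules」にある安全性ルールとの整合性確保 / Kept consistent with the safety rules in each character's "Important Rules"
Memory System	NGフィルタのログはメモリシステムとは独立して管理（記憶に残さない） / NG filter logs are managed independently of the memory system (not retained as memories)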
