feat: Codex/Responses API compatibility, cache TTL 24h, prompt optimization, n8n test tools, gitignore cleanup

Commit e9e60dfa authored Mar 10, 2026 by Vũ Hoàng Anh
24 changed files
--- a/.gemini/settings.json
+++ b/.gemini/settings.json
@@ -2,6 +2,19 @@
    "mcpServers": {
        "canifa-api": {
            "url": "http://localhost:5000/mcp"
+        },
+        "n8n-mcp": {
+            "command": "npx",
+            "args": [
+                "n8n-mcp"
+            ],
+            "env": {
+                "MCP_MODE": "stdio",
+                "LOG_LEVEL": "error",
+                "DISABLE_CONSOLE_OUTPUT": "true",
+                "N8N_API_URL": "http://localhost:5678",
+                "N8N_API_KEY": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIwOTdkMTNhOS01NzQ0LTQyY2UtYTM5Yi00YjMwZTk4NDU4OWMiLCJpc3MiOiJuOG4iLCJhdWQiOiJwdWJsaWMtYXBpIiwianRpIjoiMTVmZmNlZjUtNzkzOC00MWU4LTg5NzktY2NhMWI0YzUzY2RmIiwiaWF0IjoxNzcyNjc1OTM3fQ.K58ZsX8BgdukDdON15sMCQ0eynTeYSEbi7nF6xIPY9I"
+            }
        }
    }
 }
\ No newline at end of file
--- a/.gitignore
+++ b/.gitignore
@@ -55,3 +55,25 @@ Thumbs.db
 run.txt
 backend/agent/tools/query.txt
 backend/schema_dump.json
+
+# Document folder
+document/
+
+# n8n workflow exports & temp files
+canifa_workflow_export.json
+prod_workflow.json
+prod_workflow_fixed.json
+fix_n8n_connections.py
+*.png
+!backend/static/**/*.png
+
+# Playwright MCP
+.playwright-mcp/
+
+# Test credentials (sensitive)
+backend/tests/google_credentials.json
+backend/tests/google_sheets_credentials.json
+backend/tests/sheet_info.json
+backend/tests/test_n8n_api_output.txt
+backend/n8n_result.json
+diff_*.txt
--- a/backend/VERIFY_STREAMING.md
+++ b/backend/VERIFY_STREAMING.md
-# 🔬 VERIFICATION: LangGraph Streaming Behavior
-
-## 🎯 PURPOSE
-
-Check whether LangGraph `astream()` streams **incrementally** (piece by piece) or only emits an event **after the node has finished**.
-
---
-
-## 📊 EXPECTED RESULTS
-
-### **Scenario 1: Incremental Streaming (Ideal)** ✅
-
-If LangGraph streams incrementally, the backend logs will show:
-
-```
-🌊 Starting LLM streaming...
-📦 Event #1 at t=2.50s | Keys: ['messages']
-📦 Event #2 at t=3.20s | Keys: ['ai_response']
-📡 Event #2 (t=3.20s): ai_response with 150 chars
-   Preview: {"ai_response": "Anh chọn áo thun th...
-📦 Event #3 at t=4.10s | Keys: ['ai_response']
-📡 Event #3 (t=4.10s): ai_response with 380 chars
-   Preview: {"ai_response": "Anh chọn áo thun thể thao nam chuẩn luôn! Em tìm...
-📦 Event #4 at t=5.50s | Keys: ['ai_response']
-📡 Event #4 (t=5.50s): ai_response with 620 chars
-   Preview: {"ai_response": "...", "product_ids": ["SKU1", "SKU2"]...
-🎯 Event #4 (t=5.50s): Regex matched product_ids!
-✅ Extracted 3 SKUs: ['SKU1', 'SKU2', 'SKU3']
-🚨 BREAKING at Event #4 (t=5.50s) - user_insight KHÔNG ĐỢI!
-```
-
-**→ Content grows step by step (150 → 380 → 620 chars)**  
-**→ Breaks early once product_ids appear (t=5.5s instead of t=12s)**
-
---
-
-### **Scenario 2: Event-based (After completion)** ❌
-
-If LangGraph only emits after the node has finished, the logs will be:
-
-```
-🌊 Starting LLM streaming...
-📦 Event #1 at t=2.30s | Keys: ['messages']  ← Tool execution
-📦 Event #2 at t=11.80s | Keys: ['ai_response']  ← LLM node finished
-📡 Event #2 (t=11.80s): ai_response with 1250 chars  ← ENTIRE RESPONSE
-   Preview: {"ai_response": "Anh chọn áo thun thể thao nam chuẩn luôn!...", "product_ids": ["SKU1", "SKU2", "SKU3"], "user_insight": {...}}
-🎯 Event #2 (t=11.80s): Regex matched product_ids!
-✅ Extracted 3 SKUs: ['SKU1', 'SKU2', 'SKU3']
-🚨 BREAKING at Event #2 (t=11.80s) - user_insight KHÔNG ĐỢI!
-```
-
-**→ ONLY ONE EVENT, containing the full content**  
-**→ Emitted only after the LLM has completely finished (t=11.8s)**  
-**→ CANNOT break any earlier!**
-
---
-
-## 🔍 ANALYSIS
-
-### **If Scenario 2 (Event-based):**
-
-**Explanation:**
- LLM **is streaming tokens internally** from t=2s → t=12s
- LangGraph **waits for the node to finish** before emitting an event
- The event already contains the **full response**
- The regex matches immediately because the content is already complete
-
-**Conclusion:**
- ✅ The code is correct and streaming is enabled
- ❌ But it cannot break earlier because the event does not exist yet
- ⏱️ Latency cannot be reduced (~12s)
-
---
-
-## 💡 SOLUTIONS
-
-If the result is Scenario 2, true streaming requires one of:
-
-### **Option A: Custom Streaming Callback**
-```python
-from langchain.callbacks.base import AsyncCallbackHandler
-
-class StreamingCallback(AsyncCallbackHandler):
-    async def on_llm_new_token(self, token: str, **kwargs):
-        # Accumulate and check the regex
-        self.accumulated += token
-        if '"product_ids"' in self.accumulated:
-            # Trigger break somehow
-            pass
-```
-
-### **Option B: SSE Endpoint**
-Stream events directly to the client and let the client parse them
-
-### **Option C: Keep as is**
-The code is already as optimized as these limits allow; accept the latency
-
---
-
-## 📝 NOTES
-
- **Streaming=True** on the LLM → LangChain streams tokens internally
- **graph.astream()** → streams events, not tokens
- **Breaking early** only makes sense if events are emitted incrementally
-
-**Check the backend logs to determine which scenario applies!**
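The removed note's Option A snippet never initializes `self.accumulated` and leaves the break as a placeholder; `ProductIDStreamingCallback` in this commit signals with an `asyncio.Event` (`product_found_event`) instead. Below is a stdlib-only sketch of that pattern, assuming tokens arrive through an async iterator. `TokenWatcher`, `fake_token_stream`, and `run` are hypothetical stand-ins, not LangChain or LangGraph APIs, and the simulated stream only illustrates the control flow, not real LLM timing:

```python
import asyncio
import re
from typing import AsyncIterator, List, Tuple

PRODUCT_IDS_RE = re.compile(r'"product_ids"\s*:\s*\[(.*?)\]', re.DOTALL)


class TokenWatcher:
    """Hypothetical stand-in for a streaming callback: accumulates tokens and
    sets an event as soon as a complete product_ids array has arrived."""

    def __init__(self) -> None:
        self.accumulated = ""  # initialized up front, unlike the removed snippet
        self.skus: List[str] = []
        self.found = asyncio.Event()

    async def on_new_token(self, token: str) -> None:
        self.accumulated += token
        if self.found.is_set():
            return
        match = PRODUCT_IDS_RE.search(self.accumulated)
        if match:  # only matches once the closing "]" has streamed in
            self.skus = re.findall(r'"([^"]+)"', match.group(1))
            self.found.set()


async def fake_token_stream(text: str, chunk: int = 8) -> AsyncIterator[str]:
    """Simulates an LLM emitting `text` a few characters at a time."""
    for i in range(0, len(text), chunk):
        await asyncio.sleep(0)  # yield to the event loop between tokens
        yield text[i : i + chunk]


async def run(text: str) -> Tuple[List[str], int]:
    """Consume the stream until product_ids are found or the stream ends.
    Returns the SKUs and how many characters were read before stopping."""
    watcher = TokenWatcher()

    async def produce() -> None:
        async for token in fake_token_stream(text):
            await watcher.on_new_token(token)

    producer = asyncio.create_task(produce())
    waiter = asyncio.create_task(watcher.found.wait())
    await asyncio.wait({producer, waiter}, return_when=asyncio.FIRST_COMPLETED)
    consumed = len(watcher.accumulated)
    for task in (producer, waiter):
        task.cancel()  # stop reading the rest (e.g. user_insight); no-op if done
    await asyncio.gather(producer, waiter, return_exceptions=True)
    return watcher.skus, consumed
```

Because the consumer waits on whichever finishes first, a response with no `product_ids` still runs to completion. The sketch cannot settle the note's actual question, which is whether tokens reach the callback incrementally in the real graph.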
--- a/backend/agent/controller.py
+++ b/backend/agent/controller.py
@@ -268,6 +268,8 @@ async def chat_controller(
        # Extract ai_response from streaming content (fallbacks)
        if early_response and not ai_text_response:
            raw_content = streaming_callback.accumulated_content
+            # Strip Codex reasoning objects before parsing
+            raw_content = ProductIDStreamingCallback.strip_reasoning(raw_content)
            if raw_content:
                try:
                    raw_normalized = raw_content.replace("{{", "{").replace("}}", "}")
@@ -284,10 +286,21 @@ async def chat_controller(
        if not ai_text_response and all_accumulated_messages:
            for msg in reversed(all_accumulated_messages):
                if isinstance(msg, AIMessage) and msg.content:
-                    ai_text_response = msg.content
+                    # Responses API may return content as list
+                    content = msg.content
+                    if isinstance(content, list):
+                        content = "".join(str(c.get("text", c) if isinstance(c, dict) else c) for c in content)
+                    # Strip Codex reasoning objects
+                    content = ProductIDStreamingCallback.strip_reasoning(content)
+                    ai_text_response = content
                    break

        # Parse JSON-wrapped ai_response
+        # Ensure ai_text_response is str (Responses API may return list)
+        if isinstance(ai_text_response, list):
+            ai_text_response = "".join(str(c.get("text", c) if isinstance(c, dict) else c) for c in ai_text_response)
+        # Strip Codex reasoning objects before JSON parse
+        ai_text_response = ProductIDStreamingCallback.strip_reasoning(ai_text_response)
        if ai_text_response and ai_text_response.lstrip().startswith("{"):
            try:
                ai_normalized = ai_text_response.replace("{{", "{").replace("}}", "}")
@@ -297,7 +310,13 @@ async def chat_controller(
                    if not final_product_ids and isinstance(ai_json.get("product_ids"), list):
                        final_product_ids = [str(s) for s in ai_json["product_ids"]]
            except json.JSONDecodeError:
-                pass
+                # Regex fallback for Codex {{/}} braces that break JSON parse
+                ai_match = re.search(r'"ai_response"\s*:\s*"((?:[^"\\]|\\.)*)"\s*,\s*"product_ids"', ai_text_response, re.DOTALL)
+                if ai_match:
+                    ai_text_response = ai_match.group(1).replace('\\"', '"').replace("\\n", "\n")
+                pid_match = re.search(r'"product_ids"\s*:\s*\[(.*?)\]', ai_text_response if not ai_match else ai_normalized, re.DOTALL)
+                if pid_match and not final_product_ids:
+                    final_product_ids = re.findall(r'"([^"]+)"', pid_match.group(1))

        # Extract & filter products
        enriched_products = []

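Both content-normalization sites above turn any dict block that lacks a `text` key into its repr (`str(c.get("text", c))`) and then rely on `strip_reasoning` to remove that repr again. An alternative is to filter blocks by their `type` before joining, so no repr is ever produced. The sketch below assumes that Responses API content arrives as dicts carrying a `type` field, which is what the `'type': 'reasoning'` objects targeted by `_REASONING_RE` suggest. `flatten_content` is a hypothetical helper, not part of the codebase:

```python
from typing import Any, List


def flatten_content(content: Any) -> str:
    """Join message content into plain text, skipping reasoning blocks.

    Assumes list items are strings or dicts with a "type" key, such as
    {"type": "text", "text": "..."} or {"type": "reasoning", "id": "rs_...", ...}.
    """
    if isinstance(content, str):
        return content
    if not isinstance(content, list):
        return str(content)
    parts: List[str] = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict):
            if block.get("type") == "reasoning":
                continue  # drop the block instead of stringifying it and regex-stripping later
            text = block.get("text")
            if isinstance(text, str):
                parts.append(text)
            # dicts without a string "text" field are ignored rather than str()-ed
    return "".join(parts)
```

One reason to prefer filtering on `type`: `_REASONING_RE` uses `[^}]*`, which stops at the first `}`. A reasoning block whose `summary` holds nested dicts would therefore not be removed cleanly by the regex.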
--- a/backend/agent/helper.py
+++ b/backend/agent/helper.py
--- a/backend/agent/prompt_module/01_core.txt
+++ b/backend/agent/prompt_module/01_core.txt
@@ -26,14 +26,17 @@

 **🛒 ORDERING INSTRUCTIONS (MANDATORY WHEN THE CUSTOMER ASKS HOW TO BUY):**

-**When products have already been shown (there is a product card):**
-→ "Tap the 🛒 **Giỏ hàng** (Cart) icon at the bottom-right corner of the product, choose a size and a color, then add it to your cart and you're all set to order!"
+**When products HAVE been shown (there is a product card in the conversation):**
+→ Tell the customer to tap the 🛒 icon at the bottom-right corner of the product image, choose size + color, then add it to the cart.
+→ Ask whether the customer wants to see other products.

-**When no products have been shown yet (a general "mua sao?" / "how do I buy?"):**
-→ "Visit **canifa.com** to browse the products! Or tell me what you're looking for and I'll find it for you! 😊"
+**When NO products have been shown (new conversation, no product search yet):**
+→ Walk through 5 steps: go to canifa.com/App → find the product → choose size + color → add to cart → check out.
+→ Ask whether the customer wants help finding a product.

 ⚠️ **IMPORTANT:**
- When the customer asks "mua sao?", "đặt hàng sao?", "làm sao để mua?", "mua ở đâu?" (how do I buy / order / where can I buy?) → answer EXACTLY according to the 2 cases above
+- You MUST WRITE your own natural answer based on the context; do NOT copy the template verbatim!
+- **CHECK the context** first: have products been shown yet? → choose case A or B
 - Do **NOT** tell the customer to look up the product code on the website when a product card is already shown → they only need to tap the 🛒 icon
 - After presenting a product the customer likes → remind them to tap 🛒 to order


--- a/backend/agent/prompt_module/07_output_format.txt
+++ b/backend/agent/prompt_module/07_output_format.txt
@@ -473,6 +473,17 @@ Before answering, you must cross-check the results from the tool with
 - **ALWAYS USE DOUBLED BRACES `{{` AND `}}` FOR ALL JSON OUTPUT**
 - **⛔ NEVER INFER gender/age** when the user has not stated it. "quần váy" → gender: null. "áo lót" → gender: null. ONLY fill it in when the user STATES IT EXPLICITLY!

+**⛔⛔⛔ FINAL WARNING: ORDERING INSTRUCTIONS ⛔⛔⛔**
+- When the customer asks "hướng dẫn đặt hàng" (how to order) and NO product has been shown → explain: go to canifa.com/App, find the product, choose size + color, add to cart, check out
+- When the customer asks "hướng dẫn đặt hàng" and products HAVE been shown → tell them to tap the 🛒 icon at the bottom-right corner of the product image
+- ⛔ **NEVER** slip in the sentence "Nếu mình đã tìm được SP cho bạn rồi..." ("If I've already found products for you...") when NO product has been searched yet!
+- ⛔ **NEVER copy the template verbatim!** WRITE your own natural answer based on the context!
+
+**⛔⛔⛔ FINAL WARNING: NEVER INVENT SKU CODES ⛔⛔⛔**
+- Only use SKU codes EXACTLY as returned by data_retrieval_tool or given by the customer
+- ❌ NEVER add a suffix yourself: "6TE25S001" must NOT be turned into "6TE25S001-SZ001"
+- The tool expands variants itself; the bot does NOT need to append a color code!
+
 **⚡ [LAST_ACTION] RULE - IMPORTANT:**
 - **BEFORE ANSWERING** → read `[LAST_ACTION]` from the previous turn's insight to understand the context
 - **WORK OUT** the next step yourself from LAST_ACTION + the customer's new message
@@ -510,6 +521,11 @@ I'll check right away for you! ⚡"
 ---

 ### "Hướng dẫn đặt hàng online" (How to order online)
-"What product would you like to order? 🛒
-Let me know so I can advise you and help you
-place the order right away! 😄"
+
+⚠️ DISTINGUISH 2 CASES: check the context before answering:
+
+**CASE A: products HAVE been shown earlier** → tell the customer to tap the 🛒 icon at the bottom-right corner of the product image, choose size + color, and add to cart. Ask whether they want to see other products.
+
+**CASE B: NO products shown yet** → walk through the steps: go to canifa.com/App → find the product → choose size + color → add to cart → check out. Ask whether they want help finding a product.
+
+⛔ **WRITE** a natural answer yourself; **do NOT copy** the template **verbatim**! Every answer must be different and natural, as if chatting.
\ No newline at end of file
--- a/backend/agent/prompt_utils.py
+++ b/backend/agent/prompt_utils.py
@@ -18,9 +18,9 @@ logger = logging.getLogger(__name__)

 LANGFUSE_SYSTEM_PROMPT_NAME = "canifa-stylist-system-prompt"

-# 5-minute cache: balances fast updates against performance
-# Call force_refresh_prompts() if an immediate update is needed
-CACHE_TTL = 300
+# "Permanent" cache (24h): only refreshed when force_refresh_prompts() is called
+# Previously 300s (5 minutes); the prompt now stays in RAM
+CACHE_TTL = 86400  # 24 hours — practically permanent

 LANGFUSE_TOOL_PROMPT_MAP = {
    "brand_knowledge_tool": "canifa-tool-brand-knowledge",

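With `CACHE_TTL = 86400`, a cached prompt is reused for up to a day unless `force_refresh_prompts()` is called. The helper's internals are not part of this diff, so what follows is only a sketch of a TTL cache with an explicit refresh hook. `PromptCache`, `loader`, and the injectable `clock` are hypothetical names; the clock parameter exists only so expiry can be tested without waiting:

```python
import time
from typing import Callable, Dict, Tuple


class PromptCache:
    """Hypothetical TTL cache: entries expire after `ttl` seconds or on force_refresh()."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # name -> (stored_at, text)

    def get(self, name: str, loader: Callable[[str], str]) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: no remote fetch
        text = loader(name)  # e.g. a fetch from the prompt store (not shown here)
        self._entries[name] = (now, text)
        return text

    def force_refresh(self) -> None:
        # Drop every entry so the next get() reloads immediately.
        self._entries.clear()
```

One consequence of a 24h TTL worth checking: if the cache is per-process, as "keeps the prompt in RAM" suggests, then in a multi-worker deployment `force_refresh_prompts()` clears only the copy held by the worker that handles that call.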
--- a/backend/agent/streaming_callback.py
+++ b/backend/agent/streaming_callback.py
@@ -19,6 +19,11 @@ class ProductIDStreamingCallback(AsyncCallbackHandler):
    Once product_ids appear → trigger the break immediately, without waiting for user_insight!
    """

+    # Regex to match Codex reasoning objects like {'id': 'rs_...', 'type': 'reasoning', ...}
+    _REASONING_RE = re.compile(
+        r"\{['\"]id['\"]\s*:\s*['\"]rs_[^}]*['\"]type['\"]\s*:\s*['\"]reasoning['\"][^}]*\}",
+    )
+
    def __init__(self):
        self.accumulated_content = ""
        self.product_ids_found = False
@@ -26,16 +31,31 @@ class ProductIDStreamingCallback(AsyncCallbackHandler):
        self.product_skus = []
        self.product_found_event = asyncio.Event()  # ✅ Event instead of polling!

+    @staticmethod
+    def strip_reasoning(text: str) -> str:
+        """Remove Codex reasoning objects from text."""
+        if not text or "reasoning" not in text:
+            return text
+        return ProductIDStreamingCallback._REASONING_RE.sub("", text).strip()
+
    async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        """
        Callback invoked when the LLM generates a new token.
        Accumulate and check the regex immediately!
        """
+        # Responses API may send token as list instead of str
+        if isinstance(token, list):
+            token = "".join(str(t) for t in token)
+        elif not isinstance(token, str):
+            token = str(token)
        self.accumulated_content += token

        # Check whether product_ids have appeared yet
        if not self.product_ids_found:
-            product_match = re.search(r'"product_ids"\s*:\s*\[(.*?)\]', self.accumulated_content, re.DOTALL)
+            # Strip reasoning objects (Codex) + normalize {{/}} before regex matching
+            clean_content = self.strip_reasoning(self.accumulated_content)
+            clean_content = clean_content.replace("{{", "{").replace("}}", "}")
+            product_match = re.search(r'"product_ids"\s*:\s*\[(.*?)\]', clean_content, re.DOTALL)

            if product_match:
                logger.warning(f"🎯 FOUND product_ids at {len(self.accumulated_content)} chars!")
@@ -44,7 +64,7 @@ class ProductIDStreamingCallback(AsyncCallbackHandler):
                # Extract ai_response with a more robust regex (handle escaped quotes)
                ai_text_match = re.search(
                    r'"ai_response"\s*:\s*"((?:[^"\\\\]|\\\\.)*)"\s*,\s*"product_ids"',
-                    self.accumulated_content,
+                    clean_content,
                    re.DOTALL,
                )


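The `ai_response` pattern in the context lines above is a raw string, so `\\\\` reaches the regex engine as two escaped backslashes. Inside the character class this is harmless, since `[^"\\\\]` still means "neither a quote nor a backslash". The alternative `\\\\.`, however, then matches only two backslashes followed by any character, so an `ai_response` containing an escaped quote (`\"`) never matches, despite the comment about escaped quotes. The controller's new regex fallback uses the single-backslash form. The check below compares both patterns and uses `json.loads` to unescape the captured text. `json.loads` is a suggested substitute for the chained `.replace()` calls, not what the commit does, and `extract_ai_response` is a hypothetical helper:

```python
import json
import re
from typing import Optional

# Pattern exactly as written in streaming_callback.py (raw string, doubled escapes).
AS_WRITTEN = r'"ai_response"\s*:\s*"((?:[^"\\\\]|\\\\.)*)"\s*,\s*"product_ids"'
# Pattern as written in the controller's regex fallback (one escaped backslash).
SINGLE = r'"ai_response"\s*:\s*"((?:[^"\\]|\\.)*)"\s*,\s*"product_ids"'


def extract_ai_response(text: str, pattern: str = SINGLE) -> Optional[str]:
    """Return the decoded ai_response string, or None if the pattern does not match."""
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        return None
    raw = match.group(1)
    try:
        # Re-quote the captured body and let json decode every escape (\" \n \uXXXX ...);
        # strict=False tolerates raw newlines the model may emit inside the string.
        return json.loads('"' + raw + '"', strict=False)
    except json.JSONDecodeError:
        return raw  # fall back to the undecoded text
```

Both patterns agree on responses without backslash escapes. They diverge only once the model escapes a quote or emits `\n`, which is exactly when a fallback path would kick in.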
--- a/backend/agent/tool_prompts/check_is_stock.txt
+++ b/backend/agent/tool_prompts/check_is_stock.txt
@@ -20,11 +20,15 @@ EXTREMELY IMPORTANT RULES WHEN CALLING A TOOL:
 - Only create the tool_call with the correct parameters; do NOT reply to the user in that same message.
 - Generate ai_response only after the tool has returned its result.

-⛔ ABSOLUTELY NEVER INVENT SKU CODES:
- Pass EXACTLY the code the customer gave; do NOT append or make up a suffix.
+⛔⛔⛔ FINAL WARNING: ABSOLUTELY NEVER INVENT SKU CODES ⛔⛔⛔
+- Pass EXACTLY the code returned by data_retrieval_tool or given by the customer.
+- NEVER append a suffix such as -SZ001, -SK010, -SW001 or ANY other characters!
+- Tool returns sku="6TE25S001" → skus: "6TE25S001" (CORRECT)
+  ❌ skus: "6TE25S001-SZ001" (WRONG: INVENTED CODE!)
+  ❌ skus: "6TE25S001-SK010" (WRONG: INVENTED CODE!)
 - Customer says "6TS25S018 còn size S không?" ("is 6TS25S018 still available in size S?") → skus: "6TS25S018" (CORRECT)
- Must NOT be turned into "6TS25S018-SZ001" or any code the customer did NOT give.
- If the customer only gives a base code (e.g. 6TS25S018) → pass that base code; the tool will expand it.
+  ❌ skus: "6TS25S018-SZ001" (WRONG: INVENTED!)
+- The tool EXPANDS every variant from the DB itself; the bot does NOT need to add a color code!

 ----- DETAILED EXAMPLES -----


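The prompt relies on the model not inventing suffixes. A code-side guard could enforce the same rule before the stock-check tool (`check_is_stock`) runs. The sketch below assumes the caller knows which codes data_retrieval_tool returned or the customer typed. `sanitize_skus` is hypothetical, and the suffix pattern `-S[A-Z]\d{3}` is inferred only from the prompt's three examples (-SZ001, -SK010, -SW001), not from the real SKU format:

```python
import re
from typing import Iterable, List, Optional, Set

# Inferred only from the prompt's examples (-SZ001, -SK010, -SW001); real codes may differ.
INVENTED_SUFFIX_RE = re.compile(r"-S[A-Z]\d{3}$")


def sanitize_skus(requested: Iterable[str], known: Set[str]) -> List[str]:
    """Keep SKUs the model was actually given, map '<known code>-SZ001'-style
    codes back to the known base code, and drop anything else (deduplicated)."""
    result: List[str] = []
    for raw in requested:
        sku = raw.strip()
        candidate: Optional[str] = None
        if sku in known:
            candidate = sku
        else:
            base = INVENTED_SUFFIX_RE.sub("", sku)
            if base in known:
                candidate = base  # the tool expands variants from the base code itself
        if candidate is not None and candidate not in result:
            result.append(candidate)
    return result
```

Mapping a suffixed code back to its base, rather than rejecting the whole call, matches the prompt's statement that the tool expands variants from the base code.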
--- a/backend/agent/tool_prompts/data_retrieval_tool.txt
+++ b/backend/agent/tool_prompts/data_retrieval_tool.txt
@@ -74,7 +74,8 @@ Only NORMALIZE when the user uses a CLEAR synonym (mapping table
 📋 SYNONYM → DB NAME MAPPING TABLE (handled by the tool; the LLM keeps the user's wording):
  áo thun, áo thun ngắn tay, áo cổ v, áo cổ tym → Áo phông
  áo cổ bẻ → Áo Polo
-  áo bra, áo ngực, áo quây → Áo lót
+  áo bra, áo bra active, bra → Áo bra active (related: Áo lót)
+  áo ngực, áo quây → Áo lót (related: Áo bra active)
  áo gió, áo khoác mỏng → Áo khoác gió
  áo croptop, croptop, baby tee, áo lửng, áo dáng ngắn → Áo Body
  áo sát nách, tanktop, tank top, áo dây, áo 2 dây, áo hai dây → Áo ba lỗ
@@ -205,11 +206,18 @@ CASE 10: "Áo khaki"
 → description: "product_name: Áo khaki. description_text: Áo chất liệu khaki form đẹp"
 → product_line_vn: "Áo"

-CASE 11: "Áo lót" or "Áo bra" (RELATED PRODUCT GROUP)
-→ description: "product_name: Áo lót/Áo bra active. description_text: Áo lót. Áo bra active thoáng mát co giãn tốt"
-→ product_name: "Áo lót/Áo bra active"
-⚠️ Do NOT infer gender/age! If the user just says "áo lót" → leave null. Only fill it in when the user STATES IT EXPLICITLY (e.g. "áo lót nữ" → women, "áo lót trẻ em" → kid)
-⚠️ description_text MUST include BOTH names (Áo lót + Áo bra active) so that semantic search finds both types!
+CASE 11: "Áo bra" → product_name MUST be "Áo bra" (the tool resolves it → Áo bra active + Áo lót)
+→ description: "product_name: Áo bra. description_text: Áo bra active thể thao thoáng mát co giãn tốt hỗ trợ tập luyện"
+→ product_name: "Áo bra"
+→ product_line_vn: "Áo"
+⚠️ Do NOT infer gender/age! If the user just says "áo bra" → leave null.
+
+CASE 12: "Áo lót" → product_name MUST be "Áo lót" (the tool resolves it → Áo lót + Áo bra active)
+→ description: "product_name: Áo lót. description_text: Áo lót thoáng mát mềm mại thoải mái"
+→ product_name: "Áo lót"
+→ product_line_vn: "Áo"
+⚠️ Do NOT infer gender/age! Only fill it in when the user STATES IT EXPLICITLY (e.g. "áo lót nữ" → women, "áo lót trẻ em" → kid)
+⚠️ The tool finds BOTH types (Áo lót + Áo bra active) via RELATED_LINES; the LLM only needs to keep the exact name the user used!

 ═══════════════════════════════════════════════════════════════
 🎉 HOLIDAYS / EVENTS: description_text states the occasion + a style suggestion

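Cases 11 and 12 now depend on the tool resolving a product line to its related lines via `RELATED_LINES`, whose implementation is not part of this diff. The sketch below shows the two-step lookup the prompt describes: synonym → DB name, then DB name → related lines. The entries come from the mapping table above, plus an identity entry for "áo lót" added so the DB name itself resolves. The function name and the dictionary layout are assumptions:

```python
import unicodedata
from typing import Dict, List

# Synonym -> DB product line (subset of the mapping table in the prompt).
SYNONYMS: Dict[str, str] = {
    "áo bra": "Áo bra active",
    "áo bra active": "Áo bra active",
    "bra": "Áo bra active",
    "áo ngực": "Áo lót",
    "áo quây": "Áo lót",
    "áo lót": "Áo lót",  # identity entry so the DB name itself resolves
}

# Hypothetical layout for RELATED_LINES: DB line -> lines to search alongside it.
RELATED_LINES: Dict[str, List[str]] = {
    "Áo bra active": ["Áo lót"],
    "Áo lót": ["Áo bra active"],
}


def resolve_product_lines(user_term: str) -> List[str]:
    """Map the user's wording to a DB product line, then append its related lines."""
    # NFC-normalize so composed and decomposed Vietnamese diacritics compare equal.
    term = unicodedata.normalize("NFC", user_term).strip()
    primary = SYNONYMS.get(term.lower(), term)
    return [primary] + [line for line in RELATED_LINES.get(primary, []) if line != primary]
```

Keeping the related-line expansion in code is what allows the prompt to tell the LLM to pass the user's exact wording.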
--- a/backend/agent/tools/data_retrieval_tool.py
+++ b/backend/agent/tools/data_retrieval_tool.py
@@ -63,7 +63,7 @@ class SearchItem(BaseModel):
    gender_by_product: str | None = Field(
        description="[SQL FILTER] Giới tính. GIÁ TRỊ HỢP LỆ: women, men, boy, girl, unisex, newborn",
    )
-    age_by_product: str | None = Field( 
+    age_by_product: str | None = Field(
        description="[SQL FILTER] Độ tuổi. GIÁ TRỊ HỢP LỆ: adult, kid, others",
    )
    master_color: str | None = Field(

--- a/backend/api/mock_api_route.py
+++ b/backend/api/mock_api_route.py
--- a/backend/api/n8n_api_route.py
+++ b/backend/api/n8n_api_route.py
--- a/backend/common/llm_factory.py
+++ b/backend/common/llm_factory.py
@@ -9,7 +9,7 @@ import logging
 from langchain_core.language_models import BaseChatModel
 from langchain_openai import ChatOpenAI, OpenAIEmbeddings

-from config import OPENAI_API_KEY
+from config import GROQ_API_KEY, OPENAI_API_KEY

 logger = logging.getLogger(__name__)

@@ -54,8 +54,8 @@ class LLMFactory:
            logger.debug(f"♻️ Using cached model: {clean_model}")
            return self._cache[cache_key]

-        logger.info(f"Creating new LLM instance: {clean_model}")
-        return self._create_instance(clean_model, streaming, json_mode, api_key)
+        logger.info(f"Creating new LLM instance: {model_name}")
+        return self._create_instance(model_name, streaming, json_mode, api_key)

    def _create_instance(
        self,
@@ -77,26 +77,52 @@ class LLMFactory:
            raise

    def _create_openai(self, model_name: str, streaming: bool, json_mode: bool, api_key: str | None) -> BaseChatModel:
-        """Create OpenAI model instance."""
-        key = api_key or OPENAI_API_KEY
-
-        if not key:
-            raise ValueError("OPENAI_API_KEY is required")
+        """Create OpenAI-compatible model instance (OpenAI or Groq)."""
+
+        # --- Auto-detect provider ---
+        is_groq = any(kw in model_name.lower() for kw in ("gpt-oss", "llama", "mixtral", "gemma", "qwen", "deepseek"))
+        # Also detect openai/ prefix used by Groq (e.g. "openai/gpt-oss-120b")
+        if model_name.startswith("openai/"):
+            is_groq = True
+
+        if is_groq:
+            # Always use GROQ_API_KEY for Groq models (ignore api_key param which may be OpenAI key)
+            key = GROQ_API_KEY
+            base_url = "https://api.groq.com/openai/v1"
+            if not key:
+                raise ValueError("GROQ_API_KEY is required for Groq models")
+        else:
+            key = api_key or OPENAI_API_KEY
+            base_url = None  # default OpenAI
+            if not key:
+                raise ValueError("OPENAI_API_KEY is required")
+
+        # Models that require /v1/responses API instead of /v1/chat/completions
+        needs_responses_api = "codex" in model_name.lower()

        llm_kwargs = {
            "model": model_name,
-            "streaming": streaming,  # ← STREAMING CONFIG
+            "streaming": streaming,
            "api_key": key,
            "temperature": 0,
            "max_tokens": 1500,
        }

+        if base_url:
+            llm_kwargs["base_url"] = base_url
+
+        if needs_responses_api:
+            llm_kwargs["use_responses_api"] = True
+            logger.info(f"🔄 Using Responses API for model: {model_name}")
+
        if json_mode:
            llm_kwargs["model_kwargs"] = {"response_format": {"type": "json_object"}}
            logger.info(f"⚙️ Initializing OpenAI in JSON mode: {model_name}")

+        provider = "Groq" if is_groq else "OpenAI"
+        logger.warning(f"🔍 DEBUG: provider={provider} | model={model_name} | base_url={base_url} | key={key[:10]}... | is_groq={is_groq}")
        llm = ChatOpenAI(**llm_kwargs)
-        logger.info(f"✅ Created OpenAI: {model_name} | Streaming: {streaming}")
+        logger.info(f"✅ Created {provider}: {model_name} | Streaming: {streaming}")
        return llm

    def _enable_json_mode(self, llm: BaseChatModel, model_name: str) -> BaseChatModel:

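The provider-selection rules added to `_create_openai` can be pulled into a pure function, which makes them easy to unit-test without constructing `ChatOpenAI`. The sketch below mirrors the keyword list, the `openai/` prefix rule, and the `codex` → Responses API rule from the diff. `ProviderConfig` and `resolve_provider` are hypothetical names, and the model names in the test are examples only:

```python
from dataclasses import dataclass
from typing import Optional

GROQ_KEYWORDS = ("gpt-oss", "llama", "mixtral", "gemma", "qwen", "deepseek")
GROQ_BASE_URL = "https://api.groq.com/openai/v1"


@dataclass(frozen=True)
class ProviderConfig:
    provider: str            # "Groq" or "OpenAI"
    base_url: Optional[str]  # None means the OpenAI default endpoint
    use_responses_api: bool  # True for models routed to /v1/responses


def resolve_provider(model_name: str) -> ProviderConfig:
    name = model_name.lower()
    # Same two rules as the diff: keyword match, or the "openai/" prefix used by Groq.
    is_groq = any(kw in name for kw in GROQ_KEYWORDS) or model_name.startswith("openai/")
    return ProviderConfig(
        provider="Groq" if is_groq else "OpenAI",
        base_url=GROQ_BASE_URL if is_groq else None,
        use_responses_api="codex" in name,
    )
```

Isolating the decision also means the resolved config can be logged on its own. The `🔍 DEBUG` line in the diff, by contrast, writes `key[:10]` at WARNING level every time a model is created.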
--- a/backend/common/profiler_middleware.py
+++ b/backend/common/profiler_middleware.py
+"""
+⚡ FastAPI Bottleneck Middleware
+================================
+Add to server.py to automatically measure the latency of every request.
+
+Usage:
+  1. Import vào server.py:
+     from common.profiler_middleware import ProfilerMiddleware
+  2. Thêm middleware:
+     app.add_middleware(ProfilerMiddleware)
+  3. Xem báo cáo:
+     GET /debug/profiler/stats
+     GET /debug/profiler/slow   (slowest requests)
+     GET /debug/profiler/reset
+"""
+
+import logging
+import time
+from collections import deque
+from dataclasses import dataclass, field
+from statistics import mean, median
+
+from starlette.middleware.base import BaseHTTPMiddleware
+from starlette.requests import Request
+from starlette.responses import JSONResponse
+
+logger = logging.getLogger("profiler.middleware")
+
+
+@dataclass
+class RequestMetric:
+    path: str
+    method: str
+    duration: float
+    status_code: int
+    timestamp: float
+
+
+class ProfilerMiddleware(BaseHTTPMiddleware):
+    """Middleware that measures per-request latency and exposes metrics endpoints."""
+
+    # Class-level storage (shared across instances)
+    _metrics: deque = deque(maxlen=1000)  # Last 1000 requests
+    _slow_threshold: float = 5.0  # Seconds
+
+    async def dispatch(self, request: Request, call_next):
+        # Skip profiler endpoints
+        if request.url.path.startswith("/debug/profiler"):
+            return await self._handle_profiler_endpoint(request)
+
+        start = time.perf_counter()
+
+        response = await call_next(request)
+
+        duration = time.perf_counter() - start
+
+        metric = RequestMetric(
+            path=request.url.path,
+            method=request.method,
+            duration=duration,
+            status_code=response.status_code,
+            timestamp=time.time(),
+        )
+        self._metrics.append(metric)
+
+        # Log slow requests
+        if duration > self._slow_threshold:
+            logger.warning(
+                f"🐌 SLOW REQUEST: {request.method} {request.url.path} "
+                f"took {duration:.2f}s (>{self._slow_threshold}s)"
+            )
+
+        # Add timing header
+        response.headers["X-Response-Time"] = f"{duration:.3f}s"
+
+        return response
+
+    async def _handle_profiler_endpoint(self, request: Request) -> JSONResponse:
+        path = request.url.path
+
+        if path == "/debug/profiler/stats":
+            return self._get_stats()
+        elif path == "/debug/profiler/slow":
+            return self._get_slow_requests()
+        elif path == "/debug/profiler/reset":
+            self._metrics.clear()
+            return JSONResponse({"status": "reset", "message": "Metrics cleared"})
+
+        return JSONResponse({"error": "Unknown profiler endpoint"}, status_code=404)
+
+    def _get_stats(self) -> JSONResponse:
+        if not self._metrics:
+            return JSONResponse({"message": "No data yet"})
+
+        metrics = list(self._metrics)
+        durations = [m.duration for m in metrics]
+
+        # Group by path
+        path_stats = {}
+        for m in metrics:
+            key = f"{m.method} {m.path}"
+            if key not in path_stats:
+                path_stats[key] = []
+            path_stats[key].append(m.duration)
+
+        path_summary = {}
+        for path, times in sorted(path_stats.items(), key=lambda x: -max(x[1])):
+            path_summary[path] = {
+                "count": len(times),
+                "avg": round(mean(times), 3),
+                "median": round(median(times), 3),
+                "min": round(min(times), 3),
+                "max": round(max(times), 3),
+            }
+
+        return JSONResponse({
+            "total_requests": len(metrics),
+            "overall": {
+                "avg": round(mean(durations), 3),
+                "median": round(median(durations), 3),
+                "min": round(min(durations), 3),
+                "max": round(max(durations), 3),
+            },
+            "by_path": path_summary,
+            "slow_count": sum(1 for d in durations if d > self._slow_threshold),
+        })
+
+    def _get_slow_requests(self) -> JSONResponse:
+        slow = [
+            {
+                "path": m.path,
+                "method": m.method,
+                "duration": round(m.duration, 3),
+                "status": m.status_code,
+                "timestamp": m.timestamp,
+            }
+            for m in self._metrics
+            if m.duration > self._slow_threshold
+        ]
+        slow.sort(key=lambda x: -x["duration"])
+        return JSONResponse({"threshold": self._slow_threshold, "slow_requests": slow[:50]})
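`_get_stats` groups durations by `"METHOD path"` and orders paths by their worst case. The sketch below expresses the same aggregation as a plain function, so it can be tested without Starlette. `summarize` is a hypothetical name, and it takes `(method, path, duration)` tuples instead of `RequestMetric` objects:

```python
from statistics import mean, median
from typing import Dict, List, Tuple


def summarize(samples: List[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Per-'METHOD path' stats with the slowest max first (mirrors _get_stats's by_path)."""
    by_key: Dict[str, List[float]] = {}
    for method, path, duration in samples:
        by_key.setdefault(f"{method} {path}", []).append(duration)
    ordered = sorted(by_key.items(), key=lambda item: -max(item[1]))
    return {
        key: {
            "count": len(times),
            "avg": round(mean(times), 3),
            "median": round(median(times), 3),
            "min": round(min(times), 3),
            "max": round(max(times), 3),
        }
        for key, times in ordered
    }
```

One caveat when reading these numbers: in Starlette's `BaseHTTPMiddleware`, `call_next` returns once the downstream app has started its response. For a streamed response, the recorded duration therefore covers the time to the response headers, not the time spent sending the body.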
--- a/backend/config.py
+++ b/backend/config.py
--- a/backend/docker-compose.n8n.yml
+++ b/backend/docker-compose.n8n.yml
+services:
+  # --- n8n Workflow Automation ---
+  n8n:
+    image: docker.n8n.io/n8nio/n8n:latest
+    container_name: canifa_n8n
+    ports:
+      - "5678:5678"
+    environment:
+      - N8N_HOST=0.0.0.0
+      - N8N_PORT=5678
+      - N8N_PROTOCOL=http
+      - WEBHOOK_URL=http://localhost:5678/
+      - GENERIC_TIMEZONE=Asia/Ho_Chi_Minh
+      - TZ=Asia/Ho_Chi_Minh
+      # Basic auth: change the password before using this in production, bro
+      - N8N_BASIC_AUTH_ACTIVE=true
+      - N8N_BASIC_AUTH_USER=admin
+      - N8N_BASIC_AUTH_PASSWORD=canifa2026
+    volumes:
+      - n8n_data:/home/node/.n8n
+    restart: unless-stopped
+    networks:
+      - backend_network
+
+networks:
+  backend_network:
+    driver: bridge
+    ipam:
+      driver: default
+      config:
+        - subnet: "172.24.0.0/16"
+          gateway: "172.24.0.1"
+
+volumes:
+  n8n_data:
+    driver: local
--- a/backend/run.txt
+++ b/backend/run.txt
-
-.\.venv\Scripts\activate
-
-uvicorn server:app --host 0.0.0.0 --port 5000 --reload
-
-uvicorn server:app --host 0.0.0.0 --port 5000
-
-docker restart chatbot-backend
-
-docker restart chatbot-backend && docker logs -f chatbot-backend
-
-docker logs -f chatbot-backend
-
-docker restart canifa_backend
-
-sudo docker compose -f docker-compose.prod.yml up -d --build
-
-Get-NetTCPConnection -LocalPort 5000 | ForEach-Object { Stop-Process -Id $_.OwningProcess -Force }
-
-taskkill /F /IM python.exe
-
-netstat -ano | findstr :5000 | ForEach-Object { $_.Split()[-1] } | Sort-Object -Unique | ForEach-Object { taskkill /F /PID $_ }
\ No newline at end of file
+
+.\.venv\Scripts\activate
+
+uvicorn server:app --host 0.0.0.0 --port 5000 --reload
+
+uvicorn server:app --host 0.0.0.0 --port 5000
+
+docker restart chatbot-backend
+
+docker restart chatbot-backend && docker logs -f chatbot-backend
+
+docker logs -f chatbot-backend
+
+docker restart canifa_backend
+
+sudo docker compose -f docker-compose.prod.yml up -d --build
+
+Get-NetTCPConnection -LocalPort 5000 | ForEach-Object { Stop-Process -Id $_.OwningProcess -Force }
+
+taskkill /F /IM python.exe
+
+netstat -ano | findstr :5000 | ForEach-Object { $_.Split()[-1] } | Sort-Object -Unique | ForEach-Object { taskkill /F /PID $_ }
+    
\ No newline at end of file
--- a/backend/test_streaming_verify.ps1
+++ b/backend/test_streaming_verify.ps1
-# TEST STREAMING + BACKGROUND USER_INSIGHT
-
-Write-Host "`n==== STREAMING TEST ====`n" -ForegroundColor Cyan
-
-$query = "Ao khoac nam mua dong"
-$deviceId = "test_stream_verify"
-
-Write-Host "Sending request..." -ForegroundColor Green
-
-$timing = Measure-Command {
-    $body = '{"user_query":"' + $query + '","device_id":"' + $deviceId + '"}'
-    $result = $body | curl.exe -s -X POST "http://localhost:5000/api/agent/chat" -H "Content-Type: application/json" --data-binary "@-"
-    $result | Out-Null
-}
-
-Write-Host "`nResponse Time: $($timing.TotalMilliseconds) ms" -ForegroundColor Green
-Write-Host "`nCheck backend logs for:" -ForegroundColor Yellow
-Write-Host "  - Starting LLM streaming" -ForegroundColor Gray
-Write-Host "  - Regex matched product_ids" -ForegroundColor Gray
-Write-Host "  - BREAKING STREAM NOW" -ForegroundColor Gray
-Write-Host "  - Background task extraction" -ForegroundColor Gray
-Write-Host "`nDone!" -ForegroundColor Green
-
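The removed PowerShell script builds its JSON body by string concatenation, which breaks as soon as the query contains a double quote. The Python sketch below performs the same timing check. It assumes the `/api/agent/chat` endpoint and the `user_query`/`device_id` fields used by the script; `build_body` and `time_chat_request` are hypothetical helpers. `json.dumps` handles escaping, and `ensure_ascii=False` keeps Vietnamese queries readable in the payload:

```python
import json
import time
import urllib.request

CHAT_URL = "http://localhost:5000/api/agent/chat"  # endpoint used by the removed script


def build_body(query: str, device_id: str) -> bytes:
    # json.dumps escapes quotes and backslashes that string concatenation would not.
    payload = {"user_query": query, "device_id": device_id}
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def time_chat_request(query: str, device_id: str = "test_stream_verify") -> float:
    """POST one chat request and return the wall-clock response time in ms."""
    request = urllib.request.Request(
        CHAT_URL,
        data=build_body(query, device_id),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=120) as response:
        response.read()  # include body transfer in the measurement
    return (time.perf_counter() - start) * 1000
```

With the backend running, `time_chat_request("Ao khoac nam mua dong")` returns the same measurement the script printed, and the backend logs can be checked for the same markers it listed.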
--- a/backend/tests/auto_test_chatbot.py
+++ b/backend/tests/auto_test_chatbot.py
--- a/backend/tests/profiler_bottleneck.py
+++ b/backend/tests/profiler_bottleneck.py
--- a/backend/tests/profiler_stress.py
+++ b/backend/tests/profiler_stress.py
--- a/backend/tests/setup_test_sheet.py
+++ b/backend/tests/setup_test_sheet.py
+"""
+Create Google Sheet using Sheets API v4 directly (not gspread).
+This avoids Drive quota issues by creating the spreadsheet via Sheets API.
+"""
+
+import json
+import sys
+from pathlib import Path
+
+from google.oauth2.service_account import Credentials
+from googleapiclient.discovery import build
+
+CREDENTIALS_FILE = Path(__file__).parent / "google_credentials.json"
+SCOPES = [
+    "https://www.googleapis.com/auth/spreadsheets",
+    "https://www.googleapis.com/auth/drive",
+]
+
+TEST_QUESTIONS = [
+    "Tìm cho mình chân váy màu đỏ",
+    "Tìm quần màu đỏ",
+    "Tìm áo polo nam",
+    "Tìm áo khoác nữ mùa đông",
+    "Mình muốn mua đồ đi biển, gợi ý cho mình",
+    "Cho mình xem áo sơ mi đi làm",
+    "Gợi ý outfit đi dự tiệc",
+    "Áo size S giá dưới 500k",
+    "Có khuyến mãi gì không?",
+    "Cách đặt hàng online",
+    "Cửa hàng nào gần nhất ở Hà Nội",
+    "Xin chào",
+    "Cảm ơn bạn",
+    "Tìm sản phẩm abc123 không tồn tại",
+]
+
+
+def main():
+    creds = Credentials.from_service_account_file(str(CREDENTIALS_FILE), scopes=SCOPES)
+
+    sheets_service = build("sheets", "v4", credentials=creds)
+    drive_service = build("drive", "v3", credentials=creds)
+
+    # Create spreadsheet via Sheets API
+    spreadsheet_body = {
+        "properties": {"title": "Canifa Chatbot Test Results"},
+        "sheets": [{
+            "properties": {"title": "Test Questions"},
+        }]
+    }
+
+    print("📝 Creating spreadsheet via Sheets API...")
+    result = sheets_service.spreadsheets().create(body=spreadsheet_body).execute()
+    sheet_id = result["spreadsheetId"]
+    sheet_url = result["spreadsheetUrl"]
+    print(f"✅ Created! ID: {sheet_id}")
+    print(f"📊 URL: {sheet_url}")
+
+    # Write headers + data
+    headers = ["STT", "Câu hỏi test", "Câu trả lời", "Thời gian (ms)", "Trạng thái"]
+    values = [headers]
+    for i, q in enumerate(TEST_QUESTIONS, 1):
+        values.append([i, q, "", "", "⏳ Đang chờ..."])
+
+    sheets_service.spreadsheets().values().update(
+        spreadsheetId=sheet_id,
+        range="Test Questions!A1:E15",
+        valueInputOption="RAW",
+        body={"values": values}
+    ).execute()
+    print(f"✅ Wrote {len(TEST_QUESTIONS)} questions")
+
+    # Share with anyone (link)
+    try:
+        drive_service.permissions().create(
+            fileId=sheet_id,
+            body={"type": "anyone", "role": "writer"},
+            fields="id"
+        ).execute()
+        print("✅ Shared as 'anyone with link can edit'")
+    except Exception as e:
+        print(f"⚠️ Could not share: {e}")
+
+    # Save sheet info
+    info = {"sheet_url": sheet_url, "sheet_id": sheet_id}
+    info_path = Path(__file__).parent / "sheet_info.json"
+    info_path.write_text(json.dumps(info, indent=2))
+    print(f"💾 Saved to {info_path}")
+
+
+if __name__ == "__main__":
+    main()
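`setup_test_sheet.py` hardcodes the range `Test Questions!A1:E15`, which fits only because there are exactly 14 questions plus a header row. The sketch below derives the range from the data instead. Google's A1-notation guide asks for sheet names containing spaces to be wrapped in single quotes, and this helper does so. Doubling an apostrophe inside the name follows the spreadsheet formula convention and is an assumption for the API. `a1_range` and `column_letter` are hypothetical helpers:

```python
from typing import List


def column_letter(index: int) -> str:
    """1 -> 'A', 26 -> 'Z', 27 -> 'AA' (bijective base-26)."""
    if index < 1:
        raise ValueError("column index starts at 1")
    letters = ""
    while index:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def a1_range(sheet_title: str, values: List[List[object]]) -> str:
    """Range covering `values` starting at A1, e.g. "'Test Questions'!A1:E15"."""
    rows = len(values)
    cols = max((len(row) for row in values), default=0)
    if rows == 0 or cols == 0:
        raise ValueError("values must contain at least one cell")
    quoted = "'" + sheet_title.replace("'", "''") + "'"
    return f"{quoted}!A1:{column_letter(cols)}{rows}"
```

In the script, the call would then become `range=a1_range("Test Questions", values)`, so adding or removing a question no longer requires editing the range by hand.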