diff --git a/content/projects/nlp-hr-onboarding.md b/content/projects/nlp-hr-onboarding.md new file mode 100644 index 0000000..decf638 --- /dev/null +++ b/content/projects/nlp-hr-onboarding.md @@ -0,0 +1,159 @@ +--- +slug: nlp-hr-onboarding +title: Intelligent HR Onboarding Assistant +type: Personal Project +description: Intelligent HR onboarding assistant using RAG, LangChain agents, and MistralAI embeddings to help new employees navigate company policies, employee directory, and administrative tasks. +shortDescription: An AI-powered assistant for streamlining HR onboarding processes and improving new hire experience. +publishedAt: 2026-03-13 +readingTime: 3 +favorite: false +status: Completed +tags: + - Python + - NLP + - LangChain + - RAG +icon: i-ph-robot-duotone +--- + +**NLP Project — Master M2** +*Authors: Arthur DANJOU, Aksscel Meh-Rik, Moritz von SIEMENS* + +::BackgroundTitle{title="Project Overview"} +:: + +The **Intelligent HR Onboarding Assistant** is a conversational AI system designed to guide new employees during their first week at **TechCorp**. It combines retrieval-augmented generation, tool-using agents, and conversational memory to provide accurate and actionable HR support. + +The assistant can answer policy questions, retrieve employee information, schedule internal meetings, and prepare leave requests from natural-language prompts. + +::BackgroundTitle{title="Key Features"} +:: + +- **Semantic HR policy search** powered by a RAG pipeline. +- **Employee directory lookup** from structured JSON records. +- **Meeting scheduling tools** integrated through LangChain. +- **Automated leave request workflow** from chat instructions. +- **Sliding-window memory** to keep multi-turn context coherent. +- **Interactive Gradio UI** with visible agent actions and tool calls. + +::BackgroundTitle{title="Architecture"} +:: + +``` +┌──────────────────────────────────────────────────────────┐ +│ HR Onboarding Assistant — TechCorp │ +│ │ +│ 📝 System prompts (LangChain LCEL) │ +│ 🧠 Sliding window conversational memory │ +│ 🔧 Tools: │ +│ ├── 🔍 Knowledge base search (RAG) │ +│ ├── 👤 Employee directory │ +│ ├── 📅 Meeting scheduling │ +│ ├── 🏖️ Leave request submission │ +│ └── 🕐 Current date and time │ +│ 🔄 ReAct loop: reason → act → observe │ +│ 📊 MistralAI Embeddings + Qdrant Vector Store │ +└──────────────────────────────────────────────────────────┘ +``` + +::BackgroundTitle{title="Prerequisites"} +:: + +- Python ≥ 3.13 +- MistralAI API key + +::BackgroundTitle{title="Installation"} +:: + +1. **Clone the repository** + ```bash + git clone + cd NLP-Intelligent-HR-Onboarding-Assistant-with-RAG-and-LangChain + ``` + +2. **Install dependencies** + ```bash + uv sync + ``` + +3. **Configure MistralAI API key** + + Set the environment variable: + ```bash + export MISTRAL_API_KEY="your_api_key" + ``` + +::BackgroundTitle{title="Usage"} +:: + +### Run the Jupyter notebook + +```bash +jupyter notebook projet.ipynb +``` + +Execute cells sequentially to: +1. Analyze tokenization of HR documents +2. Create the Qdrant vector database +3. Initialize the ReAct agent +4. Run demonstrations +5. Launch the Gradio interface (runs on `http://127.0.0.1:7860`) + +### Data structure + +``` +data/ +├── entreprise.md # HR knowledge base (leave policy, remote work, etc.) +└── employés.json # TechCorp employee directory +``` + +::BackgroundTitle{title="Learning Modules"} +:: + +| TP | Concept | Usage | +|:---|:--------|:------| +| **TP1** | Embeddings | Document vectorization and cosine similarity retrieval | +| **TP2** | BPE Tokenization | Token-cost analysis with FR/EN comparison | +| **TP3** | LLM + LangChain | ChatMistralAI setup, prompts, and LCEL chains | +| **TP4** | Agents + Memory | `@tool` usage, ReAct orchestration, sliding-window memory | +| **TP5** | RAG + Gradio | Qdrant indexing, semantic retrieval, interactive UI | + +::BackgroundTitle{title="Technologies"} +:: + +- **LangChain**: LLM orchestration framework +- **MistralAI**: LLM inference and embeddings (`mistral-embed`) +- **Qdrant**: In-memory vector database +- **Gradio**: Interactive web interface +- **tiktoken**: BPE tokenization analysis +- **pandas**: Employee data manipulation + +::BackgroundTitle{title="Main Dependencies"} +:: + +``` +langchain>=1.2.11 +langchain-mistralai>=1.1.1 +langchain-qdrant>=1.1.0 +gradio>=6.9.0 +tiktoken>=0.12.0 +pandas>=3.0.1 +``` + +::BackgroundTitle{title="Example Prompts"} +:: + +- "How many days of annual leave do I have?" +- "What is the remote work policy?" +- "Give me Claire Petit's contact information" +- "Schedule a meeting with the Data Science team tomorrow at 2pm" +- "I want to request leave from January 15th to 20th" + +::BackgroundTitle{title="Authors"} +:: + +- **Arthur DANJOU** +- **Axelle MERIC** +- **Moritz von SIEMENS** + +*Project completed as part of the Natural Language Processing course — Master M2*