---
slug: nlp-hr-onboarding
title: Intelligent HR Onboarding Assistant
type: Academic Project
description: Intelligent HR onboarding assistant using RAG, LangChain agents, and MistralAI embeddings to help new employees navigate company policies, the employee directory, and administrative tasks.
shortDescription: An AI-powered assistant for streamlining HR onboarding processes and improving new hire experience.
publishedAt: 2026-03-13
readingTime: 3
favorite: false
status: Completed
tags:
- Python
- NLP
- LangChain
- RAG
icon: i-ph-robot-duotone
---
**NLP Project — Master M2**
*Authors: Arthur DANJOU, Axelle MERIC, Moritz von SIEMENS*
::BackgroundTitle{title="Project Overview"}
::
The **Intelligent HR Onboarding Assistant** is a conversational AI system designed to guide new employees through their first week at **TechCorp**. It combines retrieval-augmented generation (RAG), tool-using agents, and conversational memory so that answers are grounded in the company's HR documents and routine administrative requests can be handled directly in the chat.
The assistant can answer policy questions, retrieve employee information, schedule internal meetings, and prepare leave requests from natural-language prompts.
::BackgroundTitle{title="Key Features"}
::
- **Semantic HR policy search** powered by a RAG pipeline.
- **Employee directory lookup** from structured JSON records.
- **Meeting scheduling tools** integrated through LangChain.
- **Automated leave request workflow** from chat instructions.
- **Sliding-window memory** to keep multi-turn context coherent.
- **Interactive Gradio UI** with visible agent actions and tool calls.
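
The sliding-window idea behind the memory feature can be illustrated without LangChain. The sketch below is a minimal plain-Python version, assuming a window of `k` exchanges; `SlidingWindowMemory` and `Message` are hypothetical names, not the notebook's actual classes.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str  # "human" or "ai"
    content: str


class SlidingWindowMemory:
    """Keep only the last `k` exchanges (one human + one AI message each)."""

    def __init__(self, k: int = 3) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        # A bounded deque discards its oldest items once maxlen is reached.
        self._messages: deque = deque(maxlen=2 * k)

    def add_exchange(self, user_text: str, ai_text: str) -> None:
        self._messages.append(Message("human", user_text))
        self._messages.append(Message("ai", ai_text))

    def history(self) -> list[Message]:
        # Oldest first, ready to be placed in the prompt before the new question.
        return list(self._messages)


memory = SlidingWindowMemory(k=2)
memory.add_exchange("Hi", "Hello! How can I help?")
memory.add_exchange("How many leave days do I get?", "(answer 1)")
memory.add_exchange("And for remote work?", "(answer 2)")
print([m.content for m in memory.history()])
# ['How many leave days do I get?', '(answer 1)', 'And for remote work?', '(answer 2)']
```

Older turns fall out of the window automatically, which keeps the prompt size bounded at the cost of forgetting early context.
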
::BackgroundTitle{title="Architecture"}
::
```
┌──────────────────────────────────────────────────────────┐
│ HR Onboarding Assistant - TechCorp                       │
│                                                          │
│ 📝 System prompts (LangChain LCEL)                       │
│ 🧠 Sliding window conversational memory                  │
│ 🔧 Tools:                                                │
│    ├── 🔍 Knowledge base search (RAG)                    │
│    ├── 👤 Employee directory                             │
│    ├── 📅 Meeting scheduling                             │
│    ├── 🏖️ Leave request submission                       │
│    └── 🕐 Current date and time                          │
│ 🔄 ReAct loop: reason → act → observe                    │
│ 📊 MistralAI Embeddings + Qdrant Vector Store            │
└──────────────────────────────────────────────────────────┘
```
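
The reason → act → observe cycle in the diagram can be sketched as a small loop. In this minimal example a scripted function stands in for the LLM and the tools return placeholder strings; `react_loop`, `scripted_policy`, and the tool names are hypothetical, and the real agent is built with LangChain rather than this hand-written loop.

```python
from datetime import datetime
from typing import Callable

# A step is ("act", tool_name, tool_input) or ("finish", answer, "").
Step = tuple[str, str, str]
Policy = Callable[[str, list[str]], Step]
Tool = Callable[[str], str]


def search_knowledge_base(query: str) -> str:
    # Placeholder for the RAG retriever: returns a fake passage.
    return f"Placeholder passage about: {query}"


def current_datetime(_: str) -> str:
    return datetime.now().isoformat(timespec="minutes")


TOOLS: dict[str, Tool] = {
    "search_knowledge_base": search_knowledge_base,
    "current_datetime": current_datetime,
}


def scripted_policy(question: str, observations: list[str]) -> Step:
    """Stand-in for the LLM: search once, then answer from the observation."""
    if not observations:
        return ("act", "search_knowledge_base", question)
    return ("finish", f"Based on the policy: {observations[-1]}", "")


def react_loop(
    question: str, policy: Policy, tools: dict[str, Tool], max_steps: int = 5
) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        kind, target, tool_input = policy(question, observations)  # reason
        if kind == "finish":
            return target
        observation = tools[target](tool_input)  # act
        observations.append(observation)  # observe
    return "Stopped: step limit reached."


print(react_loop("remote work policy", scripted_policy, TOOLS))
# Based on the policy: Placeholder passage about: remote work policy
```

The `max_steps` cap guards against a model that never decides to finish.
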
::BackgroundTitle{title="Prerequisites"}
::
- Python ≥ 3.13
- `uv` package manager (used by `uv sync` below)
- MistralAI API key
::BackgroundTitle{title="Installation"}
::
1. **Clone the repository**
```bash
git clone <repository-url>
cd NLP-Intelligent-HR-Onboarding-Assistant-with-RAG-and-LangChain
```
2. **Install dependencies**
```bash
uv sync
```
3. **Configure MistralAI API key**
Set the environment variable:
```bash
export MISTRAL_API_KEY="your_api_key"
```
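
Because every MistralAI call depends on this variable, it can help to fail fast at the top of the notebook instead of at the first model call. A minimal sketch; the helper name `require_mistral_api_key` is hypothetical.

```python
import os


def require_mistral_api_key() -> str:
    """Return MISTRAL_API_KEY, or raise a clear error if it is missing or blank."""
    key = os.environ.get("MISTRAL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "MISTRAL_API_KEY is not set; export it in the shell before starting Jupyter."
        )
    return key
```

Jupyter inherits the environment of the shell that launched it, so the variable must be exported before running `jupyter notebook`.
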
::BackgroundTitle{title="Usage"}
::
### Run the Jupyter notebook
```bash
jupyter notebook projet.ipynb
```
Execute cells sequentially to:
1. Analyze tokenization of HR documents
2. Create the Qdrant vector database
3. Initialize the ReAct agent
4. Run demonstrations
5. Launch the Gradio interface (runs on `http://127.0.0.1:7860`)
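
Before step 2 can index the HR knowledge base, the Markdown document has to be split into passages that are embedded one by one. The notebook's exact splitting strategy is not described here; the sketch below assumes one chunk per Markdown heading, `split_markdown_by_heading` is a hypothetical helper, and the sample text is a placeholder, not TechCorp policy.

```python
import re

# A Markdown ATX heading: 1 to 6 '#' characters, whitespace, then the title.
HEADING = re.compile(r"^#{1,6}\s+(.+?)\s*$")


def split_markdown_by_heading(text: str) -> list[dict[str, str]]:
    """Return one {'heading', 'text'} chunk per section, skipping empty ones.

    Simplification: '#' lines inside fenced code blocks are also treated as headings.
    """
    sections: list[tuple[str, list[str]]] = []
    heading, body = "(preamble)", []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((heading, body))  # close the previous section
            heading, body = match.group(1), []
        else:
            body.append(line)
    sections.append((heading, body))
    chunks = []
    for title, lines in sections:
        content = "\n".join(lines).strip()
        if content:
            chunks.append({"heading": title, "text": content})
    return chunks


sample = "# Leave policy\n(leave policy text)\n\n# Remote work\n(remote work text)\n"
for chunk in split_markdown_by_heading(sample):
    print(chunk["heading"], "->", chunk["text"])
```

Keeping the heading next to each chunk can let the assistant say which policy section an answer came from.
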
### Data structure
```
data/
├── entreprise.md # HR knowledge base (leave policy, remote work, etc.)
└── employés.json # TechCorp employee directory
```
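
The directory tool has to match names typed in chat against the JSON records, and French names often carry accents that users omit. A minimal lookup sketch, assuming `employés.json` holds a list of objects with `first_name` and `last_name` keys (the real schema may differ); `find_employee` and the sample records are hypothetical.

```python
import json
import unicodedata
from pathlib import Path
from typing import Optional


def load_directory(path: Path) -> list[dict]:
    # Assumes the file contains a JSON array of employee objects.
    return json.loads(path.read_text(encoding="utf-8"))


def normalize(text: str) -> str:
    """Collapse whitespace, strip accents, and casefold ('Hélène' -> 'helene')."""
    decomposed = unicodedata.normalize("NFKD", " ".join(text.split()))
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def find_employee(records: list[dict], query: str) -> Optional[dict]:
    """Return the first record whose full name matches the query, or None."""
    target = normalize(query)
    for record in records:
        if normalize(f"{record['first_name']} {record['last_name']}") == target:
            return record
    return None


# Hypothetical records using an assumed schema; not taken from employés.json.
records = [
    {"first_name": "Hélène", "last_name": "Durand", "team": "Data Science"},
    {"first_name": "Jane", "last_name": "Doe", "team": "HR"},
]
print(find_employee(records, "helene  DURAND"))
```

With the real file, `records = load_directory(Path("data/employés.json"))` would replace the sample list, provided the notebook runs from the repository root.
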
::BackgroundTitle{title="Learning Modules"}
::
| Lab (TP) | Concept | Usage in the project |
|:---|:--------|:------|
| **TP1** | Embeddings | Document vectorization and cosine similarity retrieval |
| **TP2** | BPE Tokenization | Token-cost analysis with FR/EN comparison |
| **TP3** | LLM + LangChain | ChatMistralAI setup, prompts, and LCEL chains |
| **TP4** | Agents + Memory | `@tool` usage, ReAct orchestration, sliding-window memory |
| **TP5** | RAG + Gradio | Qdrant indexing, semantic retrieval, interactive UI |
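
TP1's retrieval step ranks documents by the cosine similarity between the query embedding and each document embedding, cos(a, b) = (a · b) / (‖a‖ ‖b‖). The toy example below uses hand-written 3-dimensional vectors in place of real `mistral-embed` outputs; Qdrant performs the same kind of ranking at scale. Names and vectors are illustrative only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (||a|| * ||b||); returns 0.0 if either vector is zero."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float], docs: dict[str, list[float]], k: int = 2
) -> list[tuple[str, float]]:
    """Rank documents by similarity to the query and keep the best k."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-d "embeddings"; real mistral-embed vectors have far more dimensions.
docs = {
    "leave_policy": [0.9, 0.1, 0.0],
    "remote_work": [0.1, 0.9, 0.1],
    "expenses": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]  # pretend embedding of "How many leave days do I have?"
for doc_id, score in top_k(query, docs):
    print(f"{doc_id}: {score:.3f}")
```

Cosine similarity ignores vector length, so a long and a short passage on the same topic can score similarly.
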
::BackgroundTitle{title="Technologies"}
::
- **LangChain**: LLM orchestration framework
- **MistralAI**: LLM inference and embeddings (`mistral-embed`)
- **Qdrant**: Vector database, used here in in-memory mode
- **Gradio**: Interactive web interface
- **tiktoken**: BPE tokenization analysis
- **pandas**: Employee data manipulation
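
tiktoken applies byte pair encoding (BPE) merges learned from a large corpus. The toy trainer below shows the core idea on characters rather than bytes: repeatedly merge the most frequent adjacent symbol pair. The corpus, the helper names, and the tie-breaking rule are illustrative assumptions, not tiktoken's implementation.

```python
from collections import Counter

Word = tuple[str, ...]  # a word as a sequence of symbols


def merge_symbols(symbols: Word, pair: tuple[str, str]) -> Word:
    """Replace every adjacent occurrence of `pair` with the concatenated symbol."""
    out: list[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merges from a word -> frequency table."""
    vocab: dict[Word, int] = {tuple(w): f for w, f in word_freqs.items()}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        counts: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                counts[pair] += freq
        if not counts:
            break
        # Most frequent pair; ties broken by the lexicographically smallest pair.
        best = min(counts, key=lambda p: (-counts[p], p))
        merges.append(best)
        new_vocab: dict[Word, int] = {}
        for symbols, freq in vocab.items():
            merged = merge_symbols(symbols, best)
            new_vocab[merged] = new_vocab.get(merged, 0) + freq
        vocab = new_vocab
    return merges


def encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize a word by applying the learned merges in order."""
    symbols: Word = tuple(word)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return list(symbols)


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3)
print(merges)  # [('e', 's'), ('es', 't'), ('l', 'o')]
print(encode("lowest", merges))  # ['lo', 'w', 'est']
```

Words built from character sequences that were rare in the training data stay split into more pieces, which is roughly the effect a FR/EN token-cost comparison measures when a tokenizer's training data is dominated by one language.
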
::BackgroundTitle{title="Main Dependencies"}
::
```
langchain>=1.2.11
langchain-mistralai>=1.1.1
langchain-qdrant>=1.1.0
gradio>=6.9.0
tiktoken>=0.12.0
pandas>=3.0.1
```
::BackgroundTitle{title="Example Prompts"}
::
- "How many days of annual leave do I have?"
- "What is the remote work policy?"
- "Give me Claire Petit's contact information"
- "Schedule a meeting with the Data Science team tomorrow at 2pm"
- "I want to request leave from January 15th to 20th"
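
The last prompt requires the agent to turn a date range into a structured request. A minimal sketch of such a tool body, assuming the year is resolved elsewhere (for example with the current-date tool) and that only Monday to Friday count as working days. Public holidays are ignored, and `prepare_leave_request` and its fields are hypothetical.

```python
from datetime import date, timedelta


def working_days(start: date, end: date) -> int:
    """Count Monday-Friday dates in [start, end], inclusive; holidays are ignored."""
    if end < start:
        raise ValueError("end date is before start date")
    count = 0
    current = start
    while current <= end:
        if current.weekday() < 5:  # Monday == 0 ... Friday == 4
            count += 1
        current += timedelta(days=1)
    return count


def prepare_leave_request(employee: str, start: date, end: date) -> dict:
    """Build a draft leave request for the user to confirm (hypothetical format)."""
    return {
        "employee": employee,
        "start": start.isoformat(),
        "end": end.isoformat(),
        "working_days": working_days(start, end),
        "status": "draft",
    }


# "January 15th to 20th", with the year 2027 assumed for illustration.
request = prepare_leave_request("Jane Doe", date(2027, 1, 15), date(2027, 1, 20))
print(request["working_days"])  # 4 (Fri 15, Mon 18, Tue 19, Wed 20)
```

Returning a draft rather than filing the request immediately is one way to let the user confirm the dates first.
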
::BackgroundTitle{title="Authors"}
::
- **Arthur DANJOU**
- **Axelle MERIC**
- **Moritz von SIEMENS**
*Project completed as part of the Natural Language Processing course — Master M2*