

---
slug: nlp-hr-onboarding
title: Intelligent HR Onboarding Assistant
type: Personal Project
description: Intelligent HR onboarding assistant using RAG, LangChain agents, and MistralAI embeddings to help new employees navigate company policies, employee directory, and administrative tasks.
shortDescription: An AI-powered assistant for streamlining HR onboarding processes and improving new hire experience.
publishedAt: 2026-03-13
readingTime:
favorite: false
status: Completed
tags:
  - Python
  - NLP
  - LangChain
  - RAG
icon: i-ph-robot-duotone
---

NLP Project — Master M2
Authors: Arthur DANJOU, Axelle MERIC, Moritz von SIEMENS

::BackgroundTitle{title="Project Overview"} ::

The Intelligent HR Onboarding Assistant is a conversational AI system designed to guide new employees during their first week at TechCorp. It combines retrieval-augmented generation, tool-using agents, and conversational memory to provide accurate and actionable HR support.

The assistant can answer policy questions, retrieve employee information, schedule internal meetings, and prepare leave requests from natural-language prompts.
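As an illustration of what "preparing a leave request" could involve once dates have been extracted from the prompt, here is a minimal sketch. The `LeaveRequest` class, the `prepare_leave_request` helper, and the year 2026 are assumptions made for this example; they are not taken from the project's notebook.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LeaveRequest:
    """Hypothetical structured form of a leave request (not the project's own type)."""

    employee: str
    start: date
    end: date

    def __post_init__(self) -> None:
        # Reject ranges that end before they start.
        if self.end < self.start:
            raise ValueError("Leave end date must not precede the start date.")

    @property
    def calendar_days(self) -> int:
        # Inclusive count; weekends and public holidays are not excluded here.
        return (self.end - self.start).days + 1


def prepare_leave_request(employee: str, start_iso: str, end_iso: str) -> LeaveRequest:
    """Build a request from ISO dates, as an agent tool might after parsing the prompt."""
    return LeaveRequest(employee, date.fromisoformat(start_iso), date.fromisoformat(end_iso))


# "January 15th to 20th", with an assumed year of 2026.
request = prepare_leave_request("Claire Petit", "2026-01-15", "2026-01-20")
print(request.calendar_days)
```

Validating the range in one place keeps the language model responsible only for extracting the dates, not for checking them.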

::BackgroundTitle{title="Key Features"} ::

- Semantic HR policy search powered by a RAG pipeline.
- Employee directory lookup from structured JSON records.
- Meeting scheduling tools integrated through LangChain.
- Automated leave request workflow from chat instructions.
- Sliding-window memory to keep multi-turn context coherent.
- Interactive Gradio UI with visible agent actions and tool calls.
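The sliding-window memory mentioned above can be sketched in plain Python. This is an illustrative approximation, not the project's implementation: the `SlidingWindowMemory` class and its `max_turns` parameter are hypothetical names, and the placeholder answers stand in for real model output.

```python
from collections import deque


class SlidingWindowMemory:
    """Keep only the most recent `max_turns` user/assistant exchanges."""

    def __init__(self, max_turns: int = 3) -> None:
        # Each turn stores two messages (user + assistant), so the deque
        # holds at most 2 * max_turns entries and silently drops the oldest.
        self._messages = deque(maxlen=2 * max_turns)

    def add_turn(self, user_text: str, assistant_text: str) -> None:
        self._messages.append(("user", user_text))
        self._messages.append(("assistant", assistant_text))

    def as_messages(self) -> list:
        """Return the retained messages, oldest first."""
        return list(self._messages)


memory = SlidingWindowMemory(max_turns=2)
memory.add_turn("How many days of annual leave do I have?", "placeholder answer 1")
memory.add_turn("What is the remote work policy?", "placeholder answer 2")
memory.add_turn("Give me Claire Petit's contact information", "placeholder answer 3")

# Only the last two exchanges remain; the first one has been evicted.
window = memory.as_messages()
```

Bounding the history this way keeps the prompt size roughly constant across a long conversation, at the cost of forgetting older turns.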

::BackgroundTitle{title="Architecture"} ::

┌──────────────────────────────────────────────────────────┐
│           HR Onboarding Assistant — TechCorp            │
│                                                         │
│  📝 System prompts (LangChain LCEL)                     │
│  🧠 Sliding window conversational memory                │
│  🔧 Tools:                                              │
│     ├── 🔍 Knowledge base search (RAG)                  │
│     ├── 👤 Employee directory                           │
│     ├── 📅 Meeting scheduling                           │
│     ├── 🏖️ Leave request submission                     │
│     └── 🕐 Current date and time                        │
│  🔄 ReAct loop: reason → act → observe                  │
│  📊 MistralAI Embeddings + Qdrant Vector Store          │
└──────────────────────────────────────────────────────────┘
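The reason → act → observe cycle in the diagram can be illustrated with a toy loop. In this sketch a scripted function stands in for the language model and the tools are stubs; `TOOLS`, `scripted_reasoner`, and `react_loop` are hypothetical names, and the real project relies on LangChain's agent machinery rather than this code.

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry; each tool takes a string argument and returns text.
TOOLS: Dict[str, Callable[[str], str]] = {
    "current_datetime": lambda _arg: datetime.now().isoformat(timespec="minutes"),
    "employee_directory": lambda name: f"(stub) no record loaded for {name}",
}


def scripted_reasoner(question: str, observations: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: returns ("act", "tool:arg") or ("finish", answer)."""
    if not observations:
        return "act", "current_datetime:"
    return "finish", f"It is currently {observations[-1]}."


def react_loop(question: str, max_steps: int = 5) -> Tuple[str, List[str]]:
    observations: List[str] = []
    trace: List[str] = []  # names of the tools called, in order
    for _ in range(max_steps):
        kind, payload = scripted_reasoner(question, observations)  # reason
        if kind == "finish":
            return payload, trace
        tool_name, _, tool_arg = payload.partition(":")
        result = TOOLS[tool_name](tool_arg)  # act
        observations.append(result)  # observe
        trace.append(tool_name)
    return "Step limit reached without a final answer.", trace


answer, trace = react_loop("What time is it?")
```

The `max_steps` cap is a common safeguard so that a model that never emits a final answer cannot loop indefinitely.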

::BackgroundTitle{title="Prerequisites"} ::

- Python ≥ 3.13
- MistralAI API key

::BackgroundTitle{title="Installation"} ::

Clone the repository

git clone <repository-url>
cd NLP-Intelligent-HR-Onboarding-Assistant-with-RAG-and-LangChain

Install dependencies
```
uv sync
```
Configure MistralAI API key

Set the environment variable:
```
export MISTRAL_API_KEY="your_api_key"
```
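Before running the notebook, it can help to confirm that the key is actually visible to Python. The helper below is a small sketch, not part of the project; `get_mistral_api_key` is a hypothetical name, and only the `MISTRAL_API_KEY` variable comes from the instructions above.

```python
import os
from typing import Mapping, Optional


def get_mistral_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or fail with a clear message."""
    env = os.environ if env is None else env
    key = env.get("MISTRAL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "MISTRAL_API_KEY is not set; export it before running the notebook."
        )
    return key
```

Calling `get_mistral_api_key()` in the first notebook cell surfaces a missing key immediately instead of at the first API request.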

::BackgroundTitle{title="Usage"} ::

Run the Jupyter notebook

jupyter notebook projet.ipynb

Execute cells sequentially to:

1. Analyze tokenization of HR documents
2. Create the Qdrant vector database
3. Initialize the ReAct agent
4. Run demonstrations
5. Launch the Gradio interface (runs on http://127.0.0.1:7860)

Data structure

data/
├── entreprise.md    # HR knowledge base (leave policy, remote work, etc.)
└── employés.json    # TechCorp employee directory
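A directory lookup over JSON records might look like the sketch below. The schema of `employés.json` is not shown in this document, so the `name`, `team`, and `email` fields and every sample value are invented for illustration; `find_employee` is likewise a hypothetical helper.

```python
import json

# Made-up sample records; the real file's fields and values may differ.
SAMPLE_JSON = """
[
  {"name": "Claire Petit", "team": "Team A", "email": "claire.petit@example.com"},
  {"name": "Example Person", "team": "Team B", "email": "example.person@example.com"}
]
"""


def find_employee(records: list, query: str) -> list:
    """Return records whose name contains the query, ignoring case."""
    needle = query.casefold().strip()
    return [record for record in records if needle in record["name"].casefold()]


records = json.loads(SAMPLE_JSON)
matches = find_employee(records, "claire petit")
```

With the real file, `json.loads(SAMPLE_JSON)` would be replaced by reading `data/employés.json` with UTF-8 encoding, since the filename and likely the contents include accented characters.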

::BackgroundTitle{title="Learning Modules"} ::

| TP | Concept | Usage |
| --- | --- | --- |
| TP1 | Embeddings | Document vectorization and cosine similarity retrieval |
| TP2 | BPE Tokenization | Token-cost analysis with FR/EN comparison |
| TP3 | LLM + LangChain | ChatMistralAI setup, prompts, and LCEL chains |
| TP4 | Agents + Memory | `@tool` usage, ReAct orchestration, sliding-window memory |
| TP5 | RAG + Gradio | Qdrant indexing, semantic retrieval, interactive UI |
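The cosine similarity retrieval listed for TP1 reduces to a few lines of arithmetic. The sketch below uses toy three-dimensional vectors in place of real `mistral-embed` embeddings; the document ids and the `top_k` helper are illustrative assumptions.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors, which have no direction
    return dot / (norm_a * norm_b)


def top_k(query_vec: list, doc_vecs: dict, k: int = 1) -> list:
    """Rank documents by similarity to the query and keep the best k."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors standing in for embeddings of two policy sections.
doc_vecs = {
    "leave_policy": [0.9, 0.1, 0.0],
    "remote_work": [0.1, 0.9, 0.0],
}
best = top_k([0.8, 0.2, 0.0], doc_vecs, k=1)
```

A vector store such as Qdrant performs the same ranking, with indexing that avoids scoring every document for each query.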

::BackgroundTitle{title="Technologies"} ::

- LangChain: LLM orchestration framework
- MistralAI: LLM inference and embeddings (mistral-embed)
- Qdrant: Vector database, used here in in-memory mode
- Gradio: Interactive web interface
- tiktoken: BPE tokenization analysis
- pandas: Employee data manipulation

::BackgroundTitle{title="Main Dependencies"} ::

langchain>=1.2.11
langchain-mistralai>=1.1.1
langchain-qdrant>=1.1.0
gradio>=6.9.0
tiktoken>=0.12.0
pandas>=3.0.1

::BackgroundTitle{title="Example Prompts"} ::

- "How many days of annual leave do I have?"
- "What is the remote work policy?"
- "Give me Claire Petit's contact information"
- "Schedule a meeting with the Data Science team tomorrow at 2pm"
- "I want to request leave from January 15th to 20th"

::BackgroundTitle{title="Authors"} ::

- Arthur DANJOU
- Axelle MERIC
- Moritz von SIEMENS

Project completed as part of the Natural Language Processing course — Master M2
