Files
artsite/content/projects/nlp-hr-onboarding.md

5.1 KiB

slug, title, type, description, shortDescription, publishedAt, readingTime, favorite, status, tags, icon
slug title type description shortDescription publishedAt readingTime favorite status tags icon
nlp-hr-onboarding Intelligent HR Onboarding Assistant Personal Project Intelligent HR onboarding assistant using RAG, LangChain agents, and MistralAI embeddings to help new employees navigate company policies, employee directory, and administrative tasks. An AI-powered assistant for streamlining HR onboarding processes and improving new hire experience. 2026-03-13 3 false Completed
Python
NLP
LangChain
RAG
i-ph-robot-duotone

NLP Project — Master M2
Authors: Arthur DANJOU, Aksscel Meh-Rik, Moritz von SIEMENS

::BackgroundTitle{title="Project Overview"} ::

The Intelligent HR Onboarding Assistant is a conversational AI system designed to guide new employees during their first week at TechCorp. It combines retrieval-augmented generation, tool-using agents, and conversational memory to provide accurate and actionable HR support.

The assistant can answer policy questions, retrieve employee information, schedule internal meetings, and prepare leave requests from natural-language prompts.

::BackgroundTitle{title="Key Features"} ::

  • Semantic HR policy search powered by a RAG pipeline.
  • Employee directory lookup from structured JSON records.
  • Meeting scheduling tools integrated through LangChain.
  • Automated leave request workflow from chat instructions.
  • Sliding-window memory to keep multi-turn context coherent.
  • Interactive Gradio UI with visible agent actions and tool calls.

::BackgroundTitle{title="Architecture"} ::

┌──────────────────────────────────────────────────────────┐
│           HR Onboarding Assistant — TechCorp            │
│                                                         │
│  📝 System prompts (LangChain LCEL)                     │
│  🧠 Sliding window conversational memory                │
│  🔧 Tools:                                              │
│     ├── 🔍 Knowledge base search (RAG)                  │
│     ├── 👤 Employee directory                           │
│     ├── 📅 Meeting scheduling                           │
│     ├── 🏖️ Leave request submission                     │
│     └── 🕐 Current date and time                        │
│  🔄 ReAct loop: reason → act → observe                  │
│  📊 MistralAI Embeddings + Qdrant Vector Store          │
└──────────────────────────────────────────────────────────┘

::BackgroundTitle{title="Prerequisites"} ::

  • Python ≥ 3.13
  • MistralAI API key

::BackgroundTitle{title="Installation"} ::

  1. Clone the repository

    git clone <repository-url>
    cd NLP-Intelligent-HR-Onboarding-Assistant-with-RAG-and-LangChain
    
  2. Install dependencies

    uv sync
    
  3. Configure MistralAI API key

    Set the environment variable:

    export MISTRAL_API_KEY="your_api_key"
    

::BackgroundTitle{title="Usage"} ::

Run the Jupyter notebook

jupyter notebook projet.ipynb

Execute cells sequentially to:

  1. Analyze tokenization of HR documents
  2. Create the Qdrant vector database
  3. Initialize the ReAct agent
  4. Run demonstrations
  5. Launch the Gradio interface (runs on http://127.0.0.1:7860)

Data structure

data/
├── entreprise.md    # HR knowledge base (leave policy, remote work, etc.)
└── employés.json    # TechCorp employee directory

::BackgroundTitle{title="Learning Modules"} ::

TP Concept Usage
TP1 Embeddings Document vectorization and cosine similarity retrieval
TP2 BPE Tokenization Token-cost analysis with FR/EN comparison
TP3 LLM + LangChain ChatMistralAI setup, prompts, and LCEL chains
TP4 Agents + Memory @tool usage, ReAct orchestration, sliding-window memory
TP5 RAG + Gradio Qdrant indexing, semantic retrieval, interactive UI

::BackgroundTitle{title="Technologies"} ::

  • LangChain: LLM orchestration framework
  • MistralAI: LLM inference and embeddings (mistral-embed)
  • Qdrant: In-memory vector database
  • Gradio: Interactive web interface
  • tiktoken: BPE tokenization analysis
  • pandas: Employee data manipulation

::BackgroundTitle{title="Main Dependencies"} ::

langchain>=1.2.11
langchain-mistralai>=1.1.1
langchain-qdrant>=1.1.0
gradio>=6.9.0
tiktoken>=0.12.0
pandas>=3.0.1

::BackgroundTitle{title="Example Prompts"} ::

  • "How many days of annual leave do I have?"
  • "What is the remote work policy?"
  • "Give me Claire Petit's contact information"
  • "Schedule a meeting with the Data Science team tomorrow at 2pm"
  • "I want to request leave from January 15th to 20th"

::BackgroundTitle{title="Authors"} ::

  • Arthur DANJOU
  • Axelle MERIC
  • Moritz von SIEMENS

Project completed as part of the Natural Language Processing course — Master M2