Add second part of the project

---
title: "Groupe 03 Projet DANJOU - DUROUSSEAU"
output:
  pdf_document:
    toc: yes
    toc_depth: 3
    fig_caption: yes
---

# Exercise 1: Negative weighted mixture

## Definition

### Question 1

The conditions for a function $f$ to be a probability density are: $f(x) \ge 0$ for all $x \in \mathbb{R}$, and $\int_{\mathbb{R}} f(x) \,dx = 1$.

We can see that $f_1$ dominates the tail behavior.

\
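
As a quick numerical illustration (with the example values $\mu_1 = 0, \mu_2 = 1, \sigma_1 = 3, \sigma_2 = 1$ used later in the project), the ratio $f_2/f_1$ vanishes in the tails, which is what makes $f_1$ the dominating component:

```{r}
# Tail behavior: with s1 > s2, the ratio f2/f1 vanishes as |x| grows
mu1 <- 0; mu2 <- 1; s1 <- 3; s2 <- 1
x <- c(5, 10, 15)
dnorm(x, mu2, s2) / dnorm(x, mu1, s1)
```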

### Question 2

For given parameters $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$, we have $f(x) \ge 0$ for all $x \in \mathbb{R}$.

To find $C$, we know that $1 = \int_{\mathbb{R}} f(x) \,dx = \int_{\mathbb{R}} C \left( f_1(x) - a f_2(x) \right) dx = C (1 - a)$

$\Leftrightarrow C = \frac{1}{1-a}$
\
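
A quick numerical sanity check of this normalization, assuming the example parameter values used later in the project:

```{r}
# f integrates to 1 with C = 1/(1-a) (example values; a below the threshold a*)
mu1 <- 0; mu2 <- 1; s1 <- 3; s2 <- 1; a <- 0.2
f_check <- function(x) (dnorm(x, mu1, s1) - a * dnorm(x, mu2, s2)) / (1 - a)
integrate(f_check, -Inf, Inf)$value
```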

### Question 3

```{r}
f <- function(a, mu1, mu2, s1, s2, x) {
  # Density of the negative weighted mixture: (f1 - a * f2) / (1 - a)
  # (body assumed from the definition above; the original is omitted in this excerpt)
  (dnorm(x, mu1, s1) - a * dnorm(x, mu2, s2)) / (1 - a)
}

# `as` holds the threshold a* computed in code omitted from this excerpt
cat(sprintf("a* = %f, a = %f, a <= a* [%s]", as, a, a <= as))
```

We have $\sigma_1^2 \ge \sigma_2^2$ and $0 < a \le a^*$, so the numerical values are compatible with the constraints defined above.

## Inverse c.d.f. Random Variable simulation

### Question 4

To show that the cumulative distribution function $F$ associated with $f$ is available in closed form, we compute $F(x) = \int_{-\infty}^{x} f(t) \,dt = \frac{1}{1-a} \left( \int_{-\infty}^{x} f_1(t)\, dt - a \int_{-\infty}^{x} f_2(t)\, dt \right) = \frac{1}{1-a} (F_1(x) - aF_2(x))$, where $F_1$ and $F_2$ are the cumulative distribution functions of $\mathcal{N}(\mu_1, \sigma_1^2)$ and $\mathcal{N}(\mu_2, \sigma_2^2)$ respectively.

```{r}
F <- function(a, mu1, mu2, s1, s2, x) {
  # Closed-form c.d.f.: (F1 - a * F2) / (1 - a)
  # (body assumed from the formula above; the original is omitted in this excerpt)
  (pnorm(x, mu1, s1) - a * pnorm(x, mu2, s2)) / (1 - a)
}
```

To construct an algorithm that returns the value of the inverse c.d.f. at $u \in (0,1)$, as a function of the parameters $a, \mu_1, \mu_2, \sigma_1, \sigma_2$ and of an approximation precision $\epsilon$, we can use the bisection method.

We fix $\epsilon > 0$.\
We set $u \in (0, 1)$.\

For $i = 1, \dots, n$ :

End for. We return $X$.
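
A minimal sketch of this bisection inversion, assuming the closed-form `F` defined above and a bracket wide enough to contain the solution (the name `F_inv_bisection` and the bracket $[-50, 50]$ are illustrative choices):

```{r}
F_inv_bisection <- function(u, a, mu1, mu2, s1, s2, eps = 1e-6) {
  lo <- -50; hi <- 50                      # assumed bracket for F^{-1}(u)
  while (hi - lo > eps) {
    mid <- (lo + hi) / 2
    if (F(a, mu1, mu2, s1, s2, mid) < u) lo <- mid else hi <- mid
  }
  (lo + hi) / 2
}
```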
##### Question 5
|
||||
### Question 5
|
||||
|
||||
```{r}
inv_cdf <- function(n) {
  # (bisection-based sampler drawing n values; body omitted in this excerpt)
}

# (plotting code omitted in this excerpt; the legend arguments after `col = c("r`
#  are truncated in the source and completed here by assumption)
legend("bottomright", legend = c("Theoretical cdf", "Empirical cdf"), col = c("red", "blue"), lty = 1)
```

We can see on both graphs that the empirical cumulative distribution function is close to the theoretical one.

## Accept-Reject Random Variable simulation

### Question 6

To simulate under $f$ using the accept-reject algorithm, we need to find a density function $g$ such that $f(x) \le M g(x)$ for all $x \in \mathbb{R}$, where $M$ is a constant.\

Then, we generate $X \sim g$ and $U \sim \mathcal{U}([0,1])$ $\textit{(1)}$.\
We accept $Y = X$ if $U \le \frac{f(X)}{Mg(X)}$; otherwise, we return to $\textit{(1)}$.

The probability of acceptance is $\int_{\mathbb{R}} \frac{f(x)}{Mg(x)} g(x) \,dx = \frac{1}{M} \int_{\mathbb{R}} f(x) \,dx = \frac{1}{M}$

Here we take $g = f_1$.

Then we have $\frac{f(x)}{g(x)} = \frac{1}{1-a} (1 - a\frac{f_2(x)}{f_1(x)})$

We compute from our first equation: $\frac{1}{1-a} \left(1 - a\frac{f_2(x)}{f_1(x)}\right) \le \frac{1}{1-a}$, since $a \frac{f_2(x)}{f_1(x)} \ge 0$.

To conclude, we have $M = \frac{1}{1-a}$ and the probability of acceptance is $\frac{1}{M} = 1-a$.
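
A minimal sketch of this sampler under the example parameter values, with proposal $g = f_1$ and $M = \frac{1}{1-a}$ (the helper name `draw_one` is an illustrative choice):

```{r}
mu1 <- 0; mu2 <- 1; s1 <- 3; s2 <- 1; a <- 0.2
draw_one <- function() {
  repeat {
    x <- rnorm(1, mu1, s1)   # proposal X ~ f1
    u <- runif(1)
    # acceptance ratio f(x) / (M g(x)) simplifies to 1 - a * f2(x) / f1(x)
    if (u <= 1 - a * dnorm(x, mu2, s2) / dnorm(x, mu1, s1)) return(x)
  }
}
x_sample <- replicate(1000, draw_one())
mean(x_sample)  # sample mean under f
```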

### Question 7

```{r}
accept_reject <- function(n, a) {
  # (accept-reject sampler with proposal f1; body omitted in this excerpt)
}

# `M` and `acceptance_rate` are defined in code omitted from this excerpt
n <- 10000
cat(sprintf("[M = %.2f] Empirical acceptance rate: %f, Theoretical acceptance rate: %f \n", M, acceptance_rate(n), 1 / M))
```


### Question 8

```{r}
set.seed(123)

# (`a_values`, `acceptance_rates`, `as` and `n` are computed in code omitted here;
#  the ylab string is truncated in the source and completed by assumption)
plot(a_values, acceptance_rates, type = "l", col = "blue", xlab = "a", ylab = "Acceptance rate")
points(as, acceptance_rate(n, as), col = "red", pch = 16)
legend("topright", legend = c("Acceptance rate for all a", "Acceptance rate for a_star"), col = c("lightblue", "red"), lty = c(1, NA), pch = c(NA, 16))
```

## Random Variable simulation with stratification

### Question 9

We consider a partition $\mathcal{P} = (D_0, D_1, \dots, D_k)$, $k \in \mathbb{N}$, of $\mathbb{R}$ such that $D_0$ covers the tails of $f_1$, and such that $f_1$ is bounded above and $f_2$ bounded below on each of $D_1, \dots, D_k$.

To simulate under $f$ using the accept-reject algorithm, we need to find a density function $g$ such that $f(x) \le M g(x)$ for all $x \in \mathbb{R}$, where $M$ is a constant.\

We generate $X \sim g$ and $U \sim \mathcal{U}([0,1])$ $\textit{(1)}$.\
We accept $Y = X$ if $U \le \frac{f(X)}{Mg(X)}$. Otherwise, we return to $\textit{(1)}$.

Here we take $g(x) = f_1(x)$ if $x \in D_0$ and $g(x) = \sup_{D_i} f_1$ if $x \in D_i$, $i \in \{1, \dots, k\}$.

To conclude, we have $M = \frac{1}{1-a}$ and the probability of acceptance is $r = \frac{1}{M} = 1-a$.

### Question 10

Let $P_n = (D_0, \dots, D_n)$ be a partition of $\mathbb{R}$, for $n \in \mathbb{N}$.
As the partition is refined, we have, for all $x \in \mathbb{R}$ and the corresponding $i \in \{0, \dots, n\}$ with $x \in D_i$, $\lim_{n \to \infty} \sup_{D_i} f_1 = f_1(x)$ and $\lim_{n \to \infty} \inf_{D_i} f_2 = f_2(x)$.

$\Rightarrow \lim_{n \to \infty} g(x) = f(x)$

$\Rightarrow \lim_{n \to \infty} \frac{g(x)}{f(x)} = 1$, since $f(x) > 0$ ($f$ being a density function).

$\Rightarrow \forall \epsilon > 0, \exists n_{\epsilon} \in \mathbb{N}$ such that $\forall n \ge n_{\epsilon}$, $M = \sup_{x \in \mathbb{R}} \frac{g(x)}{f(x)} < \epsilon$

$\Rightarrow r = \frac{1}{M} > \frac{1}{\epsilon} := \delta \in \, ]0, 1]$, where $r$ is the acceptance rate defined in Question 9.

### Question 11

We recall the parameters and the functions of the problem.
```{r}
mu1 <- 0
mu2 <- 1
s1 <- 3
s2 <- 1
a <- 0.2

f1 <- function(x) {
  dnorm(x, mu1, s1)
}

f2 <- function(x) {
  dnorm(x, mu2, s2)
}

f <- function(x) {
  fx <- f1(x) - a * f2(x)
  fx[fx < 0] <- 0
  return(fx / (1 - a))
}

f1_bounds <- c(mu1 - 3 * s1, mu1 + 3 * s1)
```

We implement the partition, the dominating function $g$ (to compare its behavior with that of $f$), and the computation of the supremum and infimum of $f_1$ and $f_2$ on each cell of the partition.
```{r}
create_partition <- function(k = 10) {
  return(seq(f1_bounds[1], f1_bounds[2], length.out = k))
}

sup_inf <- function(f, P, i) {
  x <- seq(P[i], P[i + 1], length.out = 1000)
  f_values <- sapply(x, f)
  return(c(max(f_values), min(f_values)))
}

g <- function(X, P) {
  values <- numeric(0)
  for (x in X) {
    if (x <= P[1] | x >= P[length(P)]) {
      values <- c(values, 1 / (1 - a) * f1(x))
    } else {
      for (i in 1:(length(P) - 1)) {
        if (x >= P[i] & x <= P[i + 1]) {
          values <- c(values, 1 / (1 - a) * (sup_inf(f1, P, i)[1] - a * sup_inf(f2, P, i)[2]))
        }
      }
    }
  }
  return(values)
}
```

We plot the function $f$ and the dominating function $g$ for different sizes of the partition.
```{r}
library(ggplot2)

X <- seq(-12, 12, length.out = 1000)

# Plot for different k with ggplot on the same graph
k_values <- c(10, 20, 50, 100)
P_values <- lapply(k_values, create_partition)
g_values <- lapply(P_values, function(P) g(X, P))

ggplot() +
  geom_line(aes(x = X, y = f(X)), col = "red") +
  geom_line(aes(x = X, y = g_values[[1]]), col = "green") +
  geom_line(aes(x = X, y = g_values[[2]]), col = "orange") +
  geom_line(aes(x = X, y = g_values[[3]]), col = "purple") +
  geom_line(aes(x = X, y = g_values[[4]]), col = "black") +
  labs(title = "Dominating function g of f for different partition sizes", x = "x", y = "Density") +
  theme_minimal()
```

Here, we implement the accept-reject algorithm with the given partition and an appropriate dominating function $g$ to sample under $f$.
```{r}
set.seed(123)

g_accept_reject <- function(x, P) {
  # piecewise-constant dominating function (up to the 1/(1-a) factor)
  if (x < P[1] | x >= P[length(P)]) {
    return(f1(x))
  } else {
    for (i in 1:(length(P) - 1)) {
      if (x >= P[i] & x < P[i + 1]) {
        return(sup_inf(f1, P, i)[1])
      }
    }
  }
}

stratified <- function(n, P) {
  samples <- numeric(0)
  rate <- 0
  while (length(samples) < n) {
    x <- rnorm(1, mu1, s1)
    u <- runif(1)
    if (u <= f(x) * (1 - a) / g_accept_reject(x, P)) {
      samples <- c(samples, x)
    }
    rate <- rate + 1
  }
  list(samples = samples, acceptance_rate = n / rate)
}

n <- 10000
k <- 100

P <- create_partition(k)
samples <- stratified(n, P)
X <- seq(-10, 10, length.out = 1000)

hist(samples$samples, breaks = 50, freq = FALSE, col = "lightblue", main = "Empirical density function f", xlab = "x")
lines(X, f(X), col = "red")
```

We also compute the acceptance rate of the algorithm.
```{r}
theoretical_acceptance_rate <- 1 - a
cat(sprintf("Empirical acceptance rate: %f, Theoretical acceptance rate: %.1f \n", samples$acceptance_rate, theoretical_acceptance_rate))
```

### Question 12

```{r}
set.seed(123)

stratified_delta <- function(n, delta) {
  samples <- numeric(0)
  P <- create_partition(n * delta)
  rate <- 0
  while (length(samples) < n | rate < delta * n) {
    x <- rnorm(1, mu1, s1)
    u <- runif(1)
    if (u <= f(x) * delta / g_accept_reject(x, P)) {
      samples <- c(samples, x)
    }
    rate <- rate + 1
  }
  list(samples = samples, partition = P, acceptance_rate = n / rate)
}

n <- 10000
delta <- 0.8

samples_delta <- stratified_delta(n, delta)
X <- seq(-10, 10, length.out = 1000)

hist(samples_delta$samples, breaks = 50, freq = FALSE, col = "lightblue", main = "Empirical density function f", xlab = "x")
lines(X, f(X), col = "red")
```

We also compute the acceptance rate of the algorithm.
```{r}
theoretical_acceptance_rate <- 1 - a
cat(sprintf("Empirical acceptance rate: %f, Theoretical acceptance rate: %f \n", samples_delta$acceptance_rate, theoretical_acceptance_rate))
```

Now we test the stratified_delta function for different values of delta:

```{r}
set.seed(123)
n <- 1000
deltas <- seq(0.1, 1, by = 0.1)

for (delta in deltas) {
  samples <- stratified_delta(n, delta)
  cat(sprintf("Delta: %.1f, Empirical acceptance rate: %f \n", delta, samples$acceptance_rate))
}
```

## Cumulative distribution function

### Question 13

The cumulative distribution function is, for all $x \in \mathbb{R}$, $F_X(x) = \int_{-\infty}^{x} f(t) \,dt = \int_{\mathbb{R}} h(t) f(t) \,dt$, where $h(t) = \mathbb{1}_{t \le x}$.

For a given $x \in \mathbb{R}$, a Monte Carlo estimator is $F_n(x) = \frac{1}{n} \sum_{i=1}^{n} h(X_i)$, where $h$ is the same function as above and $X_1, \dots, X_n \overset{iid}{\sim} X$.

### Question 14

As $X_1, \dots, X_n$ are iid and follow the law of $X$, and $h$ is measurable and bounded, the variables $h(X_1), \dots, h(X_n)$ are iid with $\mathbb{E}[|h(X_i)|] < + \infty$. By the strong law of large numbers, we have $F_n(x) = \frac{1}{n} \sum_{i=1}^{n} h(X_i) \xrightarrow{a.s.} \mathbb{E}[h(X_1)] = F_X(x)$.

Moreover, by the Glivenko-Cantelli theorem, this convergence is uniform: $\sup_{x \in \mathbb{R}} |F_n(x) - F_X(x)| \xrightarrow{a.s.} 0$.

Hence, $F_n$ is a good estimate of $F_X$ as a function of $x$.
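
As a quick illustration of this uniform convergence (using a standard normal sample, an illustrative choice), the Kolmogorov distance shrinks as the sample size grows:

```{r}
# sup |Fn - F| evaluated over the sample points (a close approximation of the sup)
set.seed(123)
for (m in c(100, 1000, 10000)) {
  x <- rnorm(m)
  cat(sprintf("n = %5d, sup|Fn - F| ~ %.4f\n", m, max(abs(ecdf(x)(x) - pnorm(x)))))
}
```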

### Question 15
```{r}
set.seed(123)
n <- 10000
Xn <- (rnorm(n, mu1, s1) - a * rnorm(n, mu2, s2)) / (1 - a)
X <- seq(-10, 10, length.out = n)

h <- function(x, Xn) {
  return(Xn <= x)
}

# Fn empirical
empirical_cdf <- function(x, Xn) {
  return(mean(h(x, Xn)))
}

# F theoretical
F <- function(x) {
  Fx <- pnorm(x, mu1, s1) - a * pnorm(x, mu2, s2)
  return(Fx / (1 - a))
}

cat(sprintf("Empirical cdf: %f, Theoretical cdf: %f \n", empirical_cdf(X, Xn), mean(F(Xn))))
```

Now we plot the empirical and theoretical cumulative distribution functions for different $n$.
```{r}
n_values <- c(10, 100, 1000, 10000)
colors <- c("lightblue", "blue", "darkblue", "navy")

plot(NULL, xlim = c(-10, 10), ylim = c(0, 1), xlab = "x", ylab = "F(x)", main = "Empirical vs Theoretical CDF")

for (i in seq_along(n_values)) {
  n <- n_values[i]
  X <- seq(-10, 10, length.out = n)
  Xn <- (rnorm(n, mu1, s1) - a * rnorm(n, mu2, s2)) / (1 - a)
  lines(X, sapply(X, empirical_cdf, Xn = Xn), col = colors[i], lty = 2)
}

lines(X, F(X), col = "red", lty = 1, lwd = 2)
legend("topright", legend = c("Empirical cdf (n=10)", "Empirical cdf (n=100)", "Empirical cdf (n=1000)", "Empirical cdf (n=10000)", "Theoretical cdf"), col = c(colors, "red"), lty = c(2, 2, 2, 2, 1))
```

### Question 16

As $X_1, \dots, X_n$ are iid and follow the law of $X$, and $h$ is measurable and bounded, the variables $h(X_1), \dots, h(X_n)$ are iid with $\mathbb{E}[h(X_i)^2] < + \infty$. By the Central Limit Theorem, we have $\sqrt{n} \frac{F_n(x) - F_X(x)}{\sigma} \xrightarrow{d} \mathcal{N}(0, 1)$, where $\sigma^2 = \mathrm{Var}(h(X_1)) = F_X(x)(1 - F_X(x))$.

So we have $\lim_{n\to\infty} \mathbb{P}\left(\sqrt{n} \frac{|F_n(x) - F_X(x)|}{\sigma} \le q_{1-\frac{\alpha}{2}}^{\mathcal{N}(0, 1)}\right) = 1 - \alpha$

So by computing the quantile of the normal distribution, we obtain a confidence interval for $F_X(x)$:

$F_X(x) \in \left[F_n(x) - \frac{q_{1-\frac{\alpha}{2}}^{\mathcal{N}(0, 1)} \sigma}{\sqrt{n}} ; F_n(x) + \frac{q_{1-\frac{\alpha}{2}}^{\mathcal{N}(0, 1)} \sigma}{\sqrt{n}}\right]$
```{r}
set.seed(123)
Fn <- empirical_cdf(X, Xn)
sigma <- sqrt(Fn - Fn^2)
q <- qnorm(0.975)
interval <- c(Fn - q * sigma / sqrt(n), Fn + q * sigma / sqrt(n))
cat(sprintf("Confidence interval: [%f, %f] \n", interval[1], interval[2]))
```

### Question 17

```{r}
compute_n_cdf <- function(x, interval_length = 0.01) {
  # smallest n such that the CI half-width q * sigma / sqrt(n) is below interval_length
  q <- qnorm(0.975)
  ceiling((q^2 * F(x) * (1 - F(x))) / interval_length^2)
}

x_values <- c(-15, -1)
n_values <- sapply(x_values, compute_n_cdf)

data.frame(x = x_values, n = n_values)
```

We notice that the sample size needed to estimate the cumulative distribution function is higher for values of $x$ close to the center of the distribution, where the variance $F(x)(1 - F(x))$ is largest. At $x = -1$ we are near the highest peak of the density, while at $x = -15$ we are in the tail.

## Empirical quantile function

### Question 18

We define the empirical quantile function on $(0, 1)$ by $Q_n(u) = \inf\{x \in \mathbb{R} : u \le F_n(x)\}$.
We recall the estimator $F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{X_i \le x}$.

So we have $Q_n(u) = \inf\{x \in \mathbb{R} : u \le \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{X_i \le x}\} = \inf\{x \in \mathbb{R} : n \cdot u \le \sum_{i=1}^{n} \mathbb{1}_{X_i \le x}\}$

We sort the sample $(X_1, \dots, X_n)$ in increasing order, and we define $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$ the order statistics of the sample.

Note that $\sum_{i=1}^{n} \mathbb{1}_{X_i \le x} = k$, where $k = \max\{ i \in \{1, \dots, n\} : X_{(i)} \le x \}$.

Since $F_n$ is a step function, the expression simplifies to $Q_n(u) = X_{(k)}$ where $k = \lceil n \cdot u \rceil$ and $X_{(k)}$ is the k-th order statistic of the sample $(X_1, \dots, X_n)$.
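
This is exactly R's type-1 sample quantile (the inverse of the empirical c.d.f.), which gives a quick check of the formula:

```{r}
# Q_n(u) = X_(ceil(n*u)) coincides with quantile(..., type = 1)
set.seed(1)
x <- rnorm(7)
u <- 0.3
c(sort(x)[ceiling(length(x) * u)], unname(quantile(x, u, type = 1)))
```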

### Question 19

We note $Y_{j,n} := \mathbb{1}_{X_{n,j} < Q(u) + \frac{t}{\sqrt{n}} \frac{\sqrt{u(1-u)}}{f(Q(u))}}$

We know that, for each $n$, the variables $(X_{n,j})_{j=1}^{n}$ are iid with the law of $X$.

Let $\Delta_{n} = \frac{t}{\sqrt{n}} \frac{\sqrt{u(1-u)}}{f(Q(u))}$.
We have $F_{n}(Q(u) + \Delta_{n}) = \frac{1}{n} \sum_{j=1}^{n} \mathbb{1}_{X_{n,j} < Q(u) + \Delta_{n}}$

Then $\frac{1}{n} \sum_{j=1}^{n} Y_{j,n} = F_{n}(Q(u) + \Delta_{n})$. By definition of the empirical quantile, $F_{n}(Q_{n}(u)) = u$.

By a first-order Taylor expansion, we get $F_{n}(Q(u) + \Delta_{n}) \approx F_{n}(Q(u)) + \Delta_{n} f(Q(u))$

By the Lindeberg-Levy Central Limit Theorem, applied to $F_{n}(Q(u))$, whose summands $\mathbb{1}_{X_{n,j} \le Q(u)}$ have mean $u < +\infty$ and variance $u(1-u) < +\infty$, we have $\frac{\sqrt{n}(F_{n}(Q(u)) - u)}{\sqrt{u(1-u)}} \xrightarrow{d} \mathcal{N}(0,1)$

so that, asymptotically, $F_{n}(Q(u)) = u + \frac{1}{\sqrt{n}} Z$ with $Z \sim \mathcal{N}(0, u(1-u))$.

Thus $F_{n}(Q(u) + \Delta_{n}) = u + \frac{1}{\sqrt{n}} Z + \Delta_{n} f(Q(u))$

By substituting $Q_{n}(u) = Q(u) + \Delta_{n}$, we have

$$
F_{n}(Q_{n}(u)) = F_{n}(Q(u) + \Delta_{n})\\
\Leftrightarrow u = u + \frac{1}{\sqrt{n}} Z + \Delta_{n} f(Q(u))\\
\Leftrightarrow \Delta_{n} = - \frac{1}{\sqrt{n}} \frac{Z}{f(Q(u))}
$$

Since $Q_{n}(u) = Q(u) + \Delta_{n}$, we have $\Delta_{n} = Q_{n}(u) - Q(u)$.

Then we have

$$
Q_{n}(u) - Q(u) = - \frac{1}{\sqrt{n}} \frac{Z}{f(Q(u))}\\
\Leftrightarrow \sqrt{n}(Q_{n}(u) - Q(u)) = -\frac{Z}{f(Q(u))}\\
\Leftrightarrow \sqrt{n}(Q_{n}(u) - Q(u)) \sim \mathcal{N}\left(0, \frac{u(1-u)}{f(Q(u))^2}\right) \text{ (by symmetry of } Z \text{)}
$$
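
A quick Monte Carlo check of this asymptotic variance, illustrated with the standard normal distribution and $u = 0.5$ (both illustrative choices):

```{r}
# m * Var(Qm(u)) should approach u(1-u) / f(Q(u))^2
set.seed(123)
u <- 0.5; m <- 2000; B <- 500
qm <- replicate(B, unname(quantile(rnorm(m), u, type = 1)))
c(empirical = m * var(qm), theoretical = u * (1 - u) / dnorm(qnorm(u))^2)
```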

### Question 20

When $u \rightarrow 0$, $Q(u)$ corresponds to the lower tail of the distribution, and when $u \rightarrow 1$, $Q(u)$ corresponds to the upper tail of the distribution.

In the tails $f(Q(u))$ is small, so the asymptotic variance $\frac{u(1-u)}{f(Q(u))^2}$ of the estimator blows up for values of $u$ close to 0 and 1. We therefore need a larger sample size to estimate the quantile function for values of $u$ close to 0 and 1.

### Question 21
```{r}
set.seed(123)

empirical_quantile <- function(u, Xn) {
  sorted_Xn <- sort(Xn)
  k <- ceiling(u * length(Xn))
  sorted_Xn[k]
}
```

```{r}
set.seed(123)
n <- 1000
Xn <- (rnorm(n, mu1, s1) - a * rnorm(n, mu2, s2)) / (1 - a)
u_values <- seq(0.01, 0.99, by = 0.01)
Qn_values <- sapply(u_values, empirical_quantile, Xn = Xn)

plot(u_values, Qn_values, col = "blue", xlab = "u", ylab = "Qn(u)", type = "o")
lines(u_values, (qnorm(u_values, mu1, s1) - a * qnorm(u_values, mu2, s2)) / (1 - a), col = "red")
legend("topright", legend = c("Empirical quantile function", "Theoretical quantile function"), col = c("blue", "red"), lty = 1)
```

### Question 22

We can compute a confidence interval for the empirical quantile function using the Central Limit Theorem.

We obtain the following formula for the confidence interval of the empirical quantile function:

$Q(u) \in \left[Q_n(u) - q_{1 - \frac{\alpha}{2}}^{\mathcal{N}(0,1)} \frac{\sqrt{u (1-u)}}{\sqrt{n} f(Q(u))}; Q_n(u) + q_{1 - \frac{\alpha}{2}}^{\mathcal{N}(0,1)} \frac{\sqrt{u (1-u)}}{\sqrt{n} f(Q(u))}\right]$
```{r}
f_q <- function(u) {
  # f evaluated at the (approximate) quantile Q(u)
  f(1 / (1 - a) * (qnorm(u, mu1, s1) - a * qnorm(u, mu2, s2)))
}

compute_n_quantile <- function(u, interval_length = 0.01) {
  q <- qnorm(0.975)
  ceiling((q^2 * u * (1 - u)) / (interval_length^2 * f_q(u)^2))
}

u_values <- c(0.5, 0.9, 0.99, 0.999, 0.9999)
n_values <- sapply(u_values, compute_n_quantile)

data.frame(u = u_values, n = n_values)
```

We deduce that the sample size needed to estimate the quantile function is higher for values of $u$ close to 1. This matches the deduction made in Question 20.