update exercise description

franzi
2021-11-01 10:19:18 +01:00
parent 7f019bb5a2
commit 6c0130a3a7
3 changed files with 7 additions and 3 deletions


@@ -67,7 +67,11 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "# extract all paragraphs with a list comprehension (have a look at the Python tutorial if this is new to you)\n",
+ "# extract all paragraphs with a list comprehension\n",
+ "# for every article key \"a\" in the dictionary,\n",
+ "# you get the corresponding list of paragraphs with articles[a]\n",
+ "# then take all of these paragraphs from all articles to form a long list\n",
+ "# (have a look at the Python tutorial if this is new to you)\n",
  "paragraphs_corpus = [p for a in articles for p in articles[a]]\n",
  "print(f\"Our dataset contains {len(paragraphs_corpus)} paragraphs\")"
  ]
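The nested comprehension described by the new comments reads left to right like two nested `for` loops: the outer loop runs over article keys and the inner loop runs over that article's paragraphs. The sketch below checks this equivalence. It assumes, as the cell's comments do, that `articles` maps each article key to a list of paragraph strings. The toy `articles` dict is hypothetical example data, not the notebook's dataset.

```python
# Hypothetical toy data: article key -> list of paragraph strings
articles = {
    "article_1": ["BEGIN DOCUMENT", "First paragraph.", "Second paragraph.", "END DOCUMENT"],
    "article_2": ["BEGIN DOCUMENT", "Only paragraph.", "END DOCUMENT"],
}

# Nested list comprehension, as in the notebook cell:
# outer loop over article keys, inner loop over that article's paragraphs
paragraphs_corpus = [p for a in articles for p in articles[a]]

# The same flattening written as explicit nested loops
paragraphs_loop = []
for a in articles:
    for p in articles[a]:
        paragraphs_loop.append(p)

# Both produce the same flat list (dicts keep insertion order in Python 3.7+)
assert paragraphs_corpus == paragraphs_loop
print(f"Our dataset contains {len(paragraphs_corpus)} paragraphs")  # → 7 paragraphs
```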
@@ -235,7 +239,7 @@
  "source": [
  "### Task 1: remove outliers and compute Kernel PCA again\n",
  "\n",
- "1. Remove the `BEGIN DOCUMENT` and `END DOCUMENT` \"paragraphs\" from the dataset, i.e., the first and last elements of the list of paragraphs for each article \n",
+ "1. Remove the `BEGIN DOCUMENT` and `END DOCUMENT` \"paragraphs\" from the dataset, i.e., the first and last elements of the list of paragraphs for each article. You can accomplish this by indexing the list of paragraphs of an article with `[1:-1]` to take only the second until the second-to-last elements.\n",
  "2. Transform this new list of paragraphs into TF-IDF vectors again\n",
  "3. Compute KernelPCA like before and plot the scatter plot (with colors) again\n",
  "4. Look at the eigenvalue spectrum again - what do you observe?"
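The `[1:-1]` hint in step 1 relies on standard Python slicing. The slice starts at index 1, the second element, and stops just before index -1, the last element. Slicing never raises an `IndexError`, so an article that contains only the two marker "paragraphs" simply yields an empty list. The paragraph list below is a hypothetical example.

```python
# Hypothetical paragraph list for a single article
paragraphs = ["BEGIN DOCUMENT", "Intro.", "Body.", "Conclusion.", "END DOCUMENT"]

# [1:-1] keeps everything from the second element up to (not including) the last one
inner = paragraphs[1:-1]
print(inner)  # ['Intro.', 'Body.', 'Conclusion.']

# Edge case: a list holding only the two markers becomes empty, without an error
only_markers = ["BEGIN DOCUMENT", "END DOCUMENT"][1:-1]
print(only_markers)  # []
```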
@@ -247,7 +251,7 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "# remove outliers (i.e. first and last \"paragraph\" for each article)\n",
+ "# remove outliers (i.e., first and last \"paragraph\" for each article)\n",
  "paragraphs_corpus = ..."
  ]
  },
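One possible way to fill in the `paragraphs_corpus = ...` placeholder combines the flattening comprehension with the `[1:-1]` slice from step 1 of Task 1. The sketch below is self-contained and uses hypothetical toy data. It assumes, as the task description states, that every article's list starts with `BEGIN DOCUMENT` and ends with `END DOCUMENT`.

```python
# Hypothetical toy data: article key -> list of paragraphs framed by marker "paragraphs"
articles = {
    "article_1": ["BEGIN DOCUMENT", "First paragraph.", "Second paragraph.", "END DOCUMENT"],
    "article_2": ["BEGIN DOCUMENT", "Only paragraph.", "END DOCUMENT"],
}

# Slice off the first and last element of each article's list, then flatten
paragraphs_corpus = [p for a in articles for p in articles[a][1:-1]]

# Sanity check: no marker survives (assumes markers occur only at the ends)
assert "BEGIN DOCUMENT" not in paragraphs_corpus
assert "END DOCUMENT" not in paragraphs_corpus
print(f"Our dataset contains {len(paragraphs_corpus)} paragraphs")  # → 3 paragraphs
```

The resulting list takes the place of the original corpus for step 2 of Task 1, where it is transformed into TF-IDF vectors again.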