{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\"Open\n", "\n", "| - | - | - |\n", "|-------------------------------------------------------------------------------|-------------------------------------------------------------------------------|-------------------------------------------------------------------------------|\n", "| [Exercise 1 (integers in brackets)](<#Exercise-1-(integers-in-brackets)>) | [Exercise 2 (file listing)](<#Exercise-2-(file-listing)>) | [Exercise 3 (red green blue)](<#Exercise-3-(red-green-blue)>) |\n", "| [Exercise 4 (word frequencies)](<#Exercise-4-(word-frequencies)>) | [Exercise 5 (summary)](<#Exercise-5-(summary)>) | [Exercise 6 (file count)](<#Exercise-6-(file-count)>) |\n", "| [Exercise 7 (file extensions)](<#Exercise-7-(file-extensions)>) | [Exercise 8 (prepend)](<#Exercise-8-(prepend)>) | [Exercise 9 (rational)](<#Exercise-9-(rational)>) |\n", "| [Exercise 10 (extract numbers)](<#Exercise-10-(extract-numbers)>) | | |\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Python (continues)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Regular expressions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Examples\n", "\n", "We have already seen that we can ask from a string `str`\n", "whether it begins with some substring as follows:\n", "`str.startswith('Apple')`.\n", "If we would like to know whether it starts with `\"Apple\"` or\n", "`\"apple\"`, we would have to call `startswith` method twice.\n", "Regular expressions offer a simpler solution:\n", "`re.match(r\"[Aa]pple\", str)`.\n", "The bracket notation is one example of the special syntax of\n", "*regular expressions*. In this case it says that any of the\n", "characters inside brackets will do: either `\"A\"` or `\"a\"`. The other\n", "letters in `\"pple\"` will act normally. 
The string `r\"[Aa]pple\"` is\n", "called a *pattern*.\n", "\n", "A more complicated example asks whether the string `str`\n", "starts with either `apple` or `banana` (no matter if the first letter\n", "is capital or not):\n", "`re.match(r\"[Aa]pple|[Bb]anana\", str)`.\n", "In this example we saw a new special character `|` that denotes\n", "an alternative. On either side of the bar character we have a\n", "*subpattern*.\n", "\n", "A legal variable name in Python starts with a letter or an\n", "underline character and the following characters can also be\n", "digits.\n", "So legal names are, for instance: `_hidden`, `L_value`, `A123_`.\n", "But the name `2abc` is not a valid variable name.\n", "Let’s see what would be the regular expression pattern to\n", "recognise valid variable names:\n", "`r\"[A-Za-z_][A-Za-z_0-9]*\\Z\"`.\n", "Here we have used a shorthand for character ranges: `A-Z`.\n", "This means all the characters from `A` to `Z`.\n", "\n", "The first character of the variable name is defined in the first\n", "brackets. The subsequent characters are defined in the second\n", "brackets.\n", "The special character `*` means that we allow any number\n", "(0,1,2, . . . ) of the previous subpattern. For example the\n", "pattern `r\"ba*\"` allows strings `\"b\"`, `\"ba\"`, `\"baa\"`, `\"baaa\"`, and\n", "so on.\n", "The special syntax `\\Z` denotes the end of the string.\n", "Without it we would also accept `abc-` as a valid name since\n", "the `match` function normally checks only that a string starts with a pattern.\n", "\n", "The special notations, like `\\Z`, also cause problems with string\n", "handling.\n", "Remember that normally in string literals we have some\n", "special notation: `\\n` stands for newline, `\\t` stands for tab, and\n", "so on.\n", "So, both string literals and regular expressions use similar\n", "looking notations, which can create serious confusion.\n", "This can be solved by using the so-called *raw strings*. 
We\n", "denote a raw string by having an `r` letter before the first\n", "quotation mark, for example `r\"ab*\\Z\"`.\n", "When using raw strings, the newline (`\\n`), tab (`\\t`), and other\n", "special string literal notations aren’t interpreted. One should\n", "always use raw strings when defining regular expression\n", "patterns!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Patterns\n", "\n", "A pattern represents a set of strings. This set can even be\n", "potentially infinite.\n", "Patterns can be used to describe a set of strings that have some\n", "commonality; some regular structure.\n", "Regular expressions (RE) are a classical computer science topic.\n", "They are very common in programming tasks. Scripting\n", "languages, like Python, are very fluent in regular expressions.\n", "Very complex text processing can be achieved using regular\n", "expressions.\n", "\n", "In patterns, normal characters (letters, numbers) just represent\n", "themselves, unless preceded by a backslash, which may trigger\n", "some special meaning.\n", "Many punctuation characters have a special meaning, unless preceded\n", "by a backslash (`\\`), which removes their special meaning.\n", "Use `\\\\` to represent a backslash character without any special\n", "meaning.\n", "In the following we will go through some of the more\n", "common RE notations.\n", "\n", "```\n", ". Matches any character (except a newline, by default)\n", "[...] Matches any character contained within the brackets\n", "[^...] Matches any character not appearing after the hat (^)\n", "^ Matches the start of the string\n", "$ Matches the end of the string\n", "* Matches zero or more repetitions of the previous RE\n", "+ Matches one or more repetitions of the previous RE\n", "{m,n} Matches m to n occurrences of the previous RE\n", "? 
Matches zero or one occurrences of the previous RE\n", "```\n", "\n", "We have already seen that a `|` character denotes alternatives.\n", "For example, the pattern `r\"Get (on|off|ready)\"` matches\n", "the following strings: `\"Get on\"`, `\"Get off\"`, `\"Get ready\"`.\n", "We can use parentheses to create groupings inside a pattern:\n", "`r\"(ab)+\"` will match the strings `\"ab\"`, `\"abab\"`, `\"ababab\"`,\n", "and so on.\n", "These groups are also given a reference number starting from 1.\n", "We can refer to groups using backreferences: `\\number`.\n", "For example, we can find separated patterns that get\n", "repeated: `r\"([a-z]{3,}) \\1 \\1\"`.\n", "This will recognise, for example, the following strings:\n", "`\"aca aca aca\"`, `\"turn turn turn\"`. But not the strings\n", "`\"aca aba aca\"` or `\"ac ac ac\"`.\n", "\n", "\n", "In the following, note that a hat (^) as the first character\n", "inside brackets will create a complement set of characters:\n", "\n", "```\n", "\\d same as [0-9], matches a digit\n", "\\D same as [^0-9], matches anything but a digit\n", "\\s matches a whitespace character (space, newline, tab, ...)\n", "\\S matches a nonwhitespace character\n", "\\w same as [a-zA-Z0-9_] for ASCII text, matches one alphanumeric character or underscore\n", "\\W matches one character that \\w does not match\n", "```\n", "\n", "Using the above notation we can now shorten our previous\n", "variable name example to `r'[a-zA-Z_]\\w*\\Z'`.\n", "\n", "The patterns `\\A`, `\\b`, `\\B`, and `\\Z` will all match an empty\n", "string, but in specific places.\n", "The patterns `\\A` and `\\Z` will recognise the beginning and end\n", "of the string, respectively.\n", "Note that the patterns `^` and `$` can in some cases match also\n", "after a newline and before a newline, respectively.\n", "So, `\\A` is distinct from `^`, and `\\Z` is distinct from `$`.\n", "The pattern `\\b` matches at the start or end of a word. The\n", "pattern `\\B` matches everywhere else."
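, "\n", "\n", "As a rough check of the patterns discussed above, the following sketch tries the backreference pattern and the shortened variable name pattern on the example strings. The variable names `triple` and `name` are chosen here only for illustration.\n", "\n", "```python\n", "import re\n", "\n", "# The same word of at least three lowercase letters, repeated three times\n", "triple = r\"([a-z]{3,}) \\1 \\1\"\n", "for s in [\"aca aca aca\", \"turn turn turn\", \"aca aba aca\", \"ac ac ac\"]:\n", "    print(s, \"->\", bool(re.match(triple, s)))   # True, True, False, False\n", "\n", "# A variable name: a letter or underscore, then word characters up to the end\n", "name = r\"[a-zA-Z_]\\w*\\Z\"\n", "for s in [\"_hidden\", \"L_value\", \"A123_\", \"2abc\", \"abc-\"]:\n", "    print(s, \"->\", bool(re.match(name, s)))     # True, True, True, False, False\n", "```\n", "\n", "Note that `re.match` only looks at the beginning of the string, which is why the `\\Z` at the end of the second pattern is needed to reject `\"abc-\"`."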
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Match and search functions\n", "\n", "We have so far only used the `re.match` function which tries\n", "to find a match at the beginning of a string.\n", "The function `re.search` can find a match anywhere in a\n", "string.\n", "Example: `re.search(r'\\bback\\b', s)` will match\n", "strings `\"back\"`, `\"a back, is a body part\"`, `\"get back\"`. But it\n", "will not match the strings `\"backspace\"` or `\"comeback\"`.\n", "\n", "The function `re.search` finds only the first occurrence.\n", "We can use the `re.findall` function to find all occurrences.\n", "Let’s say we want to find all present participle words in a\n", "string `s`. Present participles end in `'ing'`.\n", "The function call would look like this:\n", "`re.findall(r'\\w+ing\\b', s)`.\n", "Let’s try running this:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Doing', 'going', 'staying', 'sleeping']" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import re\n", "s = \"Doing things, going home, staying awake, sleeping later\"\n", "re.findall(r'\\w+ing\\b', s)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s say we want to pick up all the integers from a string.\n", "We can try that with the following function call:\n", "`re.findall(r'[+-]?\\d+', s)`.\n", "An example run:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['23', '-24', '-1']" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "re.findall(r'[+-]?\\d+', \"23 + -24 = -1\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose we are given a string of if/then sentences, and we\n", "would like to extract the conditions from these sentences.\n", "Let’s try the following function call:\n", "`re.findall(r'[Ii]f (.*), then', 
s)`.\n", "An example run:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['I’m not in a hurry, then I should stay. On the other hand, if I leave']" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = (\"If I’m not in a hurry, then I should stay. \" +\n", " \"On the other hand, if I leave, then I can sleep.\")\n", "re.findall(r'[Ii]f (.*), then', s)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But I wanted a result: `[\"I'm not in a hurry\", 'I leave']`. That\n", "is, the condition from both sentences. How can this be fixed?\n", "\n", "The problem is that the pattern `.*` tries to match as many\n", "characters as possible.\n", "This is called *greedy matching*.\n", "One way of solving this problem is to notice that the two\n", "sentences are separated by a full-stop (.).\n", "So, instead of matching all the characters, we need to match\n", "everything but the dot character.\n", "This can be achieved by using the complement character\n", "class: `[^.]`. 
The hat character (`^`) in the beginning of a\n", "character class means the complement character class.\n", "\n", "After the modification the function call looks like this:\n", "`re.findall(r'[Ii]f ([^.]*), then', s)`.\n", "Another way of solving this problem is to use non-greedy\n", "matching.\n", "The repetition specifiers `+`, `*`, `?`, and `{m,n}` have\n", "corresponding non-greedy versions: `+?`, `*?`, `??`, and `{m,n}?`.\n", "These expressions use as few characters as possible to make\n", "the whole pattern match some substring.\n", "By using the non-greedy version, the function call looks like this:\n", "`re.findall(r'[Ii]f (.*?), then', s)`.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Functions in the `re` module\n", "\n", "Below is a list of the most common functions in the `re` module:\n", "\n", "* `re.match(pattern, str)`\n", "* `re.search(pattern, str)`\n", "* `re.findall(pattern, str)`\n", "* `re.finditer(pattern, str)`\n", "* `re.sub(pattern, replacement, str, count=0)`\n", "\n", "Functions `match` and `search` return a *match object*, or `None` if no match was found.\n", "A match object describes the found occurrence.\n", "The function `findall` returns a list of all the occurrences of\n", "the pattern. The elements in the list are strings\n", "(or the contents of the groups, if the pattern contains groups).\n", "The function `finditer` works like the `findall` function except\n", "that instead of returning a list, it returns an iterator whose\n", "items are match objects.\n", "The function `sub` replaces all the occurrences of the pattern in\n", "`str` with the string `replacement` and returns the new string.\n", "\n", "An example: The following program will replace all \"she\"\n", "words with \"he\":\n", "\n", "```\n", "import re\n", "str = \"She goes where she wants to, she's a sheriff.\"\n", "newstr = re.sub(r'\\b[Ss]he\\b', 'he', str)\n", "print(newstr)\n", "```\n", "\n", "This will print `he goes where he wants to, he's a sheriff.`\n", "\n", "The `sub` function can also use backreferences to refer to the\n", "matched string. 
The backreferences `\\1`, `\\2`, and so on, refer\n", "to the groups of the pattern, in order.\n", "An example:\n", "```\n", "import re\n", "str = \"\"\"He is the president of Russia.\n", "He’s a powerful man.\"\"\"\n", "newstr = re.sub(r'(\\b[Hh]e\\b)', r'\\1 (Putin)', str, count=1)\n", "print(newstr)\n", "```\n", "\n", "This will print\n", "```\n", "He (Putin) is the president of Russia.\n", "He’s a powerful man.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Match object\n", "\n", "Functions `match`, `search`, and `finditer` use match objects\n", "to describe the found occurrence.\n", "The method `groups()` of the match object returns the tuple\n", "of all the substrings matched by the groups of the pattern.\n", "Each pair of parentheses in the pattern creates a new group.\n", "These groups are referred to by indices 1, 2, ...\n", "The group 0 is a special one: it refers to the match created by\n", "the whole pattern.\n", "\n", "Let’s look at the match object returned by the call\n", "\n", "```\n", "mo = re.search(r'\\d+ (\\d+) \\d+ (\\d+)',\n", "               'first 123 45 67 890 last')\n", "```\n", "\n", "The call `mo.groups()` returns a tuple `('45', '890')`.\n", "We can access just some individual groups by using the\n", "method `group(gid, ...)`.\n", "For example, the call `mo.group(1)` will return `'45'`.\n", "The zeroth group will represent the whole match:\n", "`'123 45 67 890'`.\n", "\n", "In addition to accessing the strings matched by the pattern\n", "and its groups, the corresponding indices of the original string\n", "can be accessed:\n", "\n", "* The `start(gid=0)` and `end(gid=0)` methods return the start\n", "and end indices of the matched group `gid`, respectively\n", "* The method `span(gid)` just returns the pair of these start\n", "and end indices\n", "\n", "The match object `mo` can also be used like a boolean value:\n", "\n", "```python\n", "mo = re.search(...)\n", "if mo:\n", "    # do something\n", "```\n", "\n", "will do something if 
a match was found.\n", "Alternatively, the match object can be converted to a boolean\n", "value by the call `found = bool(mo)`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Miscellaneous stuff\n", "\n", "If the same pattern is used in many function calls, it may be\n", "wise to precompile the pattern, mainly for efficiency reasons.\n", "This can be done using the `compile(pattern, flags=0)` function\n", "in the `re` module. The function returns a so-called RE object.\n", "The RE object has method versions of the functions found in\n", "module `re`.\n", "The only difference is that the first parameter is not the\n", "pattern since the precompiled pattern is stored in the RE\n", "object.\n", "\n", "The details of the matching operation can be specified using\n", "optional flags.\n", "These flags can be given either inside the pattern or as a\n", "parameter to the compile function.\n", "Some of the more common flags are given in the following\n", "table:\n", "\n", "| Inline form | Flag |\n", "|-----|--------------|\n", "|`(?i)` | re.IGNORECASE|\n", "|`(?m)` | re.MULTILINE|\n", "|`(?s)` | re.DOTALL|\n", "\n", "The inline forms on the left should be placed at the very beginning\n", "of the pattern; since Python 3.11 they are not accepted elsewhere.\n", "On the right there are attributes of the `re` module that can be\n", "given to the compile function as the second parameter.\n", "\n", "The `IGNORECASE` flag makes lower- and uppercase\n", "characters be treated as equal.\n", "The `MULTILINE` flag makes the special characters `^` and `$`\n", "match the beginning and end of each line in addition to the\n", "beginning and end of the whole string. 
This flag makes `\\A`\n", "differ from `^`, and `\\Z` differ from `$`.\n", "The `DOTALL` flag makes the character class `.` (dot) also\n", "accept the newline character, in addition to all the other\n", "characters.\n", "\n", "When giving multiple flags to the compile function, the flags\n", "can be combined with the `|` operator.\n", "For example, `re.compile(pattern, re.MULTILINE | re.DOTALL)`.\n", "This is equivalent to `re.compile('(?m)(?s)' + pattern)`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "####
Exercise 1 (integers in brackets)
\n", "\n", "Write function `integers_in_brackets` that finds from a given string all integers that are enclosed in brackets.\n", "\n", "Example run:\n", "`integers_in_brackets(\" afd [asd] [12 ] [a34] [ -43 ]tt [+12]xxx\")`\n", "returns\n", "`[12, -43, 12]`.\n", "So there can be whitespace between the number and the brackets, but no other character besides those that make up the integer.\n", "\n", "Test your function from the `main` function.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic file processing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A file can be opened with the `open` function. The call `open(filename, mode=\"r\")` will return a *file object*, whose type is `file`. This file object can be used to refer to a file on disk. For example, when we want to read from or write to a file, we can used the methods `read` and `write` of the file object. After the file object is no longer needed, a call to the `close` method should be made.\n", "\n", "We can control what kind of operations we can perform on a file with the *mode* parameter of the `open` function. Different options include opening a file for reading or writing,\n", "whether the file should exists already or be created with the\n", "call to open, etc. Here's a list of all the opening modes:\n", "\n", "| Mode | Description |\n", "| ---- | ----------- |\n", "| `r` | read-only mode, file must exist |\n", "| `w` | write-only mode, creates, or overwrites an existing file |\n", "| `a` | write-only mode, write always appends to the end |\n", "| `r+` | read/write mode, file must already exist |\n", "| `w+` | read/write mode, creates, or overwrites an existing file |\n", "| `a+` | read/write mode, write will append to end |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the end of the mode string either the letter `t` or `b` can be appended. These stand for text mode and binary mode. If this letter is not given, the file type is text mode by default. \n", "\n", "For binary mode the contents of the file are not interpreted in any way, and the read and write methods handle bytes. (A byte consists of 8 bits and can be used to represent a number in the range 0 to 255.)\n", "\n", "In the text mode two interpretations happen\n", "\n", "* On Windows operating system the end of line in files is encoded by two characters. 
When the file is read these two characters are converted to a single `'\\n'` character. During writes to a file this conversion happens in the opposite direction.\n", "* One character is encoded in the file as one or more bytes. This conversion happens automatically during read and write operations. One common encoding between bytes and characters is utf-8. In this encoding, the Finnish character `'ä'`, for example, is encoded as the following sequence of bytes:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "b'\\xc3\\xa4'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"ä\".encode(\"utf-8\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Above the two bytes were expressed as hexadecimals. In decimal notation they would be 195 and 164. (Both in the range from 0 to 255.)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[195, 164]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(\"ä\".encode(\"utf-8\")) # Show as a list of integers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is the utf-8 encoding of the letter `'a'`?\n", "\n", "During this course we will only consider files containing text, so the default text mode is fine for us. But we might sometimes have to specify the encoding of a file, if it is not the usual utf-8."
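, "\n", "\n", "As a minimal sketch of specifying an encoding, the example below writes and reads a small file using the `latin-1` encoding, in which `'ä'` takes a single byte. The file name `latin1_example.txt` and the temporary folder are assumptions made only for this demonstration.\n", "\n", "```python\n", "import os\n", "import tempfile\n", "\n", "# Hypothetical file in a temporary folder, used only for this demonstration\n", "path = os.path.join(tempfile.mkdtemp(), \"latin1_example.txt\")\n", "\n", "with open(path, \"w\", encoding=\"latin-1\") as f:  # text mode, explicit encoding\n", "    f.write(\"ä\")\n", "\n", "with open(path, \"rb\") as f:                     # binary mode returns raw bytes\n", "    raw = f.read()\n", "print(list(raw))                                # [228]: one byte instead of two\n", "\n", "with open(path, \"r\", encoding=\"latin-1\") as f:  # decode with the same encoding\n", "    text = f.read()\n", "print(text == \"ä\")                              # True\n", "```\n", "\n", "Reading this file with `encoding=\"utf-8\"` would instead raise a `UnicodeDecodeError`, since the single byte 228 is not valid utf-8 on its own."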
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Some common file object methods\n", "* `read(size)` will read size characters/bytes as a string\n", "* `write(string)` will write string/bytes to a file\n", "* `readline()` will read a string until and including the next newline character is met\n", "* `readlines()` will return a list of all lines of a file\n", "* `writelines()` will write a list of lines to a file\n", "* `flush()` will try to make sure that the changes made to a file are written to disk immediately" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2019-06-13T14:52:31.003612Z", "start_time": "2019-06-13T14:52:30.995901Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Line 0: {\n", "Line 1: \"cells\": [\n", "Line 2: {\n", "Line 3: \"cell_type\": \"markdown\",\n", "Line 4: \"metadata\": {},\n" ] } ], "source": [ "f = open(\"basics.ipynb\", \"r\") # Let's open this notebook file, \n", " # which is essentially a text file.\n", " # So you can open it in a texteditor as well.\n", " \n", "for i in range(5): # And read the first five lines\n", " line = f.readline()\n", " print(f\"Line {i}: {line}\", end=\"\")\n", "f.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is easy to forget to close the file. One can use a *context manager* to solve this problem. A context manager is created with the `with` statement. After the indented block of the `with` statement exits, the file will be automatically closed." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2019-06-13T14:52:54.616535Z", "start_time": "2019-06-13T14:52:54.609996Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Line 0: {\n", "Line 1: \"cells\": [\n", "Line 2: {\n", "Line 3: \"cell_type\": \"markdown\",\n", "Line 4: \"metadata\": {},\n" ] } ], "source": [ "with open(\"basics.ipynb\", \"r\") as f: # the file will be automatically closed,\n", " # when the with block exits\n", " for i in range(5):\n", " line = f.readline()\n", " print(f\"Line {i}: {line}\", end=\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `file` object is iterable. This means that we can iterate through the lines in the file using a for loop, like in the below example:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2019-06-13T14:57:40.297056Z", "start_time": "2019-06-13T14:57:40.289012Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The longest line in this file has length 1046\n" ] } ], "source": [ "max_len = 0\n", "with open(\"basics.ipynb\", \"r\") as f:\n", " for line in f: # iterates through all the lines in the file\n", " if len(line) > max_len:\n", " max_len = len(line)\n", "print(f\"The longest line in this file has length {max_len}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Standard file objects\n", "Python has automatically three file objects open:\n", "\n", "* `sys.stdin` for *standard input*\n", "* `sys.stdout` for *standard output*\n", "* `sys.stderr` for *standard error*\n", "To read a line from a user (keyboard), you can call `sys.stdin.readline()`. To write a line to a user (screen), call `sys.stdout.write(line)`. 
The standard error is meant for error messages only, even though its output often goes to the same destination as standard output.\n", "\n", "The `print` function uses the file `sys.stdout` and the `input` function uses the file `sys.stdin`. An example of usage:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Got a negative integer.\n" ] } ], "source": [ "import sys\n", "import random\n", "i=random.randint(-10,10)\n", "if i >= 0:\n", "    sys.stdout.write(\"Got a non-negative integer.\\n\")\n", "else:\n", "    sys.stderr.write(\"Got a negative integer.\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These standard file objects are meant to be a basic input/output mechanism in textual form. The destinations of the file objects can be changed to point\n", "somewhere else than the usual keyboard and screen. Very often these are redirected to some files. For example, it is usual to point the stderr to a file where all\n", "error messages are logged." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## sys module\n", "\n", "We saw above that the `sys` module contains the three file objects `sys.stdin`, `sys.stdout`, and `sys.stderr`. It also has a few other useful attributes. The attribute `sys.path` is the list of folders that Python uses to look for imported modules. The list `sys.argv` contains the so-called *command line parameters*. For example, in Linux, if you are using the terminal, you can run your program with the command `python3 programname.py param1 param2 ...`. After Python has started your program, the command line parameters are visible as follows. The name of the program is in `sys.argv[0]`. The rest of the command line parameters are after the program name in this list: `sys.argv[1]==\"param1\"`, `sys.argv[2]==\"param2\"`, and so on. The command line parameters can be useful in adjusting the behaviour of your program. 
A few examples of these will be in the following exercises. (The terminal window is a textual interface to your computer instead of the usual graphical interface.)\n", "\n", "The function `sys.exit` can be used to exit your program immediately. The integer parameter given to this function is the return value of the program. Usually the return value 0 means that the program ran successfully, and a non-zero integer means that an error occurred. This return value is accessible from the terminal window from where you started the program." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "####
Exercise 2 (file listing)
\n", "\n", "The file `src/listing.txt` contains a list of files with one line per file. Each line contains seven fields: access rights, number of references, owner's name, name of owning group, file size, date, filename. These fields are separated with one or more spaces. Note that there may be spaces also within these seven fields.\n", "\n", "Write function `file_listing` that loads the file `src/listing.txt`. It should return a list of tuples (size, month, day, hour, minute, filename). Use regular expressions to do this (either `match`, `search`, `findall`, or `finditer` method).\n", "\n", "An example: for line\n", "```\n", "-rw-r--r-- 1 jttoivon hyad-all 25399 Nov 2 21:25 exception_hierarchy.pdf\n", "```\n", "the function should create the tuple `(25399, \"Nov\", 2, 21, 25, \"exception_hierarchy.pdf\")`.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "####
Exercise 3 (red green blue)
\n", "\n", "The file `src/rgb.txt` contains names of colors and their numerical representations in RGB format. The RGB format allows a color to be represented as a mixture of red, green, and blue components. Each component can have an integer value in the range [0,255]. Each line in the file contains four fields: red, green, blue, and colorname.\n", "Each field is separated by some amount of whitespace (tab or space in this case).\n", "The text file is formatted to make it print nicely, but that makes it harder for a computer to process. Note that some color names can also contain a space character.\n", " \n", "Write a function `red_green_blue` that reads the file `rgb.txt` from the folder `src`. Remove the irrelevant first line of the file. The function should return a list of strings. Clean up the file so that the strings in the returned list have four fields separated by a single tab character (`\\t`). Use regular expressions to do this.\n", "\n", "The first string in the returned list should be:\n", "```\n", "'255\\t250\\t250\\tsnow'\n", "```\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "####
Exercise 4 (word frequencies)
\n", "\n", "Create function `word_frequencies` that gets a filename as a parameter and returns a dict with the word frequencies. In the dictionary the keys are the words and the corresponding values are the number of times that word occurred in the file specified by the function parameter. Read all the lines from the file and split the lines into words using the `split()` method. Further, remove punctuation from the ends of words using the `strip(\"\"\"!\"#$%&'()*,-./:;?@[]_\"\"\")` method call.\n", "\n", "Test this function in the main function using the file `alice.txt`. In the output, there should be a word and its count per line separated by a tab:\n", "\n", "```\n", "The 64\n", "Project 83\n", "Gutenberg\t26\n", "EBook 3\n", "of 303\n", "```\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "####
Exercise 5 (summary)
\n", "\n", "This exercise can give two points at maximum!\n", "\n", "Part 1.\n", "\n", "Create a function called `summary` that gets a filename as a parameter. The input\n", "file should contain a floating point number on each line of the file. Make your function read these\n", "numbers and then return a triple containing the sum, average, and standard deviation of these numbers for the file.\n", "As a reminder, the formula for corrected sample standard deviation is\n", "$\\sqrt{\\frac{\\sum_{i=1}^n (x_i - \\overline x)^2}{n-1}}$, where $\\overline x$ is the average.\n", "\n", "The `main` function should call the function `summary` for each filename in the list `sys.argv[1:]` of command line parameters. (Skip `sys.argv[0]` since it contains the name of the current program.)\n", "\n", "Example of usage from the command line:\n", "`python3 src/summary.py src/example.txt src/example2.txt`\n", "\n", "Print floating point numbers using six decimal places. The output should look like this:\n", "```\n", "File: src/example.txt Sum: 51.400000 Average: 10.280000 Stddev: 8.904606\n", "File: src/example2.txt Sum: 5446.200000 Average: 1815.400000 Stddev: 3124.294045\n", "```\n", "\n", "Part 2.\n", "\n", "If some line doesn’t represent a number, you can just ignore that line. You can achieve this with the *try-except* block. An example of recovering from an exceptional situation:\n", "```python\n", "try:\n", "    x = float(line) # The float constructor raises a ValueError exception if conversion is not possible\n", "except ValueError:\n", "    # Statements in here are executed when the above conversion fails\n", "    pass\n", "```\n", "We will cover more about exceptions later in the course.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "####
Exercise 6 (file count)
\n", "\n", "This exercise can give two points at maximum!\n", "\n", "Part 1.\n", "\n", "Create a function `file_count` that gets a filename as a parameter and returns a triple of numbers. The function should read the file, count the number of lines, words, and characters in the file, and return a triple with these counts in this order. You get the division into words by splitting at whitespace. You don't have to remove punctuation.\n", "\n", "Part 2.\n", "\n", "Create a main function that in a loop calls `file_count` using each filename in the list of command line parameters `sys.argv[1:]` as a parameter, in turn.\n", "For the call `python3 src/file_count file1 file2 ...`\n", "the output should be\n", "```\n", "? ? ? file1\n", "? ? ? file2\n", "...\n", "```\n", "The fields are separated by tabs (`\\t`). The fields are in order: linecount, wordcount, charactercount, filename.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "####
Exercise 7 (file extensions)
\n", "\n", "This exercise can give two points at maximum!\n", "\n", "Part 1.\n", "\n", "Write a function `file_extensions` that gets a filename as a parameter.\n", "It should read through the lines from this file. Each line contains a filename.\n", "Find the extension for each filename. The function should return a pair, where the\n", "first element is a list containing all filenames with no extension.\n", "The second element of the pair is a dictionary with extensions (with the preceding period (`.`) removed) as keys and lists of the filenames having that extension as values.\n", "\n", "Sounds a bit complicated, but hopefully the next example will clarify this.\n", "If the file contains the following lines\n", "```\n", "file1.txt\n", "mydocument.pdf\n", "file2.txt\n", "archive.tar.gz\n", "test\n", "```\n", "then the return value should be the pair:\n", "`([\"test\"], { \"txt\" : [\"file1.txt\", \"file2.txt\"], \"pdf\" : [\"mydocument.pdf\"], \"gz\" : [\"archive.tar.gz\"] } )`\n", "\n", "Part 2.\n", "\n", "Write a `main` function that calls the `file_extensions` function with \"src/filenames.txt\" as the argument. Then print the results so that for each extension there is a line consisting of the extension and the number of files with that extension. The first line of the output should give the number of files without extensions.\n", "\n", "With the example in part 1, the output should be\n", "```\n", "1 files with no extension\n", "gz 1\n", "pdf 1\n", "txt 2\n", "```\n", "Had there been no filenames without extension then the first line would have been `0 files with no extension`. In the printout, list the extensions in alphabetical order.\n", "
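The example above treats only the part after the last period as the extension (`archive.tar.gz` is filed under `gz`). One possible way to get that part is sketched below. The helper name `extension_of` is invented for this illustration, and the sketch does not special-case names that begin with a period.

```python
def extension_of(filename):
    # Hypothetical helper, not part of the exercise template.
    # rpartition splits at the LAST period, so only the final suffix counts.
    base, dot, ext = filename.rpartition('.')
    if dot == '':
        return None  # no period at all, so there is no extension
    return ext

for name in ['file1.txt', 'archive.tar.gz', 'test']:
    print(name, '->', extension_of(name))
```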
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Objects and classes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python is an object-oriented programming language like Java\n", "and C++.\n", "But unlike Java, Python doesn’t force you to use classes,\n", "inheritance and methods.\n", "If you like, you can also choose the structural programming\n", "paradigm with functions and modules.\n", "\n", "Every value in Python is an object.\n", "Objects are a way to combine data and the functions that\n", "handle that data.\n", "This combination is called *encapsulation*.\n", "The data items and functions of objects are called *attributes*,\n", "and in particular the function attributes are called *methods*.\n", "For example, the operator `+` on integers calls a method of\n", "integers, and the operator `+` on strings calls a method of\n", "strings.\n", "\n", "Functions, modules, methods, classes, etc are all first class\n", "objects. This means that these objects can be\n", "\n", "* stored in a container\n", "* passed to a function as a parameter\n", "* returned by a function\n", "* bound to a variable\n", "\n", "One can access an attribute of an object using the *dot\n", "operator*: `object.attribute`.\n", "For example: if `L` is a list, we can refer to the method `append`\n", "with `L.append`. The method call can look, for instance, like\n", "this: `L.append(4)`.\n", "Because also modules are objects in Python, we can interpret\n", "the expression `math.pi` as accessing the data attribute `pi` of\n", "module object `math`.\n", "\n", "Numbers like 2 and 100 are instances of type `int`. 
Similarly,\n", "`\"hello\"` is an instance of type `str`.\n", "When we write `s=set()`, we are actually creating a new\n", "instance of type `set`, and binding the resulting instance object to\n", "`s`.\n", "\n", "A user can define their own data types.\n", "These are called *classes*.\n", "A user can call these classes as if they were functions, and they\n", "return a new instance object of that type.\n", "Classes can be thought of as recipes for creating objects.\n", "\n", "An example of a class definition:\n", "```python\n", "class MyClass(object):\n", "    \"\"\"Documentation string of the class\"\"\"\n", "\n", "    def __init__(self, param1, param2):\n", "        \"This initialises an instance of type MyClass\"\n", "        self.b = param1 # creates an instance attribute\n", "        c = param2 # creates a local variable of the function\n", "        # statements ...\n", "    \n", "    def f(self, param1):\n", "        \"\"\"This is a method of the class\"\"\"\n", "        # some statements\n", "    \n", "    a=1 # This creates a class attribute\n", "```\n", "\n", "The class definition starts with the `class` statement.\n", "With this statement you give a name for your new type, and\n", "also list the base classes of your class in parentheses.\n", "The next indented block is the *class body*.\n", "After the whole class body is read, a new type is created.\n", "Note that no instances are created yet.\n", "All the attributes and methods of the class are defined in the\n", "class body.\n", "\n", "The example class has two methods: `__init__` and `f`.\n", "Note that their first parameter is special: `self`. 
It\n", "corresponds to the `this` variable of C++ or Java.\n", "`__init__`\n", "does the initialisation when an instance is created.\n", "At instantiation with `i=MyClass(2,3)` the parameters\n", "`param1` and `param2` are bound to values 2 and 3, respectively.\n", "Now that we have an instance `i`, we can call its method `f`\n", "with the dot operator: `i.f(1)`.\n", "The parameters of `f` are bound in the following way:\n", "`self=i` and `param1=1`.\n", "\n", "There are differences in how an assignment inside a class body\n", "creates variables.\n", "The attribute `a` is at class level and is common to all\n", "instances of the class `MyClass`.\n", "The variable `c` is a local variable of the function `__init__`, and\n", "therefore cannot be used outside the function.\n", "The attribute `b` is specific to each instance of `MyClass`. Note\n", "that `self` refers to the current instance.\n", "An example: for objects `x=MyClass(1,0)` and\n", "`y=MyClass(2,0)` we have `x.b != y.b`, but `x.a == y.a`.\n", "\n", "All methods of a class have a mandatory first parameter which\n", "refers to the instance on which you called the method.\n", "This parameter is usually named `self`.\n", "If you want to access the class attribute `a` from a method of\n", "the class, use the fully qualified form `MyClass.a`.\n", "The methods whose names both begin and end with two\n", "underscores are called *special methods*. For example, `__init__`\n", "is a special method. These methods will be discussed in detail\n", "later.\n", "\n", "### Instances\n", "\n", "We can create instances by calling a class as if it were a\n", "function: `i = ClassName(...)`.\n", "The parameters given in the call are then passed to the\n", "`__init__` method.\n", "In the `__init__` method you can create the instance-specific\n", "attributes.\n", "If `__init__` is missing, we can create an instance without\n", "giving any parameters. 
As a consequence, the instance has no\n", "attributes of its own.\n", "Later you can (re)bind attributes with the assignment\n", "`instance.attribute = new value`.\n", "\n", "If that attribute did not exist before, it will be added to the\n", "instance with the assigned value.\n", "In Python we really can add attributes to or delete them from an\n", "existing instance.\n", "This is possible because the attribute names and the\n", "corresponding values are actually stored in a dictionary.\n", "This dictionary is also an attribute of the instance and is\n", "called `__dict__`.\n", "Another standard attribute in addition to `__dict__` is called\n", "`__class__`. This attribute stores the class of the instance.\n", "That is, the type of the object." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Attribute lookup\n", "\n", "Suppose `x` is an instance of class `X`, and we want to read an\n", "attribute `x.a`.\n", "The lookup has three phases:\n", "\n", "* First it is checked whether the attribute `a` is an attribute of\n", "the instance `x`\n", "* If not, then it is checked whether `a` is a class attribute of `x`’s\n", "class `X`\n", "* If not, then the base classes of `X` are checked\n", "\n", "If instead we want to bind the attribute `a`, things are much\n", "simpler.\n", "`x.a = value` will set the instance attribute.\n", "And `X.a = value` will set the class attribute.\n", "Note that if a base of `X`, the class `X`, and the instance `x` each\n", "have an attribute called `a`, then `x.a` hides `X.a`, and `X.a` hides\n", "the attribute of the base class." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 
Exercise 8 (prepend)
\n", "\n", "Create a class called `Prepend`. We create an instance of the class by giving a string as a parameter\n", "to the initializer. The initializer stores the parameter in an instance attribute `start`. The class\n", "also has a method `write(s)` which prints the string `s` prepended with the `start` string.\n", "An example of usage:\n", "```python\n", "p = Prepend(\"+++ \")\n", "p.write(\"Hello\");\n", "```\n", "Will print\n", "```\n", "+++ Hello\n", "```\n", "\n", "Try out using the class from the `main` function.\n", "
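The attribute lookup order, and the fact that instance attributes live in the dictionary `__dict__` of the instance, can be tried out with a few lines. This is only a small sketch: the class names `Base` and `X` and the attribute values are chosen for the illustration.

```python
class Base:
    a = 'from Base'      # attribute of the base class

class X(Base):
    pass                 # X defines no attribute a of its own yet

x = X()
first = x.a              # not in x, not in X, so found in Base
X.a = 'from X'           # bind a class attribute of X
second = x.a             # X.a now hides Base.a
x.a = 'from x'           # bind an instance attribute
third = x.a              # x.a hides X.a
stored = dict(x.__dict__)  # copy of the instance dictionary
del x.a                  # delete the instance attribute again
fourth = x.a             # lookup falls back to the class attribute
print(first, second, third, stored, fourth, sep=' | ')
```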
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inheritance\n", "\n", "Inheritance allows us to reuse the code of an existing class `B`\n", "in creating a new class `C`.\n", "Let’s recap how the attribute lookup worked for classes.\n", "When looking for an attribute, the lookup procedure starts\n", "with the instance dictionary, and continues with the class\n", "attributes.\n", "If both fail, then the attribute is searched from the base\n", "classes and, recursively, from their base classes.\n", "\n", "So, it may look like we access an attribute of a class `C`, when\n", "in reality we are accessing the attribute of its base class `B`.\n", "In this case we say that the class `C` *inherits* the attribute from\n", "its base class `B`.\n", "If we have attributes with the same name in both the class\n", "and its base class, the attribute of the base class is hidden.\n", "We say that the class `C` overrides the attribute of the base\n", "class `B`.\n", "Terminology: `B` is a base class and `C` is a derived class.\n", "\n", "Example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Executing B.f\n", "Executing C.g\n" ] } ], "source": [ "class B(object):\n", " def f(self):\n", " print(\"Executing B.f\")\n", " def g(self):\n", " print(\"Executing B.g\")\n", " \n", "class C(B):\n", " def g(self):\n", " print(\"Executing C.g\")\n", " \n", "x=C()\n", "x.f() # inherited from B\n", "x.g() # overridden by C" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A derived class is sometimes also called a *subclass* and the\n", "base class is called *super class*.\n", "The inheritance relation of two classes `B` and `C` can be tested\n", "with function `issubclass`:\n", "`issubclass(C,B)==True` but `issubclass(B,C)==False`\n", "Function `isinstance(obj, cls)` allows us to test whether\n", "an instance has type `cls` or has an ancestor class of type `cls`.\n", "Let’s 
create instances `x=C()` and `y=B()`.\n", "Now we have `isinstance(x,B)==\n", "isinstance(x,C)==isinstance(y,B)==True`.\n", "But `isinstance(y,C)==False`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![inheritance hierarchy](inheritance_hierarchy.svg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`object` is a base class or an ancestor class of every\n", "other class.\n", "This means that `isinstance(x, object)==True` for all\n", "instances `x`.\n", "\n", "By deriving from an existing class we can modify and/or\n", "extend its behaviour, without touching the original class.\n", "For example, if we want to add one method to a list class,\n", "we can use inheritance. That way we only have to code the\n", "part that has changed and can reuse the rest of the code of the\n", "`list` type.\n", "Another use of inheritance is to create conceptual hierarchies.\n", "For instance, later we will learn about the exception hierarchy\n", "of Python.\n", "A third use would be to use classes to create interfaces. There\n", "can be several classes that have the same interface (that is, they\n", "offer the same attributes), but their behaviour or\n", "implementation can be very different. This allows changing a\n", "part of your program with minimal changes required elsewhere\n", "in the code.\n", "\n", "If in the definition of the method `C.f` we need to call the\n", "corresponding method of the base class `B`, we can use the fully qualified\n", "call `B.f(self, ...)`.\n", "This is called delegation.\n", "It is useful, for instance, when you want to call the `__init__`\n", "method of the base class from the `__init__` of the derived\n", "class to initialise the base class attributes." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Special methods\n", "\n", "We have already encountered one special method, namely the\n", "`__init__` method.\n", "This method sets the instance attributes to some initial value.\n", "Its first parameter is `self`, and the subsequent parameters\n", "are the ones that were passed to the call of the class.\n", "The `__init__` method should return no value.\n", "Next the main general purpose special\n", "methods are introduced.\n", "They are executed when certain operations on objects are\n", "performed.\n", "\n", "In the following, `C` is a class and `x` and `y` are its instances.\n", "`__hash__` returns an int value, with the following\n", "requirement: `x==y` implies `x.__hash__() == y.__hash__()`.\n", "The value is used in storing objects in dictionaries and sets.\n", "The instances `x` and `y` must be immutable.\n", "A `__call__` method makes the instances of a class callable.\n", "I.e. the call `x(a,b, ...)` will result in calling this special\n", "method with the given parameters.\n", "The method `__del__` gets called when the corresponding\n", "instance gets deleted.\n", "Method `__new__` is used to control the creation of new\n", "instances. It can be used, for example, to create classes that\n", "have only one instance.\n", "\n", "The method `__str__` is called when the `print` function needs\n", "to print the value of an instance. It returns a string. The\n", "print-format expression calls this for conversion `%s`.\n", "The method `__repr__` is called when the interactive interpreter\n", "prints the value of an evaluated expression, and when the\n", "conversion `%r` for print-format expression is used. 
Returns a\n", "canonical representation string that (at least in theory) can be\n", "used to recreate the original object.\n", "Special methods `__eq__`, `__ge__`, `__gt__`, `__le__`, `__lt__`, and\n", "`__ne__` get called when the corresponding operators `x==y`,\n", "`x>=y`, `x>y`, `x<=y`, `x<y`, and `x!=y` are used." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 
Exercise 9 (rational)
\n", "\n", "\n", "Create a class `Rational` whose instances are rational numbers. A new rational number can be\n", "created with a call to the class. For example, the call `r=Rational(1,4)` creates a rational\n", "number “one quarter”. Make the instances support the following operations:\n", "`+` `-` `*` `/` `<` `>` `==`\n", "with their natural behaviour. Make the rationals also printable so that from the printout we can\n", "clearly see that they are rational numbers.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exceptions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When an error occurs, what can we do?\n", "\n", "* Print an error message\n", "* Stop the execution of a program\n", "* Indicate the error by returning a special value, like -1 or None\n", "* Ignore the error\n", "* ...\n", "\n", "These solutions tend to combine the indication of a problem\n", "and the reaction to the problem indication.\n", "The behaviour of the program in error situations cannot be\n", "changed; it is fixed in the implementation of the function.\n", "When an erroneous situation is noticed, it may not be clear\n", "how to handle the situation.\n", "Usually the user or an instance that called a function knows\n", "what to do.\n", "\n", "Most modern computer languages have a system called\n", "*exception handling*. This system separates the recognition of errors and the\n", "handling of these situations. We can signal an error or anomalous situation by *raising* an\n", "exception. Exceptions can be raised in Python with the `raise` statement:\n", "\n", "* `raise` instance\n", "* `raise` exception class\n", "\n", "In the second form the exception class is instantiated without\n", "arguments. To pass arguments, create the instance yourself,\n", "for example `raise ValueError('not a number')`.\n", "\n", "The functions of the Python standard library raise exceptions\n", "in error situations. Sometimes exceptions aren’t really errors. 
For example, when\n", "an iterator runs out of elements, it will signal this by raising\n", "the `StopIteration` exception.\n", "Another exception that does not signal a real error is the `Warning` exception.\n", "\n", "The general form of the exception-catching statement is the following:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "try:\n", "    # here are the statements that can cause exceptions\n", "except (Exceptionname1, Exceptionname2, ...):\n", "    # here we handle the exceptions\n", "else:\n", "    # this gets executed if try-block caused no exceptions\n", "finally:\n", "    # this is always executed, clean-up code\n", "```\n", "\n", "Usually, just the try and except parts are needed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index does not exist\n" ] } ], "source": [ "L=[1,2,3]\n", "try:\n", "    print(L[3])\n", "except IndexError:\n", "    print(\"Index does not exist\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Give a number (non-number quits): 3\n", "Give a number (non-number quits): x\n", "Average is 3.0\n" ] } ], "source": [ "def compute_average(L):\n", "    n=len(L)\n", "    s=sum(L)\n", "    return float(s)/n # error is noticed here !!!\n", "mylist=[]\n", "while True:\n", "    try:\n", "        x=float(input(\"Give a number (non-number quits): \"))\n", "        mylist.append(x)\n", "    except ValueError:\n", "        break\n", "try:\n", "    average=compute_average(mylist)\n", "    print(\"Average is\", average)\n", "except ZeroDivisionError:\n", "    # and the error is handled here\n", "    if len(mylist) == 0:\n", "        print(\"Tried to compute the average of empty list of numbers\")\n", "    else:\n", "        print(\"Something strange happened\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exception hierarchy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Python exceptions are objects, 
like all values in Python.\n", "These objects are instantiated from exception classes.\n", "Exception classes naturally form hierarchies:\n", "\n", "* New exception classes can be made by inheriting from existing exception classes and extending them\n", "* The root of this hierarchy is the class `Exception`\n", "* Python defines several base classes to derive from, and several ready-to-use exception classes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![exception hierarchy](exception_hierarchy.svg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Too general exception specifications" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The exception hierarchy allows us to catch multiple similar\n", "exceptions by catching their common base class.\n", "This feature has to be used carefully. An over-general exception\n", "specification, like `except Exception:`, can hide the real\n", "reason for an error. An example of this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2019-06-13T15:45:47.719228Z", "start_time": "2019-06-13T15:45:45.198519Z" } }, "outputs": [], "source": [ "import sys\n", "s=input(\"Give a number: \") # input() already drops the trailing newline\n", "try:\n", "    x=int(s)\n", "    sys.stdout.wr1te(f\"You entered {x}\\n\")\n", "except Exception:\n", "    print(\"You didn’t enter a number\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the previous example, if the user doesn’t enter a string that\n", "represents an integer, a `ValueError` is raised by the `int`\n", "function. Instead of catching the `ValueError`, we catch the root of the\n", "exception hierarchy, namely `Exception`. This results in catching practically all exceptions.\n", "But this will cause one typing error in the program to go undetected.\n", "Change the exception specification from `Exception` to `ValueError` to see what this error is." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What is the error handling policy in Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python uses a different approach to error checking than many\n", "other common languages.\n", "Instead of checking beforehand that all the inputs are of the\n", "correct type and that the contents of the input variables are sensible\n", "for some operations, Python first tries the operations and then\n", "checks whether they caused any exceptions.\n", "This is partly what duck typing is about: a function works for\n", "a set of inputs if all the operations in the function body make\n", "sense for those inputs.\n", "So, that’s why the parameters of functions aren’t specified to\n", "be of any certain type." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 
Exercise 10 (extract numbers)
\n", "\n", "Write a function `extract_numbers` that gets a string as a parameter. It should return a list of numbers that can be both ints and floats. Split the string into words at whitespace using the `split()` method. Then iterate through each word, and initially try to convert it to an int. If unsuccessful, then try to convert it to a float. If the word is not a number, skip it.\n", "\n", "Example run:\n", "`print(extract_numbers(\"abd 123 1.2 test 13.2 -1\"))`\n", "will print\n", "`[123, 1.2, 13.2, -1]`\n", "
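The try-first policy described before this exercise can be seen in a tiny sketch. The function `double` and the sample inputs are invented for the illustration. The function performs no type checks; it simply attempts the operation, and the caller reacts to the `TypeError` raised for inputs that do not support it.

```python
def double(value):
    # No type checks: any value that supports multiplication by 2 is fine.
    return value * 2

results = []
for candidate in (21, 'ab', [1, 2], None):
    try:
        results.append(double(candidate))
    except TypeError:
        # None does not support the * operator, so we end up here.
        results.append('cannot double ' + repr(candidate))
print(results)
```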
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sequences, iterables, generators: revisited\n", "\n", "In simple terms, a container is *iterable*, if we can go through all its elements using a `for` loop. All the sequences are iterable, but there are other iterable objects as well. We can even create iterable types ourselves. In our class there needs to be a special method `__iter__` that returns an *iterator* for the container. An iterator is an object that has method `__next__`, which returns the next element from the container. Let's have a look at a simple example where the container and its iterator is the same class." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class WeekdayIterator(object):\n", " \"\"\"Iterator over the weekdays.\"\"\"\n", " def __init__(self):\n", " self.i=0 # Start from Monday\n", " self.weekdays = (\"Monday\",\"Tuesday\",\"Wednesday\",\"Thursday\",\"Friday\",\"Saturday\",\"Sunday\")\n", " def __iter__(self): # If this object were a container, then this method would return the iterator over the \n", " # elements of the container.\n", " return self # However, this object is already an iterator, hence we return self.\n", " def __next__(self): # Returns the next weekday\n", " if self.i == 7: \n", " raise StopIteration # Signal that all weekdays were already iterated over\n", " else:\n", " weekday = self.weekdays[self.i]\n", " self.i += 1\n", " return weekday\n", " \n", "for w in WeekdayIterator():\n", " print(w)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now check whether the WeekdayIterator is a Sequence type:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from collections import abc # Get the abstract base classes\n", "containers = [\"efg\", [1,2,3], (4,5), WeekdayIterator()]\n", "for c in containers:\n", " if isinstance(c, abc.Sequence):\n", " print(c, \"is a sequence\")\n", " else:\n", " 
        print(c, \"is not a sequence\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`WeekdayIterator` is not a sequence because, for instance, you cannot index it with the brackets `[]`, but it is an iterable:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "isinstance(WeekdayIterator(), abc.Iterable)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So it is possible to create iterators ourselves, but the syntax was quite complicated. There is an easier option using *generators*. A generator is a function that contains a `yield` statement. Note the difference between generators and the generator expressions we saw in the first week. Both, however, produce iterables. Here's an example of a generator:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2019-06-13T15:53:43.557681Z", "start_time": "2019-06-13T15:53:43.547036Z" } }, "outputs": [], "source": [ "def mydate(day=1, month=1): # Generates dates starting from the given date\n", "    lengths=(31,28,31,30,31,30,31,31,30,31,30,31) # How many days in a month\n", "    first_day=day\n", "    for m in range(month, 13):\n", "        for d in range(first_day, lengths[m-1] + 1):\n", "            yield (d, m)\n", "        first_day=1\n", "# Create the generator by calling the function: \n", "gen = mydate(26, 2) # Start from 26th of February\n", "for i, (day, month) in enumerate(gen): \n", "    if i == 5: break # Print only the first five dates from the generator\n", "    print(f\"Index {i}, day {day}, month {month}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that it would be awkward to write the above iterable using a generator expression, and it would have been very clumsy to explicitly write it as an iterator like we did with the `WeekdayIterator`. 
The below figure shows the relationships between different iterables we have seen:\n", "\n", "![iterables.svg](iterables.svg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\"Open\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }