Hi! This is Data 100's first discussion.

Happy first day of school!

Discussion 0

This is the first (or 0th) discussion of Data 100. It covers basic Python, calculus, and probability concepts.

Agenda

  1. Introduction
  2. Logistics & Jupyterhub Demo
  3. Calculus Question (Pen and Paper)
  4. Probability Question (Pen and Paper)
  5. Python🐍 (Laptop)

Introduction

I'm Simon

  • 3rd year CS student
  • 2nd time TAing this class
  • I do research @RISELab; interested in systems and tools for machine learning

What about you?

  • Name
  • Year
  • What part of Data Science do you find interesting?
  • What was your breakfast this morning?

Logistics

  • HW0 released! Due in under 2 weeks. It is all prerequisite material.
  • No OH (finding room ...); post any questions on Piazza
  • Website @ ds100.org/fa18
  • First lecture tomorrow, Josh will be teaching it

Jupyterhub

Any questions?

Next, we will dive into the discussion questions.

Question 1

For this question, write your answer on paper.

Part A

Find the value of $x$ that minimizes the function $f(x) = x^2 + 4x + 4$.

In [ ]:
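# Imports needed by the plotting cells
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Let's plot it
x = np.arange(-8, 6)
sns.lineplot(x=x, y=x**2 + 4*x + 4)
plt.title(r"$f(x) = x^2 + 4x + 4$");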
In [ ]:
# Let's plot it with derivative
x = np.arange(-8, 6)
sns.lineplot(x=x, y=x**2 + 4*x + 4, label="f(x)")
plt.axhline(0, color='red', label="y=0")
sns.lineplot(x=x, y=2*x+4, label="f'(x)")
plt.title(r"$f(x) = x^2 + 4x + 4$");

We first set the derivative equal to 0 and solve for $x$.

$$\frac{d}{dx} f(x) = 2x + 4$$

$$2x + 4 = 0$$

$$x = -2$$

We then check that the second derivative is positive (which shows that the curve is convex) to verify that this is indeed a minimum:

$$\frac{d^2}{dx^2} f(x) = 2$$

Since the second derivative is positive, we know that $x = -2$ is a minimum.
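
As a quick sanity check on the calculus above, the sketch below evaluates $f$ on a grid of points and confirms that no grid point does better than $x = -2$. The grid range and step size are arbitrary choices made for illustration. This is a numerical check, not a proof.

```python
def f(x):
    """f(x) = x^2 + 4x + 4 from Part A."""
    return x**2 + 4*x + 4

# Grid from -8 to 6 in steps of 0.25 (arbitrary illustrative choice).
grid = [-8 + 0.25 * k for k in range(57)]

# Pick the grid point with the smallest function value.
best_x = min(grid, key=f)

print(best_x, f(best_x))
```

Equivalently, completing the square gives $f(x) = (x + 2)^2$, which is never negative and is zero only at $x = -2$.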

Part B

Calculate the partial derivative of the following expression with respect to $x$.

$$f(x, y) = xy + \sin(x^2y) + \ln(x^3y)$$

Note: The partial derivative of a function with respect to a variable $x$ is the derivative of the function with respect to $x$ while holding all other variables constant.

First expand the logarithm using $\ln(x^3y) = 3\ln(x) + \ln(y)$, which holds for $x, y > 0$:

$$f(x, y) = xy + \sin(x^2y) + 3\ln(x) + \ln(y)$$

$$\frac{\partial}{\partial x} f(x, y) = y + 2xy\cos(x^2y) + \frac{3}{x}$$

Part C

$$f(x, y) = xy + \sin(x^2y) + \ln(x^3y)$$

Now, calculate the partial derivative of the expression with respect to $y$.

$$f(x, y) = xy + \sin(x^2y) + 3\ln(x) + \ln(y)$$

$$\frac{\partial}{\partial y} f(x, y) = x + x^2\cos(x^2y) + \frac{1}{y}$$
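
One way to double-check the partial derivatives from Parts B and C is to compare them against central finite differences at a test point. The sketch below does this with only the standard library. The test point `(1.3, 0.7)`, the step size `h`, and the helper name `central_diff` are assumptions chosen for illustration; the point must have $x, y > 0$ so that the logarithm is defined.

```python
import math

def f(x, y):
    """f(x, y) = xy + sin(x^2 y) + ln(x^3 y); needs x > 0 and y > 0."""
    return x * y + math.sin(x**2 * y) + math.log(x**3 * y)

def df_dx(x, y):
    """Analytic partial derivative with respect to x (Part B)."""
    return y + 2 * x * y * math.cos(x**2 * y) + 3 / x

def df_dy(x, y):
    """Analytic partial derivative with respect to y (Part C)."""
    return x + x**2 * math.cos(x**2 * y) + 1 / y

def central_diff(g, t, h=1e-6):
    """Approximate g'(t) with a symmetric difference quotient."""
    return (g(t + h) - g(t - h)) / (2 * h)

x0, y0 = 1.3, 0.7  # arbitrary test point with x0, y0 > 0

# Vary one variable at a time while holding the other fixed.
num_dx = central_diff(lambda x: f(x, y0), x0)
num_dy = central_diff(lambda y: f(x0, y), y0)

err_dx = abs(num_dx - df_dx(x0, y0))
err_dy = abs(num_dy - df_dy(x0, y0))
print(err_dx, err_dy)  # both should be tiny
```

Small errors at one point do not prove the formulas, but a sign or chain-rule mistake would usually show up as a large discrepancy.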

Question 2

For this question, write your answer on paper.

Are you smarter than a doctor? ONLY 46% OF DOCTORS GOT THIS QUESTION RIGHT.

100 out of 10,000 women at age forty who participate in routine screening have breast cancer. 80 of those 100 women with breast cancer test positive. 950 out of 9,900 women without breast cancer also test positive. If 10,000 women in this age group undergo a routine screening, about what fraction of these women with positive tests will actually have breast cancer?

  • 100 / 10000 have breast cancer
    • 80 / 100 have breast cancer & test positive
    • 950 / 9900 without breast cancer & test positive

We want to know what fraction of the women with positive tests actually have breast cancer.

$$ \frac{\text{Have breast cancer and test positive}}{\text{Test positive}} $$

$$ \text{Test positive} = 80 + 950 = 1030 $$

$$ \text{Have breast cancer and test positive} = 80 $$

$$ \frac{\text{Have breast cancer and test positive}}{\text{Test positive}} = \frac{80}{80+950} \approx 7.8\% $$
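
The same arithmetic can be reproduced in a few lines of Python. The sketch below computes the fraction directly from the counts and then again through Bayes' rule, using exact fractions so the two routes can be compared. The variable names are illustrative and not part of the original question.

```python
from fractions import Fraction

total = 10_000
with_cancer = 100
without_cancer = total - with_cancer   # 9,900
true_positives = 80                    # have cancer and test positive
false_positives = 950                  # no cancer but test positive

# Route 1: count everyone who tests positive.
positives = true_positives + false_positives
ppv = Fraction(true_positives, positives)

# Route 2: Bayes' rule with the corresponding probabilities.
p_cancer = Fraction(with_cancer, total)
p_pos_given_cancer = Fraction(true_positives, with_cancer)
p_pos_given_healthy = Fraction(false_positives, without_cancer)
bayes = (p_pos_given_cancer * p_cancer) / (
    p_pos_given_cancer * p_cancer + p_pos_given_healthy * (1 - p_cancer)
)

print(ppv, float(ppv))
```

Both routes give $8/103$, or about 7.8%. The result is low because the 950 false positives from the large healthy group far outnumber the 80 true positives.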

Question 3

Suppose we have the following list lst and array arr.

In [ ]:
import numpy as np
arr = np.arange(5)
lst = list(range(5))
In [ ]:
arr
In [ ]:
lst

What will be the output of the following lines of Python/Numpy code? Try to predict the output before running the cell.

In [ ]:
arr + 5
In [ ]:
lst + 5
In [ ]:
arr * 2
In [ ]:
lst * 2
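
Once you have made your predictions, the sketch below summarizes the difference in behavior. It uses fresh names (`arr_demo`, `lst_demo`, both hypothetical) so it does not interfere with the cells above: NumPy arrays apply arithmetic elementwise, while Python lists treat `+` as concatenation and `*` as repetition.

```python
import numpy as np

arr_demo = np.arange(5)      # array with the values 0..4
lst_demo = list(range(5))    # list with the values 0..4

# Arrays: the scalar is applied to every element.
added = arr_demo + 5
doubled = arr_demo * 2

# Lists: * repeats the list, and + only accepts another list.
repeated = lst_demo * 2
try:
    lst_demo + 5
    list_add_error = None
except TypeError as err:
    list_add_error = err

# Elementwise arithmetic on a list needs an explicit loop or comprehension.
lst_plus_5 = [v + 5 for v in lst_demo]

print(added, doubled, repeated, list_add_error, lst_plus_5)
```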

Question 4

Note: The line raise NotImplementedError() indicates that the implementation still needs to be added. This is an exception derived from RuntimeError. Please comment out that line when you have implemented the function.

Part A

Write a function that returns a list of numbers such that $ x_i=i^2 $ for $1\leq i \leq n$. Don't worry about the case where $n \leq 0$.

In [ ]:
def squares(n):
    """Compute the squares of numbers from 1 to n such that the ith element of the returned list equals i^2."""
    raise NotImplementedError()

Your function should return [1, 4, 9, 16, 25, 36, 49, 64, 81, 100] for $n=10$.
Check that it does:

In [ ]:
squares(10)

Check that squares returns the correct output for several inputs.

In [ ]:
assert squares(1) == [1]
assert squares(2) == [1, 4]
assert squares(10) == [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
assert squares(11) == [1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

Part B

Evaluate the following summation (you may use code):

$$\sum_{i=1}^{100} \left( i^3 + 3 i^2 \right)$$

In [ ]:
q2b_sum = ...

Check that your sum is correct.

In [ ]:
assert q2b_sum == 26517550
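
One possible way to cross-check the value is to compare a term-by-term sum with the standard closed forms $\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}$ and $\sum_{i=1}^{n} i^3 = \left(\frac{n(n+1)}{2}\right)^2$. The names `direct` and `closed_form` below are illustrative only.

```python
n = 100

# Direct evaluation: add up i^3 + 3i^2 for i = 1..n.
direct = sum(i**3 + 3 * i**2 for i in range(1, n + 1))

# Closed forms for the sum of squares and the sum of cubes.
sum_squares = n * (n + 1) * (2 * n + 1) // 6
sum_cubes = (n * (n + 1) // 2) ** 2
closed_form = sum_cubes + 3 * sum_squares

print(direct, closed_form)
```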

Part C

Write a function map_func that implements mapping and filtering on a list of values list_vals. The function also takes a parameter mapper, which takes an input x and maps that value to a new value of the same type. The last parameter, filter_func, takes an input y and returns a boolean based on whether y satisfies a certain condition. In short, map_func should keep the values of list_vals that satisfy the condition established by filter_func and return a list of those values after mapper has been applied to each. Note that the filter is applied to the original values, before mapping.

Note: If you want to see examples of map_func used, look at the tests below.

In [ ]:
def map_func(list_vals, mapper, filter_func):
    """
    Filters the input list list_vals with filter_func, then maps the remaining values with mapper.
    Return a list containing mapper(x) for each element x of list_vals that filter_func returns True on.
    """
    raise NotImplementedError()

Check that map_func returns the correct output for several inputs.

In [ ]:
assert map_func(
    list_vals = [1, 2, 3], 
    mapper = lambda x: x*2, 
    filter_func = lambda x: x > 1) == [4, 6]

assert map_func(
    [], 
    lambda x: x*1000, 
    lambda x: x > -10) == []

assert map_func(
    ["piglet", "gavin", "jim", "andy"], 
    lambda x: x[:0:-1], 
    lambda x: len(x) < 6) == ["niva", "mi", "ydn"]
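
The tests above imply that the filter is applied to the original values and the mapper to whatever survives. A minimal sketch of that ordering, written under the hypothetical name `filter_then_map` so it stays separate from your own map_func, might look like this:

```python
def filter_then_map(list_vals, mapper, filter_func):
    """Keep the elements that pass filter_func, then apply mapper to each."""
    return [mapper(v) for v in list_vals if filter_func(v)]

# Filtering first keeps 2 and 3, which are then doubled.
print(filter_then_map([1, 2, 3], lambda x: x * 2, lambda x: x > 1))
```

Mapping first and filtering afterwards would give a different answer for this input, since every doubled value is greater than 1.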

Question 5

Part A

Write a function which takes in a string and returns True if the string is a palindrome. (A string is a palindrome if it is the same forwards and backwards.)

In [ ]:
def is_palindrome(word):
    """Return True if word is a palindrome."""
    raise NotImplementedError()

Your function should return True for "racecar".

In [ ]:
is_palindrome("racecar")

Check that the function works for several inputs.

In [ ]:
assert is_palindrome("aviddiva") == True
assert is_palindrome("clearlynotapalindrome") == False
assert is_palindrome("kayak") == True
assert is_palindrome("ab") == False
assert is_palindrome("abb") == False
assert is_palindrome("a") == True
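
The definition above (a string that reads the same forwards and backwards) translates almost directly into a comparison with the reversed string. The sketch below is one possible approach, under the hypothetical name `is_palindrome_sketch`; it compares characters exactly, so case and spaces matter.

```python
def is_palindrome_sketch(word):
    """Return True if word equals its own reverse."""
    # word[::-1] is the string read backwards.
    return word == word[::-1]

print(is_palindrome_sketch("racecar"), is_palindrome_sketch("ab"))
```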

Part B

Write a function that flattens a nested Python list.

In [ ]:
def flatten(lst):
    """Flattens the input list lst so that there are no nested lists."""
    raise NotImplementedError()

Check that the function works for several inputs.

In [ ]:
assert flatten([1, 2, 3]) == [1, 2, 3]
assert flatten([1, 2, [3, 4]]) == [1, 2, 3, 4]
assert flatten([1, 2, [3, [4]], 5]) == [1, 2, 3, 4, 5]
assert flatten([1, 2, [3, [4]], [5, [[6]]]]) == [1, 2, 3, 4, 5, 6]
assert flatten([1, 2, [3, [[4]]], [5, [[6]]], [[[7]]]]) == [1, 2, 3, 4, 5, 6, 7]
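
Because the nesting in these tests can be arbitrarily deep, recursion is a natural fit: flatten each nested list, then splice its elements into the result. The sketch below is one possible approach, named `flatten_sketch` to keep it apart from your own flatten. It assumes that only `list` instances count as nested containers, so tuples and strings are left as single elements.

```python
def flatten_sketch(lst):
    """Return a new list with the elements of lst and no nested lists."""
    flat = []
    for item in lst:
        if isinstance(item, list):
            # Recursively flatten the nested list and splice it in.
            flat.extend(flatten_sketch(item))
        else:
            flat.append(item)
    return flat

print(flatten_sketch([1, 2, [3, [4]], 5]))
```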

Although there is no built-in flatten function for Python lists, NumPy arrays have a flatten method that returns a one-dimensional copy of the array. For example, to flatten a NumPy array arr we would call arr.flatten().

In [ ]:
nested = np.array([
    [1, 2],
    [2, 3]
])
nested
In [ ]:
nested.flatten()

That's all!

  • Any questions, feedback, or concerns?
  • xmo@berkeley.edu
  • ds100.org

Thanks for coming