

# Determining Moore's Law with real data in NumPy

The number of transistors reported per a given chip plotted on a log scale in the y axis with the date of introduction on the linear scale x-axis. The blue data points are from a transistor count table. The red line is an ordinary least squares prediction and the orange line is Moore's law.

## What you'll do

In 1965, engineer Gordon Moore predicted that the number of transistors on a chip would double every two years in the coming decade [1, 2]. You'll compare Moore's prediction against actual transistor counts in the 53 years following his prediction. You will determine the best-fit constants that describe the exponential growth of transistor counts and compare them to Moore's law.

## Skills you'll learn

• Load data from a *.csv file
• Perform linear regression and predict exponential growth using ordinary least squares
• Compare exponential growth constants between models
• Share your analysis in a file:
  • as NumPy zipped files *.npz
  • as a *.csv file
• Assess the amazing progress semiconductor manufacturers have made in the last five decades

## What you'll need

1. The NumPy, Matplotlib, and statsmodels packages, imported with the following commands:

import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm


2. Since this is an exponential growth law you need a little background in doing math with natural logs and exponentials.

You'll use these NumPy, Matplotlib, and statsmodels functions:

+++

## Building Moore's law as an exponential function

Your empirical model assumes that the number of transistors per semiconductor follows an exponential growth,

$\log(\text{transistor_count})= f(\text{year}) = A\cdot \text{year}+B,$

where $A$ and $B$ are fitting constants. You use semiconductor manufacturers' data to find the fitting constants.

You determine these constants for Moore's law by specifying the rate for added transistors, 2, and giving an initial number of transistors for a given year.

You state Moore's law in an exponential form as follows,

$\text{transistor_count}= e^{A_M\cdot \text{year} +B_M}.$

where $A_M$ and $B_M$ are constants that double the number of transistors every two years and start at 2250 transistors in 1971,

1. $\dfrac{\text{transistor_count}(\text{year} +2)}{\text{transistor_count}(\text{year})} = 2 = \dfrac{e^{B_M}e^{A_M \text{year} + 2A_M}}{e^{B_M}e^{A_M \text{year}}} = e^{2A_M} \rightarrow A_M = \frac{\log(2)}{2}$

2. $\log(2250) = \frac{\log(2)}{2}\cdot 1971 + B_M \rightarrow B_M = \log(2250)-\frac{\log(2)}{2}\cdot 1971$

so Moore's law stated as an exponential function is

$\log(\text{transistor_count})= A_M\cdot \text{year}+B_M,$

where

$A_M=0.3466$

$B_M=-675.4$

Since the function represents Moore's law, define it as a Python function using lambda

A_M = np.log(2) / 2
B_M = np.log(2250) - A_M * 1971
Moores_law = lambda year: np.exp(B_M) * np.exp(A_M * year)


In 1971, there were 2250 transistors on the Intel 4004 chip. Use Moores_law to check the number of transistors Gordon Moore would expect in 1973.

ML_1971 = Moores_law(1971)
ML_1973 = Moores_law(1973)
print("In 1973, G. Moore expects {:.0f} transistors on Intel's chips".format(ML_1973))
print("This is x{:.2f} more transistors than 1971".format(ML_1973 / ML_1971))


Now, make a prediction based upon the historical data for transistors per chip. The Transistor Count [4] each year is in the transistor_data.csv file. Before loading a *.csv file into a NumPy array, it's a good idea to inspect the structure of the file first. Then, locate the columns of interest and save them to a variable. Save two columns of the file to the array, data.

Here, print out the first 10 rows of transistor_data.csv. The columns are

| Processor | MOS transistor count | Date of Introduction | Designer | MOSprocess | Area |
|---|---|---|---|---|---|
| Intel 4004 (4-bit 16-pin) | 2250 | 1971 | Intel | "10,000 nm" | 12 mm² |
| ... | ... | ... | ... | ... | ... |

! head transistor_data.csv


You don't need the columns that specify Processor, Designer, MOSprocess, or Area. That leaves the second and third columns, MOS transistor count and Date of Introduction, respectively.

Next, you load these two columns into a NumPy array using np.loadtxt. The extra options below will put the data in the desired format:

• delimiter = ',': specify the delimiter as a comma ',' (the default is whitespace)
• usecols = [1,2]: import the second and third columns from the csv
• skiprows = 1: do not use the first row, because it's a header row

data = np.loadtxt("transistor_data.csv", delimiter=",", usecols=[1, 2], skiprows=1)
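To see what these three options do without touching the real file, here is a minimal sketch that reads a small in-memory CSV. The rows and the names `csv_text` and `sample` are hypothetical stand-ins for illustration, not the contents of transistor_data.csv.

```python
import io

import numpy as np

# Hypothetical stand-in for transistor_data.csv; the rows are illustrative only.
csv_text = """Processor,MOS transistor count,Date of Introduction,Designer
Chip A,2250,1971,Intel
Chip B,3500,1972,Intel
Chip C,6000,1974,Intel
"""

# skiprows=1 drops the header, usecols=[1, 2] keeps only the two numeric columns.
sample = np.loadtxt(io.StringIO(csv_text), delimiter=",", usecols=[1, 2], skiprows=1)

print(sample.shape)
print(sample)
```

Because the text columns are never read, np.loadtxt can return a purely numeric array with one row per chip and two columns.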


You loaded the entire history of semiconducting into a NumPy array named data. The first column is the MOS transistor count and the second column is the Date of Introduction in a four-digit year.

Next, make the data easier to read and manage by assigning the two columns to variables, year and transistor_count. Print out the first 10 values by slicing the year and transistor_count arrays with [:10]. Print these values out to check that you have saved the data to the correct variables.

year = data[:, 1]  # grab the second column and assign
transistor_count = data[:, 0]  # grab the first column and assign

print("year:\t\t", year[:10])
print("trans. cnt:\t", transistor_count[:10])


You are creating a function that predicts the transistor count given a year. You have an independent variable, year, and a dependent variable, transistor_count. Transform the dependent variable to log-scale,

$y_i = \log($ transistor_count[i] $),$

resulting in a linear equation,

$y_i = A\cdot \text{year} +B$.

yi = np.log(transistor_count)
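The following sketch illustrates why the log transform gives a straight line. It uses hypothetical counts that double exactly every two years (the arrays `years`, `counts`, and `slopes` are assumptions made for this example) and shows that the slope on the log scale is constant.

```python
import numpy as np

# Hypothetical counts that double exactly every two years, starting from 2250 in 1971.
years = np.array([1971.0, 1973.0, 1975.0, 1977.0])
counts = 2250.0 * 2.0 ** ((years - 1971.0) / 2.0)

log_counts = np.log(counts)

# On the log scale, equal steps in year give equal steps in log(count),
# so the slope between consecutive points is the same everywhere.
slopes = np.diff(log_counts) / np.diff(years)
print(slopes)  # each entry is approximately log(2) / 2
```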


## Calculating the historical growth curve for transistors

Your model assumes that yi is a function of year. Now, find the best-fit model that minimizes the difference between $y_i$ and $A\cdot \text{year} +B,$ as such

$\min \sum|y_i - (A\cdot \text{year}_i + B)|^2.$

This sum of squares error can be succinctly represented as arrays as such

$\sum|\mathbf{y}-\mathbf{Z} [A,~B]^T|^2,$

where $\mathbf{y}$ are the observations of the log of the number of transistors in a 1D array and $\mathbf{Z}=[\text{year}_i^1,~\text{year}_i^0]$ are the polynomial terms for $\text{year}_i$ in the first and second columns. By creating this set of regressors in the $\mathbf{Z}-$matrix you set up an ordinary least squares statistical model. Some clever NumPy array features will build $\mathbf{Z}$

1. year[:,np.newaxis] : takes the 1D array with shape (179,) and turns it into a 2D column vector with shape (179,1)
2. **[1, 0] : builds two columns, where the first column is year**1 and the second column is year**0 == 1

Z = year[:, np.newaxis] ** [1, 0]
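The two steps can be inspected on a short array. This is a small sketch with a hypothetical three-element `years` array; `column` and `Z_small` are names introduced only for this example.

```python
import numpy as np

years = np.array([1971.0, 1972.0, 1974.0])  # hypothetical short array

column = years[:, np.newaxis]  # 1D (3,) becomes a 2D column with shape (3, 1)
Z_small = column ** [1, 0]     # broadcasts against the two exponents, shape (3, 2)

print(column.shape, Z_small.shape)
print(Z_small)
```

The second column is all ones, which is what lets the regression fit the intercept $B$ alongside the slope $A$.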


Now that you have created a matrix of regressors, $\mathbf{Z},$ and the observations are in a vector, $\mathbf{y},$ you can use these variables to build an ordinary least squares model with sm.OLS.

model = sm.OLS(yi, Z)


Now, you can view the fitting constants, $A$ and $B$, and their standard errors. Run the fit and print the summary to view results as such,

results = model.fit()
print(results.summary())


The OLS Regression Results summary gives a lot of information about the regressors, $\mathbf{Z},$ and observations, $\mathbf{y}.$ The most important outputs for your current analysis are

=================================
coef    std err
---------------------------------
x1             0.3416      0.006
const       -666.3264     11.890
=================================


where x1 is the slope, $A=0.3416$, const is the intercept, $B=-666.3264$, and std err gives the precision of the constants: $A=0.342\pm 0.006~\dfrac{\log(\text{transistors}/\text{chip})}{\text{years}}$ and $B=-666\pm 12~\log(\text{transistors}/\text{chip})$. You created an exponential growth model. To get the constants, save them to an array AB with results.params, then assign the x1 coefficient to $A$ and the const coefficient to $B$.

AB = results.params
A = AB[0]
B = AB[1]
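As a cross-check on the ordering of the returned constants, the same kind of least-squares problem can be solved with NumPy alone. This is a sketch that swaps in np.linalg.lstsq for sm.OLS and uses hypothetical noise-free data on the line $y = 0.35\cdot\text{year} - 680$; it is not the tutorial's fit, and it reports no standard errors.

```python
import numpy as np

# Hypothetical noise-free observations on the line y = 0.35 * year - 680.
years = np.arange(1971.0, 1981.0)
y = 0.35 * years - 680.0

# Same regressor layout as Z: first column year**1, second column year**0.
Z_check = years[:, np.newaxis] ** [1, 0]

# Least-squares solution of Z_check @ [A, B] = y.
coeffs, residuals, rank, singular_values = np.linalg.lstsq(Z_check, y, rcond=None)
A_check, B_check = coeffs

print(A_check, B_check)
```

The first coefficient pairs with the year column and the second with the column of ones, matching the order of A and B taken from results.params.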


Did manufacturers double the transistor count every two years? You have the final formula,

$\dfrac{\text{transistor_count}(\text{year} +2)}{\text{transistor_count}(\text{year})} = xFactor = \dfrac{e^{B}e^{A( \text{year} + 2)}}{e^{B}e^{A \text{year}}} = e^{2A}$

where increase in number of transistors is $xFactor,$ number of years is 2, and $A$ is the best fit slope on the semilog function. The error in your prediction, $\Delta(xFactor),$ comes from the precision of your constant $A,$ which you calculated as the standard error $\Delta A= 0.006$.

$\Delta (xFactor) = \frac{\partial}{\partial A}(e^{2A})\Delta A = 2e^{2A}\Delta A$

print("Rate of transistors added on a chip every 2 years:")
print(
    "\tx{:.2f} +/- {:.2f} transistors per chip".format(
        np.exp(A * 2), 2 * np.exp(2 * A) * 0.006
    )
)


Based upon your least-squares regression model, the number of transistors per chip increased by a factor of $1.98\pm 0.02$ every two years. You have a model that predicts the number of transistors each year. Now compare your model to the actual manufacturing reports. Plot the linear regression results and all of the transistor counts.
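Another way to read the fitted slope is as a doubling time: solving $e^{A\cdot T} = 2$ gives $T = \log(2)/A$. The sketch below hard-codes the slope from the summary table as `A_fit`, a name assumed for this example, so that it runs on its own.

```python
import math

# Slope from the OLS summary above, in log(transistors/chip) per year.
A_fit = 0.3416

# Solve e^(A * T) = 2 for T to get the doubling time implied by the fit.
doubling_time = math.log(2) / A_fit

# Moore's law uses A_M = log(2) / 2, so its doubling time is 2 years by construction.
moore_doubling_time = math.log(2) / (math.log(2) / 2)

print("fitted doubling time: {:.2f} years".format(doubling_time))
print("Moore's law doubling time: {:.2f} years".format(moore_doubling_time))
```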

Here, use plt.semilogy to plot the number of transistors on a log-scale and the year on a linear scale. You have defined three arrays to get to a final model

$y_i = \log(\text{transistor_count}),$

$y_i = A \cdot \text{year} + B,$

and

$\log(\text{transistor_count}) = A\cdot \text{year} + B,$

your variables, transistor_count, year, and yi all have the same dimensions, (179,). NumPy arrays need the same dimensions to make a plot. The predicted number of transistors is now

$\text{transistor_count}_{\text{predicted}} = e^Be^{A\cdot \text{year}}$.

+++

In the next plot, use the fivethirtyeight style sheet. The style sheet replicates https://fivethirtyeight.com elements. Change the matplotlib style with plt.style.use.

transistor_count_predicted = np.exp(B) * np.exp(A * year)
transistor_Moores_law = Moores_law(year)
plt.style.use("fivethirtyeight")
plt.semilogy(year, transistor_count, "s", label="MOS transistor count")
plt.semilogy(year, transistor_count_predicted, label="linear regression")

plt.plot(year, transistor_Moores_law, label="Moore's Law")
plt.title(
"MOS transistor count per microprocessor\n"
+ "every two years \n"
+ "Transistor count was x{:.2f} higher".format(np.exp(A * 2))
)
plt.xlabel("year introduced")
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5))
plt.ylabel("# of transistors\nper microprocessor")


A scatter plot of MOS transistor count per microprocessor every two years with a red line for the ordinary least squares prediction and an orange line for Moore's law.

The linear regression captures the increase in the number of transistors per microprocessor each year. In 2015, semiconductor manufacturers claimed they could not keep up with Moore's law anymore. Your analysis shows that since 1971, the average increase in transistor count was x1.98 every 2 years, but Gordon Moore predicted it would be x2 every 2 years. That is an amazing prediction.

Consider the year 2017. Compare the data to your linear regression model and Gordon Moore's prediction. First, get the transistor counts from the year 2017. You can do this with a Boolean comparator,

year == 2017.

Then, make a prediction for 2017 with Moores_law defined above and by plugging your best-fit constants into your function

$\text{transistor_count} = e^{B}e^{A\cdot \text{year}}$.
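The Boolean comparison can be sketched on a few made-up values. The arrays `years` and `counts` below are hypothetical and exist only to show how the mask selects matching entries.

```python
import numpy as np

# Hypothetical years and counts, for illustration only.
years = np.array([2015.0, 2017.0, 2016.0, 2017.0])
counts = np.array([1.0e9, 5.0e9, 3.0e9, 8.0e9])

mask = years == 2017        # Boolean array, True where the year matches
counts_2017 = counts[mask]  # keeps only the entries where mask is True

print(mask)
print(counts_2017)
```

Indexing with the mask returns a new, shorter array, so the same pattern applied to year and transistor_count would pull out every chip reported for 2017.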

+++