Regression is the usual approach for predicting continuous values. It comes in two common flavors, linear regression and logistic regression; this post covers the simpler of the two, linear regression.
Linear regression assumes there is a function h(x) = theta_1*x_1 + theta_2*x_2 + ... + theta_n*x_n + b (where x_i is the i-th feature of a sample) around which all the data points cluster, so that the function can be used to predict new data.
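Written out with NumPy, the hypothesis is just a dot product plus a bias (a minimal sketch; the names `h`, `theta`, and `b` here are illustrative, not part of the implementation below):

```python
import numpy as np

def h(x, theta, b):
    """Linear hypothesis: dot product of parameters and features, plus a bias."""
    return np.dot(theta, x) + b

# One sample with two features, parameters theta = [2, 3] and bias b = 1:
print(h(np.array([1.0, 2.0]), np.array([2.0, 3.0]), 1.0))  # 2*1 + 3*2 + 1 = 9.0
```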
Gradient descent is an iterative optimization method with a fixed learning rate (learning_rate): on each iteration it adjusts every parameter by the learning rate times the gradient of the cost, nudging the parameters toward the values that best fit the data.
The cost function we use is J(theta) = 1/(2m) * sum((h(x) - y)^2) over the m samples, i.e. a function measuring the gap between the predictions and the true data. The goal of iteration is to minimize it.
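For concreteness, the cost can be computed in vectorized form (a sketch assuming NumPy arrays; `cost` is an illustrative helper, not the class method defined later):

```python
import numpy as np

def cost(x, y, theta, b):
    """J(theta) = 1/(2m) * sum((h(x) - y)^2) over m samples."""
    predictions = x.dot(theta) + b
    m = len(y)
    return np.sum((predictions - y) ** 2) / (2 * m)

x = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
print(cost(x, y, np.array([1.0]), 0.0))  # perfect fit -> 0.0
print(cost(x, y, np.array([0.0]), 0.0))  # all-zero predictions -> (0+1+4)/6
```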
Each iteration updates all parameters simultaneously via theta_j := theta_j - alpha * 1/m * sum((h(x) - y) * x_j), with the same rule for the bias except that the x_j factor is dropped. (The figure originally borrowed here is omitted.)
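A single update step, written out by hand on toy data (a sketch; `gradient_step` and `alpha` are illustrative names, separate from the class below):

```python
import numpy as np

def gradient_step(x, y, theta, b, alpha):
    """One simultaneous gradient-descent update for all parameters."""
    m = len(y)
    error = x.dot(theta) + b - y        # h(x) - y for every sample
    grad_theta = x.T.dot(error) / m     # dJ/d theta_j = 1/m * sum(error * x_j)
    grad_b = np.sum(error) / m          # dJ/d b (the x_j factor is just 1)
    return theta - alpha * grad_theta, b - alpha * grad_b

x = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
theta, b = gradient_step(x, y, np.array([0.0]), 0.0, 0.1)
print(theta, b)  # both parameters move in the direction that reduces the cost
```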
The whole process is implemented below; iteration stops either when the maximum iteration count is reached or when the change in cost drops below a threshold:
import numpy as np
import random

class XYLinearRegression:
    def __init__(self, x_arr, y_arr, learning_rate=0.01, min_cost=0, max_iter=100):
        self.x_arr = x_arr
        self.y_arr = y_arr
        self.h_num = len(x_arr[0])  # number of features (parameters besides the bias)
        self.learning_rate = learning_rate
        self.min_cost = min_cost
        self.max_iter = max_iter
        # initialize each theta randomly within the range of its feature
        self.min_max_pair = [(min(col), max(col)) for col in np.array(x_arr).transpose()]
        self.theta = [random.uniform(lo, hi) for lo, hi in self.min_max_pair]
        self.theta.append(0)  # bias term
        self.data_len = len(x_arr)

    def linear_regression(self):
        cost = self.calc_cost()
        last_j = cost + self.min_cost + 1  # guarantee the first iteration runs
        it = 0
        # stop at the iteration cap, or once the cost improvement falls below min_cost
        while it < self.max_iter and last_j - cost > self.min_cost:
            last_j = cost
            self.theta = self.get_theta()
            cost = self.calc_cost()
            print(self.theta)
            print('Cost: ' + str(cost))
            it += 1
        print('Stopped after %d iterations' % it)
        print(self.theta)

    def calc_cost(self, no_square=False, i=0):
        # no_square=False: J(theta) = 1/(2m) * sum((h(x) - y)^2)
        # no_square=True : partial derivative of J w.r.t. theta_i (i = -1 for the bias)
        m = self.data_len
        result = 0
        for x, y in zip(self.x_arr, self.y_arr):
            diff = sum(x[j] * self.theta[j] for j in range(self.h_num))
            diff += self.theta[self.h_num]  # bias
            diff -= y
            if not no_square:
                diff *= diff
            elif i != -1:
                diff *= x[i]  # differentiating pulls out the factor x_i
            result += diff
        result /= float(m * 2)
        if no_square:
            result *= 2  # cancel the 1/2 from the cost when taking the derivative
        return result

    def get_theta(self):
        # simultaneous update: compute every gradient before touching theta
        grads = [self.calc_cost(no_square=True, i=i) for i in range(self.h_num)]
        grads.append(self.calc_cost(no_square=True, i=-1))
        return [t - self.learning_rate * g for t, g in zip(self.theta, grads)]
# x_arr = [[random.randrange(100) for i in range(3)] for j in range(50)]
# y_arr = [random.randrange(50) for i in range(50)]
x_arr = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
y_arr = [0, 1, 2, 3, 4.3, 4.7, 6, 7, 8, 9]
xy = XYLinearRegression(x_arr, y_arr, max_iter=10000, min_cost=0.01, learning_rate=0.01)
xy.linear_regression()
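As a sanity check, the same fit has a closed-form solution that NumPy's least-squares solver computes directly (a sketch using the example data above; the fitted slope should come out close to 1):

```python
import numpy as np

x_arr = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
y_arr = [0, 1, 2, 3, 4.3, 4.7, 6, 7, 8, 9]

# Append a column of ones so the last coefficient plays the role of the bias b.
A = np.hstack([np.array(x_arr, dtype=float), np.ones((len(x_arr), 1))])
coef, residuals, rank, sv = np.linalg.lstsq(A, np.array(y_arr, dtype=float), rcond=None)
print(coef)  # [slope, bias] minimizing the squared error exactly
```

The gradient-descent result should converge toward these coefficients as the iteration cap grows and min_cost shrinks.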