This is the book's first look at a neural network: a single-hidden-layer feedforward network.
It uses tanh as the activation function; the bipolar sigmoid is a similar alternative.
The network is feedforward: each neuron takes every output of the previous layer as its input and emits a single output value.
Training uses backpropagation: real data (here, the user's click choices) is propagated backwards from the output layer to the input layer, and each neuron's error delta is folded into that neuron's weights.
In the code below, N is the learning rate.
The neural-network code follows:
from math import tanh
from sqlite3 import dbapi2 as sqlite

def dtanh(y):
    # derivative of tanh, expressed in terms of the output y = tanh(x)
    return 1.0 - y * y
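The trick in dtanh is that the derivative of tanh can be written in terms of its own output: d/dx tanh(x) = 1 - tanh(x)^2, so during backpropagation no extra forward evaluation is needed. A quick standalone finite-difference check:

```python
from math import tanh

def dtanh(y):
    # derivative of tanh written in terms of the output y = tanh(x)
    return 1.0 - y * y

x, h = 0.7, 1e-6
numeric = (tanh(x + h) - tanh(x - h)) / (2 * h)  # central difference
analytic = dtanh(tanh(x))
# numeric and analytic agree to within finite-difference error
```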
class searchnet:
    def __init__(self, dbname):
        self.con = sqlite.connect(dbname)

    def __del__(self):
        self.con.close()
    def maketables(self):
        self.con.execute('CREATE TABLE hiddennode(create_key)')
        self.con.execute('CREATE TABLE wordhidden(fromid,toid,strength)')
        self.con.execute('CREATE TABLE hiddenurl(fromid,toid,strength)')
        self.con.commit()
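These three tables hold the entire network: hiddennode lists the hidden neurons, while wordhidden and hiddenurl store the input-side and output-side connection strengths. A quick sanity check of the schema against a throwaway in-memory database:

```python
from sqlite3 import dbapi2 as sqlite

con = sqlite.connect(':memory:')
con.execute('CREATE TABLE hiddennode(create_key)')
con.execute('CREATE TABLE wordhidden(fromid,toid,strength)')
con.execute('CREATE TABLE hiddenurl(fromid,toid,strength)')
con.commit()

# sqlite_master lists every table in the database
tables = sorted(r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"))
# tables == ['hiddennode', 'hiddenurl', 'wordhidden']
```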
    def getstrength(self, fromid, toid, layer):
        # layer 0 is word->hidden, layer 1 is hidden->url
        table = 'wordhidden' if layer == 0 else 'hiddenurl'
        res = self.con.execute(
            'SELECT strength FROM %s WHERE fromid=? AND toid=?' % table,
            (fromid, toid)).fetchone()
        if res is None:
            # defaults for missing connections:
            # -0.2 for word->hidden, 0 for hidden->url
            return -0.2 if layer == 0 else 0
        return res[0]
    def setstrength(self, fromid, toid, layer, strength):
        table = 'wordhidden' if layer == 0 else 'hiddenurl'
        res = self.con.execute(
            'SELECT rowid FROM %s WHERE fromid=? AND toid=?' % table,
            (fromid, toid)).fetchone()
        if res is None:
            self.con.execute(
                'INSERT INTO %s(fromid,toid,strength) VALUES (?,?,?)' % table,
                (fromid, toid, strength))
        else:
            self.con.execute(
                'UPDATE %s SET strength=? WHERE rowid=?' % table,
                (strength, res[0]))
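setstrength is an upsert: insert the connection if it does not exist yet, otherwise update it in place, so each (fromid, toid) pair has exactly one row. The pattern in isolation, on a throwaway in-memory table:

```python
from sqlite3 import dbapi2 as sqlite

con = sqlite.connect(':memory:')
con.execute('CREATE TABLE wordhidden(fromid,toid,strength)')

def setstrength(fromid, toid, strength):
    # insert if the row is missing, otherwise update it (an upsert)
    res = con.execute('SELECT rowid FROM wordhidden WHERE fromid=? AND toid=?',
                      (fromid, toid)).fetchone()
    if res is None:
        con.execute('INSERT INTO wordhidden(fromid,toid,strength) VALUES (?,?,?)',
                    (fromid, toid, strength))
    else:
        con.execute('UPDATE wordhidden SET strength=? WHERE rowid=?',
                    (strength, res[0]))

setstrength(1, 101, 0.5)
setstrength(1, 101, 0.8)  # updates in place, does not duplicate
rows = con.execute('SELECT strength FROM wordhidden').fetchall()
# rows == [(0.8,)]
```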
    def generatehiddennode(self, wordids, urls):
        # skip very long queries
        if len(wordids) > 3:
            return None
        # key the hidden node by the sorted word ids, so word order is irrelevant
        createkey = '_'.join(sorted(str(wi) for wi in wordids))
        res = self.con.execute(
            'SELECT rowid FROM hiddennode WHERE create_key=?',
            (createkey,)).fetchone()
        if res is None:
            cur = self.con.execute(
                'INSERT INTO hiddennode (create_key) VALUES (?)', (createkey,))
            hiddenid = cur.lastrowid
            for wordid in wordids:
                self.setstrength(wordid, hiddenid, 0, 1.0 / len(wordids))
            for urlid in urls:
                self.setstrength(hiddenid, urlid, 1, 0.1)
            self.con.commit()
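The hidden node's key is the sorted, underscore-joined word ids, so the same set of query words always maps to the same node regardless of order; new connections start at 1/len(wordids) on the input side and 0.1 on the output side. The key construction in isolation (the ids are made-up illustration values):

```python
wordids = [103, 101, 102]
# sorting before joining makes the key order-independent
createkey = '_'.join(sorted(str(wi) for wi in wordids))
# createkey == '101_102_103' for any permutation of the query words

initial_in = 1.0 / len(wordids)  # word -> hidden starting strength
initial_out = 0.1                # hidden -> url starting strength
```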
    def gethiddenids(self, wordids, urlids):
        # collect every hidden node connected to a query word or candidate url
        l1 = {}
        for wordid in wordids:
            cur = self.con.execute(
                'SELECT toid FROM wordhidden WHERE fromid=?', (wordid,))
            for row in cur:
                l1[row[0]] = 1
        for urlid in urlids:
            cur = self.con.execute(
                'SELECT fromid FROM hiddenurl WHERE toid=?', (urlid,))
            for row in cur:
                l1[row[0]] = 1
        return list(l1.keys())

    def setupnetwork(self, wordids, urlids):
        # node lists
        self.wordids = wordids
        self.hiddenids = self.gethiddenids(wordids, urlids)
        self.urlids = urlids
        # node outputs
        self.ai = [1.0] * len(self.wordids)
        self.ah = [1.0] * len(self.hiddenids)
        self.ao = [1.0] * len(self.urlids)
        # build the weight matrices from the stored strengths
        self.wi = [[self.getstrength(wordid, hiddenid, 0)
                    for hiddenid in self.hiddenids]
                   for wordid in self.wordids]
        self.wo = [[self.getstrength(hiddenid, urlid, 1)
                    for urlid in self.urlids]
                   for hiddenid in self.hiddenids]
    def feedforward(self):
        # input activations are all 1.0 (every query word is "on")
        for i in range(len(self.wordids)):
            self.ai[i] = 1.0
        # hidden activations
        for j in range(len(self.hiddenids)):
            total = 0.0
            for i in range(len(self.wordids)):
                total += self.ai[i] * self.wi[i][j]
            self.ah[j] = tanh(total)
        # output activations
        for k in range(len(self.urlids)):
            total = 0.0
            for j in range(len(self.hiddenids)):
                total += self.ah[j] * self.wo[j][k]
            self.ao[k] = tanh(total)
        return self.ao[:]
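The same two-stage computation on a tiny hand-built network makes the data flow explicit; the sizes and weights below are made-up illustration values, not anything from the database:

```python
from math import tanh

def feedforward(inputs, wi, wo):
    # hidden activations: tanh of the weighted sum of the inputs
    ah = [tanh(sum(inputs[i] * wi[i][j] for i in range(len(inputs))))
          for j in range(len(wi[0]))]
    # output activations: tanh of the weighted sum of the hidden layer
    ao = [tanh(sum(ah[j] * wo[j][k] for j in range(len(ah))))
          for k in range(len(wo[0]))]
    return ao

wi = [[0.5, -0.2], [0.3, 0.8]]  # 2 inputs -> 2 hidden nodes
wo = [[0.7], [-0.4]]            # 2 hidden nodes -> 1 output
out = feedforward([1.0, 1.0], wi, wo)
# out is a single score in (-1, 1)
```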
    def getresult(self, wordids, urlids):
        self.setupnetwork(wordids, urlids)
        return self.feedforward()
    def back_propagate(self, targets, N=0.5):
        # output-layer errors
        output_deltas = [0.0] * len(self.urlids)
        for k in range(len(self.urlids)):
            err = targets[k] - self.ao[k]
            output_deltas[k] = dtanh(self.ao[k]) * err
        # hidden-layer errors
        hidden_deltas = [0.0] * len(self.hiddenids)
        for j in range(len(self.hiddenids)):
            err = 0.0
            for k in range(len(self.urlids)):
                err += output_deltas[k] * self.wo[j][k]
            hidden_deltas[j] = dtanh(self.ah[j]) * err
        # update hidden -> output weights
        for j in range(len(self.hiddenids)):
            for k in range(len(self.urlids)):
                change = output_deltas[k] * self.ah[j]
                self.wo[j][k] += N * change
        # update input -> hidden weights
        for i in range(len(self.wordids)):
            for j in range(len(self.hiddenids)):
                change = hidden_deltas[j] * self.ai[i]
                self.wi[i][j] += N * change
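The output delta dtanh(ao) * err is exactly minus the gradient of the squared-error loss 0.5 * (target - tanh(s))^2 with respect to the output neuron's pre-activation sum s, which is why adding N * delta * activation is gradient descent. A standalone finite-difference check:

```python
from math import tanh

def dtanh(y):
    return 1.0 - y * y

s, target = 0.3, 1.0
h = 1e-6

def loss(s):
    # squared error of the output neuron for pre-activation sum s
    return 0.5 * (target - tanh(s)) ** 2

numeric = -(loss(s + h) - loss(s - h)) / (2 * h)  # minus the gradient
ao = tanh(s)
delta = dtanh(ao) * (target - ao)
# delta matches the numerical gradient
```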
    def updatedatabase(self):
        # write the in-memory weight matrices back to the database
        wordids = list(self.wordids)
        hiddenids = list(self.hiddenids)
        urlids = list(self.urlids)
        for i in range(len(wordids)):
            for j in range(len(hiddenids)):
                self.setstrength(wordids[i], hiddenids[j], 0, self.wi[i][j])
        for j in range(len(hiddenids)):
            for k in range(len(urlids)):
                self.setstrength(hiddenids[j], urlids[k], 1, self.wo[j][k])
        self.con.commit()
    def trainquery(self, wordids, urlids, selectedurl):
        self.generatehiddennode(wordids, urlids)
        self.setupnetwork(wordids, urlids)
        self.feedforward()
        # target is 1.0 for the clicked url, 0.0 for everything else
        targets = [0.0] * len(urlids)
        targets[urlids.index(selectedurl)] = 1.0
        self.back_propagate(targets)
        self.updatedatabase()
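A standalone miniature of trainquery's loop (three query words, one hidden node, three urls, all weights and sizes made up for illustration) shows the claimed behavior: repeatedly back-propagating with the clicked url as target 1.0 drives its score toward 1 while the others fall toward 0:

```python
from math import tanh

def dtanh(y):
    return 1.0 - y * y

# toy network: 3 input words, 1 hidden node, 3 candidate urls
wi = [[1.0 / 3] for _ in range(3)]  # input -> hidden, as generatehiddennode would seed it
wo = [[0.1, 0.1, 0.1]]              # hidden -> output starting strengths
N = 0.5                             # learning rate

def train(targets):
    # forward pass
    ai = [1.0, 1.0, 1.0]
    ah = [tanh(sum(ai[i] * wi[i][j] for i in range(3))) for j in range(1)]
    ao = [tanh(sum(ah[j] * wo[j][k] for j in range(1))) for k in range(3)]
    # backward pass
    out_d = [dtanh(ao[k]) * (targets[k] - ao[k]) for k in range(3)]
    hid_d = [dtanh(ah[j]) * sum(out_d[k] * wo[j][k] for k in range(3))
             for j in range(1)]
    for j in range(1):
        for k in range(3):
            wo[j][k] += N * out_d[k] * ah[j]
    for i in range(3):
        for j in range(1):
            wi[i][j] += N * hid_d[j] * ai[i]
    return ao

before = train([1.0, 0.0, 0.0])       # prediction before any learning
for _ in range(50):
    after = train([1.0, 0.0, 0.0])    # keep training on the same click
# the clicked url's score rises sharply; the others shrink
```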
To train the network, suppose the user searches for the words [A, B, C] and clicks the link U. Then calling

nn.trainquery([A, B, C], allurls, U)

trains the network on that example. The more such examples it sees, the more accurate its predictions become, stabilizing after enough of them.
The network can then be folded into the scoring methods from the first part by adding:
    def nnscore(self, rows, wordids):
        # one entry per distinct url in the result rows
        urlids = [urlid for urlid in set(row[0] for row in rows)]
        # net is the searchnet instance created elsewhere
        nnres = net.getresult(wordids, urlids)
        scores = dict((urlids[i], nnres[i]) for i in range(len(urlids)))
        return self.normalizescores(scores)
This returns the network's scores for each url, normalized like the other scoring functions.
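normalizescores is defined in the first part and not shown here; a sketch consistent with how it is used (an assumption about its shape, not the original code) rescales a score dict so the best url gets 1.0:

```python
def normalizescores(scores, smallIsBetter=False):
    # assumed sketch: rescale {id: score} so the best score becomes 1.0
    vsmall = 0.00001  # avoid division by zero
    if smallIsBetter:
        minscore = min(scores.values())
        return dict((u, float(minscore) / max(vsmall, v))
                    for u, v in scores.items())
    maxscore = max(scores.values())
    if maxscore == 0:
        maxscore = vsmall
    return dict((u, float(v) / maxscore) for u, v in scores.items())

normalized = normalizescores({1: 2.0, 2: 1.0})
# normalized == {1: 1.0, 2: 0.5}
```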