Bert

Bert November 17, 2022 less than 1 minute read

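1. Introduction to BERT

BERT is a model proposed by a Google team in 2018. It holds a pivotal place in NLP and ushered in the field's pre-training era. Architecturally, BERT is simply the Transformer encoder. Its bidirectionality comes from the MLM task, which lets the model see context on both sides of a position before making a prediction.

2. Related Work

Before BERT, there were two ways to apply pre-trained language representations to downstream tasks: the feature-based approach, represented by ELMo, and fine-tuning, represented by GPT. Both build their loss on a standard language-modeling objective, which is unidirectional (ELMo concatenates two independently trained one-directional models). BERT instead uses a masked language model as its loss, so it can learn from left-to-right and right-to-left context at the same time.

3. Model Architecture

The backbone is the Transformer encoder.

3.1. Input Representation

Because the pre-training corpus is large, BERT tokenizes with WordPiece, a subword method closely related to BPE, which yields a vocabulary of about 30,000 tokens. BERT also adds the special tokens <cls> and <sep>: <cls> summarizes the whole sequence, and <sep> separates the two sentences of a pair. The input embedding is the sum of the token, segment, and position embeddings.

The segment embedding encodes which sentence a token belongs to: with two sentences, the segment id is 0 for the first and 1 for the second.

3.2. Pre-training Tasks

There are two pre-training tasks: masked language modeling (MLM) and next sentence prediction (NSP). MLM mainly learns token-level information; NSP learns sentence-pair relationships, chiefly to serve tasks such as QA. MLM randomly masks or replaces tokens. Concretely, 15% of the tokens are selected for prediction; of those, 80% are replaced with the [Mask] token, 10% are replaced with a random token, and 10% are left unchanged.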
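The 15% / 80-10-10 masking rule can be sketched in a few lines of Python. This is an illustrative sketch, not BERT's actual data pipeline: the function name `mask_tokens`, the per-token independent selection, and the toy vocabulary are all assumptions made for the example.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, select_prob=0.15, rng=None):
    """Return (corrupted, labels); labels[i] is None where no prediction is made.

    Assumption: each token is selected independently with probability
    select_prob, which only approximates "15% of the tokens".
    """
    rng = rng or random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() >= select_prob:
            # Not selected: keep the token and predict nothing here.
            corrupted.append(tok)
            labels.append(None)
            continue
        labels.append(tok)  # the model must recover the original token
        r = rng.random()
        if r < 0.8:
            corrupted.append(MASK)               # 80%: replace with the mask token
        elif r < 0.9:
            corrupted.append(rng.choice(vocab))  # 10%: replace with a random token
        else:
            corrupted.append(tok)                # 10%: leave unchanged

    return corrupted, labels

# Hypothetical toy input; the output depends on the random seed.
sentence = ["the", "cat", "sat", "on", "the", "mat"]
print(mask_tokens(sentence, vocab=["dog", "ran", "blue"], rng=random.Random(0)))
```

Leaving some selected tokens unchanged or randomly replaced means the model cannot rely on seeing the mask token, which never appears during fine-tuning.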

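4. Pre-training Data and Experiments

The pre-training data is BooksCorpus and English Wikipedia.

The paper runs a large number of experiments and tops many leaderboards, including GLUE, SQuAD, and SWAG. Each task adds its own output layer on top of the model and fine-tunes; many tasks predict from the special token <cls>.

BERT-Base has 110M parameters; BERT-Large has 340M.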

NLP

Bert November 17, 2022 less than 1 minute read

Reading Notes on Automatic Short-Answer Grading Papers October 10, 2022 16 minute read

1. Paper Overview
2. Automatic Short-Answer Grading via BERT-Based Deep Neural Networks

Paper Reading

Bert November 17, 2022 less than 1 minute read

Transformer November 15, 2022 less than 1 minute read

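1. Introduction to the Transformer

Proposed by a Google team in 2017, the Transformer is a neural network architecture built entirely on the attention mechanism. It was a new architecture with enormous influence: later well-known pre-trained models such as BERT and ViT are Transformer variants. Related links:

2. Introduction

The Transformer was originally designed for machine translation. Before it, machine translation relied mainly on RNNs in an encoder-decoder architecture, such as Seq2Seq. The weakness of RNNs is their sequential dependence: first, they parallelize poorly; second, on long sequences, information from early time steps may be lost by later ones. The Transformer's key features are self-attention and multi-head attention.

3. Model Architecture

Model overview figure:

The model dimension stays at 512 throughout the Transformer. On BatchNorm versus LayerNorm: BN computes the mean and variance of each feature across a batch, while LN computes the mean and variance across all features of a single sample in the batch. In practice the sequence length is usually kept to about 512 at most, because the cost of self-attention grows quadratically with length and longer sequences may exceed what the hardware can handle.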
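The difference between the two normalizations comes down to which axis the statistics are taken over. A minimal sketch, assuming a toy batch of two samples with three features each (the function names are made up for illustration):

```python
from statistics import fmean, pvariance

def layer_norm_stats(batch):
    """LayerNorm: one (mean, variance) per sample, taken across its features."""
    return [(fmean(row), pvariance(row)) for row in batch]

def batch_norm_stats(batch):
    """BatchNorm: one (mean, variance) per feature, taken across the batch."""
    return [(fmean(col), pvariance(col)) for col in zip(*batch)]

# Rows are samples, columns are features.
batch = [[1.0, 2.0, 3.0],
         [3.0, 4.0, 5.0]]

print(layer_norm_stats(batch))  # two pairs, one per sample
print(batch_norm_stats(batch))  # three pairs, one per feature
```

Because LN statistics depend only on a single sample, they are unaffected by batch size or by other sequences in the batch, which is one reason LN suits variable-length sequence models.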

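3.1. Attention

Attention has three elements: query, key, and value. For each query, its compatibility with every key is computed and used as the weight on the corresponding value. The Transformer computes compatibility with Scaled Dot-Product Attention, which is also the most widely used form:

\begin{equation} \text{Attention}(Q,K,V)=\text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V \end{equation}

See the figure for intuition:

In principle, the query $q_t$ at time step t in the decoder should not see anything after time step t, yet the attention computation sees the whole sequence. A mask is therefore needed so that, during prediction, a position cannot see the sequence after time step t.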
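The formula and the masking idea can be put together in a small pure-Python sketch. It assumes Q, K, and V are lists of equal-length vectors and that query t lines up with key t; the function names are illustrative, and a real implementation would use a tensor library.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V, causal=False):
    """Scaled dot-product attention; with causal=True, query t ignores keys after t."""
    d_k = len(K[0])
    out = []
    for t, q in enumerate(Q):
        # Compatibility of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        if causal:
            # Masked positions get -inf, so their softmax weight is exactly 0.
            scores = [s if j <= t else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Output is the weighted sum of the value vectors.
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out

Q = K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V, causal=True)[0])  # first position can only attend to itself
```

Setting masked scores to negative infinity before the softmax, rather than zeroing weights afterwards, keeps the remaining weights summing to 1.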

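3.2. Multi-Head Attention

The motivation for multi-head attention is to give the high-dimensional representation h separate chances to learn, so it can capture more kinds of information. With h heads, each head is projected down to dimension 512/h, attention is computed per head, and the h outputs are concatenated.

In the decoder's encoder-decoder attention, the query comes from the decoder's current input, while the keys and values both come from the encoder output.

3.3. Position-wise Feed-Forward Networks

The position-wise feed-forward network is simply an MLP applied to each time step independently. Attention has already mixed information from the whole sequence into each time step, so the MLP can be applied per position.

3.4. Embedding and softmax

The encoder embedding layer, the decoder embedding layer, and the linear layer before the decoder's softmax share weights. In the embedding layers the weights are multiplied by $\sqrt{d_{model}}$ (to keep their scale comparable to the positional encoding). The embedding layer's main job is to map vocabulary indices (vocab_size) to the model dimension.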

3.5. Positional Encoding

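Attention by itself ignores the order of the sequence: it only measures the similarity between queries and keys. A positional encoding is therefore added to the input so that every step carries its position. Here pos is the index of the token in the sequence, 2i indexes the even dimensions, and 2i+1 the odd dimensions.

\begin{equation} \begin{aligned} \text{PE}_{(pos,2i)} & = \sin(pos/10000^{2i/d_{model}}) \\ \text{PE}_{(pos,2i+1)} & = \cos(pos/10000^{2i/d_{model}}) \end{aligned} \end{equation}

The Transformer encodes each time step with sin and cos, which means the positional encoding is not trained. The authors explain that learned positional embeddings gave similar results.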
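A direct transcription of the two formulas, as a sketch: the function name `positional_encoding` and the nested-list output are choices made for this example, and the loop variable `i` steps over even dimensions, so it plays the role of 2i in the formula.

```python
import math

def positional_encoding(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal position encodings."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):      # i is the even dimension index (2i)
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)    # even dimension
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # following odd dimension
    return pe

table = positional_encoding(max_len=3, d_model=4)
print(table[0])  # position 0: sin(0) and cos(0) alternate
```

The table depends only on `max_len` and `d_model`, so it can be computed once and added to the scaled token embeddings.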

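4. Data and Experiments

The Transformer targets machine translation. The dataset is the WMT 2014 English-German dataset, tokenized with BPE using a vocabulary shared by English and German, which is what allows the encoder and decoder embedding layers to be identical.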

Vision Transformer November 14, 2022 less than 1 minute read

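1. Introduction to ViT

ViT stands for Vision Transformer. Its defining feature is that it transfers the NLP Transformer to CV; on image recognition and several other CV tasks it outperforms convolutional neural networks. Related links:

2. Introduction

One factor limiting Transformers in CV is sequence length. If every pixel is treated as a token, a medium-resolution 224*224 image gives a sequence of about 50k tokens (224*224 = 50,176), whereas BERT handles sequences of only 512. Overly long sequences are therefore a key obstacle to applying the Transformer in CV. ViT addresses this by splitting the image into 16*16 patches, which drastically shortens the sequence and lets the Transformer architecture be applied unchanged. The results are very good, showing that Transformers can indeed work well in CV.

3. Model

Model overview figure:

To shorten the sequence, the authors split the image into many patches, for example of size 16*16. Each patch is represented by its flattened pixel values across the 3 color channels, so an image becomes a matrix of shape num_patches*768, where 768=16*16*3. The patch representations are then fed to a linear projection layer (just a linear layer, whose shape can be 768*768). To stay consistent with the Transformer, ViT also uses positional encoding and a special <cls> token. The positional encoding is 1D, and both it and the <cls> token are learnable. The linear projection of each patch and its positional encoding are simply added together, giving the image tokens. The remaining steps follow BERT. ViT is trained with supervision and predicts from the <cls> token.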
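The shape arithmetic above is easy to check with a small sketch. It assumes the image is an H x W x C nested list whose sides are divisible by the patch size; `patchify` is a made-up helper name, and the all-zero image is only a placeholder.

```python
def patchify(image, patch):
    """Split an H x W x C image into non-overlapping patches, each flattened to a list."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten the patch row by row, pixel by pixel, channel by channel.
            flat = [value
                    for row in image[top:top + patch]
                    for pixel in row[left:left + patch]
                    for value in pixel]
            patches.append(flat)
    return patches

# Placeholder 224 x 224 RGB image.
image = [[[0, 0, 0] for _ in range(224)] for _ in range(224)]
patches = patchify(image, 16)
print(len(patches), len(patches[0]))  # 196 768
```

So the 50,176-pixel sequence shrinks to 14*14 = 196 patch tokens, or 197 once the <cls> token is prepended.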

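4. Experiments and Datasets

The datasets are ImageNet-1k, ImageNet-21k, and Google's in-house JFT dataset.

Like BERT, ViT comes in several variants that differ in parameter count and patch size:

For example, ViT-L/16 means Large ViT with patch size 16*16.

Results

The metric is the accuracy of the pre-trained model after fine-tuning.

This figure shows that ViT performs better when the dataset is large.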

Transformer

Transformer November 15, 2022 less than 1 minute read

Vision Transformer November 14, 2022 less than 1 minute read

algorithm

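Notes from Algorithm Practice October 21, 2022 1 minute read

Introduction

This post records what I have learned while solving algorithm problems. The current plan is to finish and submit the exercises from the online course first, then study object-oriented C++, and then move on to a practice platform.

Openjudge

- Problem link
- The iostream header provides cin and cout
- In int &a=b, the & declares a C++ reference. A reference is an alias for a variable, which lets a function write back to the caller's argument
- One use of ~ is bitwise NOT, which flips every bit of a number; one use of ^ is bitwise XOR
- stdio.h is C's standard I/O header. It contains the common input/output functions, such as the file functions fopen/fclose and the formatted I/O functions scanf/printf; its C++ counterpart is cstdio
- strlen() is a function from string.h that returns the length of a string; it counts characters until it reaches the terminator \0
- A string variable can be read from keyboard input with cin. For a string made of 0/1 characters, consider storing it in an integer and operating on each bit with bitwise operations
- memcpy() is a function from cstring, so the cstring header must be included. memcpy(a, b, c) copies c bytes from b to a
- Recursion can be nested inside a loop to enumerate exhaustively
- Generating all permutations follows a fixed pattern: loop over the candidates for each position, mark the elements already used, and skip them on later picks. Choosing elements in order this way yields every permutation; see problem 3. It is also a form of exhaustive enumeration
- When declaring a string buffer or a one-dimensional array, allocate somewhat more elements than strictly needed
- Inside a loop, a bool can control whether an element is taken or not; for example, the plus signs in problem 004 can be controlled by booleans
- cin.peek() reads the next input character without consuming it; cin.get() reads the next input character and consumes it
- n = scanf("%c", &c): n equals EOF once the input has ended
- while(cin >> a) keeps reading values of a's type until the input ends
- When solving a problem by binary search, choose the endpoints as tightly as possible. Besides taking the midpoint, you can also step the left endpoint up by 1 or the right endpoint down by 1 to locate the optimum
- The value of PI can be obtained as acos(-1.0)
- #include<iomanip> pulls in the iomanip library, mainly used to control stream formatting. For example, setprecision(int n) sets the precision of floating-point output, where n is the number of significant digits (rounded); cout<<fixed<<setprecision(n) prints n digits after the decimal point
- Divide and conquer means splitting each step into smaller parts: once the operation for a small step is fixed, recurse down to the smallest unit and combine the results. Common divide-and-conquer algorithms are merge sort and quicksort
- #include<fstream> enables file reading and writing. The steps are: create an ifstream infile -> open the file with infile.open("in.txt") -> read with infile.getline(...) -> close with infile.close()
- max_element from #include <algorithm> returns a pointer to the largest element of an array; *max_element gives the maximum value itself
- scanf("%s", a) reads a whitespace-delimited string directly into a
- A char takes less memory than an int (1 byte versus typically 4); the pointers char * and int * themselves are the same size on a given platform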
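The permutation pattern described in the notes above (loop over the candidates for each position and mark used elements) can be sketched as follows. The notes themselves target C++; Python is used here only to show the pattern compactly, and the function name is made up for the example.

```python
def permutations(items):
    """Enumerate all permutations by backtracking with a used-flag array."""
    n = len(items)
    used = [False] * n
    current, result = [], []

    def fill(position):
        if position == n:
            result.append(current[:])  # every position is filled: record a copy
            return
        for i in range(n):
            if used[i]:
                continue               # element already placed earlier
            used[i] = True
            current.append(items[i])
            fill(position + 1)         # recursion nested inside the loop
            current.pop()              # undo the choice before trying the next
            used[i] = False

    fill(0)
    return result

print(permutations([1, 2, 3]))
```

Trying candidates in index order produces the permutations in lexicographic order of their indices, which is often what judge problems expect.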

artificial intelligence

Who owns the copyright for an AI generated creative work? April 20, 2021 4 minute read

Recently I was reading an article about a cool project that intends to have a neural network create songs of the late 27 Club (artists who tragically died at or near age 27, at the height of their respective careers), artists such as Amy Winehouse, Jimi Hendrix, Kurt Cobain and Jim Morrison.

The project was created by Over the Bridge, an organization dedicated to increasing awareness of mental health and substance abuse in the music industry, trying to denormalize... read more

So, what is a neural network? April 2, 2021 9 minute read

The omnipresence of technology nowadays has made it commonplace to read news about AI; with just a quick glance at today’s headlines, I get:

Topics from business, manufacturing, supply chain, medicine and biotech, and even defense are covered in those news items... read more

Deep Q Learning for Tic Tac Toe March 18, 2021 12 minute read

Background

After many years of a corporate career (17) diverging from computer science, I have now decided to learn Machine Learning and in the process return to coding (something I have always loved!).

To fully grasp the essence of ML I decided to start by coding an ML library myself, so I can fully understand the inner workings, linear algebra and calculus involved in Stochastic Gradient Descent. On top of that, I get to learn Python (I used to code in C++ 20 years ago).

I built a general purpose basic ML library that... read more

coding

Deep Q Learning for Tic Tac Toe March 18, 2021 12 minute read

Neural Network Optimization Methods and Algorithms March 12, 2021 8 minute read

For the seemingly small project I undertook of creating a machine learning neural network that could learn by itself to play tic-tac-toe, I bumped into the necessity of implementing at least one momentum algorithm for the optimization of the network during backpropagation.

And since my original post for the TicTacToe project is quite large already, I decided to post these optimization methods separately, along with how I implemented them in my code.

Adam

source

Adaptive Moment Estimation (Adam) is an optimization method that computes adaptive learning rates for each weight and bias. In addition to storing an... read more

Machine Learning Library in Python from scratch February 28, 2021 4 minute read

It must sound crazy that in this day and age, when we have such a myriad of amazing machine learning libraries and toolkits all open sourced, all quite well documented and easy to use, I decided to create my own ML library from scratch.

Let me try to explain: I am in the process of immersing myself in the world of Machine Learning, and to do so, I want to deeply understand the basic concepts and their foundations, and I think that there is no better way to do so than by creating myself all the... read more

copyright

Who owns the copyright for an AI generated creative work? April 20, 2021 4 minute read

course

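Object-Oriented Programming in C++ November 20, 2022 1 minute read

1. Introduction

Class notes for the online course Programming and Algorithms (3): Object-Oriented Programming in C++. The course runs on China University MOOC (中国大学MOOC) and is taught by Guo Wei (郭炜) of Peking University; this is its fifteenth run. Programming and Algorithms is a series of courses, and Object-Oriented Programming in C++ is the last one in the series. The course files include the lecture notes for every class and the exercises (on openjudge). Related links:

2. Class Notes

2.1. Chapter 1: From C to C++

2.1.1. References

The concept of a reference: int & a = b

Some notes on references

In C, changing the value of a parameter does not affect the caller's argument
In C, a swap function that exchanges two values is written as:

With references in C++, swap can be written like this:

In addition, a reference can be used as a function's return value; I don't yet know what this is for and will find out later

Const references:

A const reference cannot be used to modify what it refers to:

Conversion between const T & and T &:

2.1.2. The const Keyword

The first use of const is to define constants
The second use of const is to define pointers to constants: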

courses

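Summary of First-Semester Graduate Courses December 5, 2022 1 minute read

Course List

1. Material
2. Deep Learning
3. Intellectual property law
4. Culture and society
5. Theory and Practice of Socialism with Chinese Characteristics for a New Era
6. Organisation theory
7. Solid State Physics
8. Simulation optimization
9. Automatic control
10. Introduction to complex systems
11. Structural Mechanics 3
12. English Scientific Writing and Academic Presentations

1. Material

Course info:
- Instructor: Tang Hongzhe (唐宏哲)
- Format: in person, with recorded lectures
- Materials: no handouts; the instructor writes directly on the blackboard
- Assessment: project + exam. The project topic is free but must relate to the lectures; the exam is held in person, lasts one hour, and has about four long questions
Course summary: there were five sessions, but only four covered actual course content. The first covered metallic materials and blast-furnace ironmaking, steelmaking and aluminum production, and then polymers. The second introduced metallic crystals, which brought back high-school chemistry. The third covered corrosion and protection, the topic our project was based on. The fourth covered phase equilibria and phase diagrams, similar to material I had met before in thermodynamics

2. Deep Learning

Course info:
- Instructor:... read more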

creativity

Who owns the copyright for an AI generated creative work? April 20, 2021 4 minute read

daily

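Important Dates and Events September 26, 2022 less than 1 minute read

Markdown Practice November 24, 2021 1 minute read

1. Headings

Level-1 heading

Level-2 heading

Level-3 heading

There are six heading levels in total

2. Paragraphs and Formatting

End a line with two spaces and a return to force a line break
You can also leave a blank line to start a new paragraph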

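1) Text styles

Italic

Wrap the text in a pair of * or a pair of _ to make it italic
For example:
*italic*
italic

Bold

Wrap the text in a pair of ** or a pair of __ to make it bold
For example:
**bold**
bold

Bold italic

Wrap the text in a pair of *** or a pair of ___ to make it bold italic
For example:
***bold italic***
bold italic

2) Horizontal rules

You can create a horizontal rule by putting three or more asterisks, hyphens, or underscores on a line with nothing else on it. You may also put spaces between the asterisks or hyphens. Each of the following creates a rule:
***

3) Strikethrough

To strike through text in a paragraph, wrap it in two tildes on each side, for example:
~~hahaha~~

4) Underline

Underlining can be done with the HTML <u> tag, for example:
underline

5) Footnotes

A footnote is a supplementary note to the text
A footnote is created in a format like this ¹.
The footnote reference and the footnote definition must not be directly adjacent.
Footnotes are placed at the end by default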

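3. Lists

1) Unordered lists

Unordered lists use an asterisk (*), plus sign (+), or hyphen (-) as the list marker. Put a space after the marker, then write the content.
For example:

- First item
- Second item
- Third item
- First item
- Second item
... read more

Serving the Nation Through Aerospace, Daring to Be First November 18, 2021 less than 1 minute read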

deep Neural networks

Neural Network Optimization Methods and Algorithms March 12, 2021 8 minute read

finetune

DreamBooth November 28, 2022 less than 1 minute read

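Introduction to DreamBooth

DreamBooth is a fine-tuning method for subject-specific customization, developed by a Google team after Imagen. Given just 3-5 images of the same subject (an animal, a person, or an object) and a prompt, it fine-tunes a dedicated model that can generate the subject in various poses or place it into different scenes. In essence, DreamBooth fine-tunes the Unet and the TextEncoder. Sample results:

Related links: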

DreamBooth

Application

Recontextualization:

Art Renditions

Expression Manipulation

Novel View Synthesis

Accessorization

Property Modification

general blogging

Starting the adventure March 24, 2021 10 minute read

In the midst of a global pandemic caused by the SARS-CoV-2 coronavirus, I decided to start blogging. I had wanted to blog for a long time, and I have always enjoyed writing, but many unknowns and having “no time” for it prevented me from taking it up. Things like: “I don’t really know who my target audience is”, “what would my topic or topics be?”, “I don’t think I am a world-class expert in anything”, and many more kept stopping me from setting up my own blog. Now seemed as good a time as any, so with those and tons of other... read more

image generation

DreamBooth November 28, 2022 less than 1 minute read

life

Starting the adventure March 24, 2021 10 minute read

machine learning

Who owns the copyright for an AI generated creative work? April 20, 2021 4 minute read

So, what is a neural network? April 2, 2021 9 minute read

Deep Q Learning for Tic Tac Toe March 18, 2021 12 minute read

multi-modality

BLIP December 1, 2022 less than 1 minute read

Basic Info

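Full paper title: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Related links:
- Github

Introduction

Vision-Language Pre-training (VLP) has made good progress in recent years, but earlier VLP pre-training tasks targeted either vision-language understanding or vision-language generation. BLIP's pre-training tasks are designed to cover both. BLIP's other highlight is its data handling: in the vision-language domain, human-annotated, high-quality data is scarce, so the authors propose CapFilt, a bootstrapping method that filters out poor captions and expands the image-text pair data. The paper has two main contributions:

- Multimodal mixture of Encoder-Decoder (MED): a new pre-training framework that covers both understanding and generation tasks
- Captioning and Filtering (CapFilt): a dataset bootstrapping method that fine-tunes the pre-trained MED into two modules. The captioner captions images collected from the web, and the filter removes noisy image-text pairs

Model Architecture

The overall architecture is as follows:

The authors design three different sub-architectures:

- unimodal encoder: the two architectures on the left of the figure, which encode text or image alone. The text encoder is BERT's encoder and the image encoder is ViT's; as in BERT, a special <cls> token represents the global information of the sentence
- image-grounded text encoder: the architecture in the middle of the figure. On the text side, the difference is an added cross attention over the image encoder's output, hence image-grounded. The special token <encode> is prepended to the text, and it stands for the multimodal representation of the image-text pair
- image-grounded text decoder: the rightmost architecture in the figure. Besides the cross-attention with the image encoder, it replaces bidirectional self-attention with causal self-attention; for what causal attention is exactly, see the paper causal attention for vision-language tasks. The special token <decode> marks the start of the sequence. Apart from the self-attention layers, it shares its parameters with the image-grounded text encoder

The MED architecture mainly exists to serve the pre-training tasks:

- image-text contrastive loss (ITC): presumably similar to CLIP's objective. It aligns the image encodings and text encodings in feature space and is a vision-language understanding task
- image-text matching loss (ITM): decides whether the vision and language inputs match; a binary classification task and a vision-language understanding task
- language modeling loss (LM): a vision-language generation task. It is autoregressive: given the start of the sequence and the image encoding, output the complete caption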
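To make the contrastive idea behind ITC concrete, here is a sketch of the plain CLIP-style symmetric contrastive loss. It is an assumption-laden illustration, not BLIP's exact ITC: the function names, the temperature value, and the use of raw dot products on already-normalized features are all choices made for this example, and the paper's version may differ in its details.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over a list of floats."""
    m = max(row)
    log_total = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_total for x in row]

def contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric cross-entropy over the image-text similarity matrix.

    Assumes image_feats[i] and text_feats[i] form the i-th matched pair
    and that all feature vectors are already L2-normalized.
    """
    n = len(image_feats)
    # sim[i][j]: similarity between image i and text j, scaled by temperature.
    sim = [[sum(a * b for a, b in zip(img, txt)) / temperature
            for txt in text_feats]
           for img in image_feats]
    # Image-to-text: each image should pick out its own caption.
    i2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    # Text-to-image: each caption should pick out its own image.
    sim_t = [list(col) for col in zip(*sim)]
    t2i = -sum(log_softmax(sim_t[i])[i] for i in range(n)) / n
    return (i2t + t2i) / 2

aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # matched pairs give the lower loss
```

The loss is small when each image is most similar to its own caption and large when the pairing is wrong, which is what pulls matched image and text encodings together in feature space.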

CapFilt

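As introduced above, the main goal of CapFilt is to filter out noisy pairs, because the image-text pairs crawled from the web are of low quality. CapFilt takes the pre-trained MED architecture and derives two modules from it, the Captioner and the Filter. The Captioner is MED's image-grounded text decoder; it captions web images, which gives us new pairs. The Filter is MED's image-grounded text... read more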

neural networks

Who owns the copyright for an AI generated creative work? April 20, 2021 4 minute read

So, what is a neural network? April 2, 2021 9 minute read

Machine Learning Library in Python from scratch February 28, 2021 4 minute read

note

GAN September 27, 2022 less than 1 minute read

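1. About This Post

GAN stands for generative adversarial nets, a new generative-model framework proposed by Goodfellow in 2014 that has since produced many applications and variants. This post covers the principles of the original GAN and my notes on the paper. The version of the paper read here is not the final one (the difference is in the related work section). Some links are listed below

2. Introduction to GAN

GAN is a new generative-model framework with two parts, a generative model G and a discriminative model D. G captures the data distribution; D decides whether a sample came from the real data distribution or from the distribution generated by G. Training the generative model means making D more likely to make a mistake. The GAN framework is a minimax game, and if G and D are both MLPs, the whole system can be trained with backpropagation. The authors give a simple analogy for training: the generative model is a team of counterfeiters and the discriminative model is the police. Both sides improve during training, and the desired end state is that the police cannot tell whether a bill is fake or real. The paper covers only one special case, in which G and D are both MLPs; the authors call this case Adversarial nets

3. Adversarial nets

For the generative model to learn a distribution $p_g$ (as close as possible to the distribution of the original data x), a prior distribution $p_z(z)$ is defined on the input noise. $G(z;\theta_g)$ is the output of the generative model for noise z; G is a differentiable function, here an MLP, with parameters $\theta_g$. $D(x;\theta_d)$ is the output of the discriminative model for input x. It is a scalar giving the probability that x came from the real data distribution.

That is, the value function V(G, D) of D and G can be written as: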
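For reference, the GAN paper writes the minimax objective as follows (with the arguments in the order V(D, G), and using the symbols $p_{data}$ and $p_z$ defined above):

\begin{equation} \min_G \max_D V(D,G)=\mathbb{E}_{x\sim p_{data}(x)}[\log D(x)]+\mathbb{E}_{z\sim p_z(z)}[\log(1-D(G(z)))] \end{equation}

The first term rewards D for assigning high probability to real samples; the second rewards D for assigning low probability to generated samples, and it is the only term G can influence.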

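The discriminative model D aims to maximize the value function. D(x) takes values between 0 and 1, so a larger value function means D is doing better; the generative model G aims to minimize it. The process of training the generative and discriminative models is:

The green line is the generative distribution, the blue dashed line is the discriminator, and the black dotted line is the original data distribution

4. Theory

The algorithm is shown in the figure below:

In each mini-batch of each iteration we draw m noise samples z from the prior, z$\sim$ $p_z(z)$, and m samples x from the real distribution, x$\sim$ $p_{data}(x)$. We first train the discriminator D by updating its parameters along the ascending gradient, and then update the generator's parameters along the descending gradient of log(1-D(G($z^{(i)}$))).

The paper then presents several propositions, proofs, and theorems that establish the soundness of the value function and objective used by GAN

5. Miscellaneous

When it was first proposed, GAN still had many drawbacks, for example it was fairly hard to train, but a great deal of later work improved on the original model. GAN was more like a starting point for later models to refine
In essence, GAN pits the left hand against the right, and the objective function is well designed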

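Dive into Deep Learning (动手学深度学习) September 5, 2022 2 minute read

0. Introduction
1. Preliminaries
2. Linear Neural Networks
3. Multilayer Perceptrons
5. Convolutional Neural Networks
6. Modern Convolutional Neural Networks
7. Recurrent Neural Networks
8. Modern Recurrent Neural Networks
9. Attention Mechanisms
12. Computer Vision
13. Natural Language Processing: Pretraining
14. Natural Language Processing: Applications

0. Introduction

Notes on the book Dive into Deep Learning (《动手学深度学习》)
Links:
- bilibili
- book_zh
- book_en
The book gives a broad introduction to the knowledge deep learning requires. It starts from linear neural networks, moves on to CNNs and then RNNs, and introduces fairly recent network architectures in CV and NLP. It is not only theory: every section includes hands-on code, so you can learn while writing code. Li Mu's (李沐) video course on bilibili accompanies the book, which makes it well suited to beginners.

1. Preliminaries

This chapter covers the prerequisites for deep learning; the more important points are noted here

- Tensors generalize vectors (one-dimensional tensors) and matrices (two-dimensional tensors)
- In torch, A*B is the Hadamard product, the element-wise multiplication of two matrices
- torch.dot() is the dot product
- torch.cat(…, dim=0) concatenates along the rows, e.g. (3, 4) and (3, 4) become (6, 4)
- A.sum(axis=0) sums down each column, e.g. (5, 4) becomes (4)
- A norm measures size: the L1 norm is the sum of the absolute values of the elements, and the L2 norm is the square root of the sum of their squares. torch uses the L2 norm by default, and it is also the one used most often
- Gradient: the vector collecting all partial derivatives of a multivariate function:

The gradient is a vector
Common gradient formulas:

Steps for automatic differentiation in torch:
- Step 1, allocate memory for x:... read more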

Web Crash Course August 3, 2022 less than 1 minute read

1. Learning HTML

2. Learning CSS

3. Learning JS

notes

BLIP December 1, 2022 less than 1 minute read

DreamBooth November 28, 2022 less than 1 minute read

Object-Oriented Programming in C++ November 20, 2022 1 minute read

optimization

Neural Network Optimization Methods and Algorithms March 12, 2021 8 minute read

overview

Vision Language Model November 14, 2022 less than 1 minute read

VLM

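A VLM, or vision language model, aims to let a language model take in visual information. lilian groups VLMs into four types:

- Use an embedding layer to obtain image features, merge them with the token features, and train everything together; representative models are VisualBERT, SimVLM, and CM3
- Plug a trained image embedding layer directly into the model and keep it frozen, i.e. the image embedding weights do not change while the overall model trains; a representative model is CLIP
- Fuse visual information into the language model through the attention mechanism; representative models are VisualGPT, VC-GPT, MERLOT, Flamingo, and CoCa
- Combine vision and language models directly, without any training; representative models are MAGiC, PICa, and Socratic Models

Tasks

The tasks a VLM can perform fall into three categories:

Generation tasks:
- Visual QA: given an image and a question, the model answers based on the image
- Visual Captioning: given an image, generate a caption
- Visual Commonsense Reasoning: given an image, infer its common-sense information
- Visual Generation: given a text input, generate an image
Classification tasks:
- Multimodal Affective Computing: a multimodal version of sentiment analysis
- Natural Language for Visual Reasoning: given an image and a statement, decide whether the statement is correct
Retrieval tasks:
- Visual Retrieval: retrieve images from a text description
- Vision-Language Navigation: navigate an agent with natural-language instructions
- Multimodal Machine Translation: translate one language into another with accompanying visual information

BERT-like Architectures

Following BERT's rise in NLP, BERT-like architectures have also appeared in other modalities; representative models include VisualBERT, ViLBERT, and PixelBERT

contrastive learning

Since CLIP appeared, contrastive learning has proven to be a good way to connect vision and language information; similar models include ALIGN and FLORENCE

VLM Papers

paperswithcode

Latest VLM papers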

paper

Reading Notes on Automatic Short-Answer Grading Papers October 10, 2022 16 minute read

The T5 Model September 19, 2022 1 minute read

1. Introduction to T5
2. Reading the Paper

Literature on the RACE Dataset November 30, 2021 3 minute read

paper reading

BLIP December 1, 2022 less than 1 minute read

Basic Info

论文全称: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
相关链接:
- Github

Introduction

Vision Language Pre-training近些年有不错的进展，但是以往的VLP的预训练任务要么是vision-language understanding，要么是vision-language generation。BLIP在设计预训练任务时，综合考虑了这两方面的预训练任务。BLIP另一大亮点就是对数据的处理，vision-language领域的数据集，人工标注且效果好的数据不多，所以作者提出了一种CapFilt的Bootstrapping方法来过滤掉不好的标注，扩充image-text pair数据。论文主要有两大贡献:

Multimodal mixture of Encoder-Decoder(MED)，一种新的预训练框架，综合考虑了understanding任务和generation任务
Captioning and Filtering(CapFilt)，一种数据集自举的方法，微调预训练的MED，并分为两个子模块，captioner用来将web获取的图片标注，filter用来将noisy的image text pair去除

Model Architecture

模型的主题架构如下:

作者设计了三种不同的子架构:

unimodal encoder: 上图左侧的两个架构，针对单一的text或者image的encoder，对于text而言就是BERT的encoder，对于image而言就是ViT的encoder，同样用<cls>特殊编码来表示sentence的全局信息
image-grounded text encoder: 上图中间的架构，针对text而言，不同之处是加上了image encoder后的cross attention，所以是image-grounded。text的开头加上特殊字符<encode>，代表image-text pair的多模态表示
image-grounded text decoder: 上图最右侧的架构，除了和image encoder的cross-attention外，还用causal self-attention替换了bidirectional self-attention，causal attention具体是什么，可以去看causal attention for vision-language tasks这篇论文。特殊字符<decode>用来表示序列的开始，除了self-attention层外，和image-grounded text encoder共享参数

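The main purpose of the MED architecture is to serve the pre-training tasks:

image-text contrastive loss (ITC): appears to be similar to CLIP's objective; it aligns the image and text representations in feature space. This is a vision-language understanding task
image-text matching loss (ITM): a binary classification task that decides whether an image and a text match. This is a vision-language understanding task
language modeling loss (LM): a vision-language generation task. It is autoregressive: given the start of the sequence and the image encoding, the model outputs the full caption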
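
As a rough illustration of the contrastive idea behind ITC, here is a simplified CLIP-style symmetric loss in pure Python. This is a sketch under assumptions, not BLIP's actual loss: the function names and the temperature of 0.07 are my own choices, and any refinements the paper adds on top of the basic contrastive objective are left out.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy(logits: Sequence[float], target: int) -> float:
    # Numerically stable negative log-softmax at the target index.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]


def contrastive_loss(
    image_embs: Sequence[Sequence[float]],
    text_embs: Sequence[Sequence[float]],
    temperature: float = 0.07,  # assumed value, not taken from the post
) -> float:
    """Symmetric image-to-text and text-to-image loss over a batch.

    Row i of image_embs and row i of text_embs form the matching pair.
    """
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)
    # sim[i][j] = cosine similarity of image i and text j, scaled.
    sim = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    i2t = sum(_cross_entropy(sim[i], i) for i in range(n)) / n
    t2i = sum(_cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)) / n
    return (i2t + t2i) / 2


aligned = contrastive_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
shuffled = contrastive_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
```

When each image embedding points the same way as its own caption (`aligned`), the loss is close to zero; when the captions are swapped (`shuffled`), the loss is large, which is the signal that pulls matching pairs together in feature space.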

CapFilt

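As introduced above, the main purpose of CapFilt is to filter out noisy pairs, because image-text pairs crawled from the web are of low quality. CapFilt takes the pre-trained MED architecture and derives two submodules, the Captioner and the Filter. The Captioner is MED's image-grounded text decoder; it captions web images, which gives us new pairs. The Filter is MED's image-grounded text... read more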

DreamBooth November 28, 2022 less than 1 minute read

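Introduction to DreamBooth

DreamBooth is a subject-driven fine-tuning method developed by the Google team after Imagen. With just 3-5 images of the same subject (an animal, a person, or an object) and a prompt, it fine-tunes a personalized model that can generate the subject in different poses or place it in new scenes. In essence, DreamBooth fine-tunes the UNet and the text encoder. Example results:

Related links: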

DreamBooth

Application

Recontextualization:

Art Renditions

Expression Manipulation

Novel View Synthesis

Accessorization

Property Modification

python

Deep Q Learning for Tic Tac Toe March 18, 2021 12 minute read

Background

After many years of a corporate career (17) diverging from computer science, I have now decided to learn Machine Learning and in the process return to coding (something I have always loved!).

To fully grasp the essence of ML I decided to start by coding an ML library myself, so I can fully understand the inner workings, linear algebra and calculus involved in Stochastic Gradient Descent. And on top of that, learn Python (I used to code in C++ 20 years ago).

I built a general purpose basic ML library that... read more

Machine Learning Library in Python from scratch February 28, 2021 4 minute read

It must sound crazy that in this day and age, when we have such a myriad of amazing machine learning libraries and toolkits all open sourced, all quite well documented and easy to use, I decided to create my own ML library from scratch.

Let me try to explain; I am in the process of immersing myself into the world of Machine Learning, and to do so, I want to deeply understand the basic concepts and its foundations, and I think that there is no better way to do so than by creating myself all the... read more

Conway's Game of Life February 10, 2021 3 minute read

I am lately trying to take on coding again. It had always been a part of my life since my early years when I learned to program a Tandy Color Computer at the age of 8, the good old days.

Tandy Color Computer TRS80 III

Having already programmed in Java, C# and of course BASIC, I thought it would be a great idea to learn Python, since I have a great interest in data science and machine learning, and those two topics seem to have an avid community among Python coders.

For one of my starter quick programming... read more

record

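Graduation project log April 20, 2022 1 minute read

1. Data preprocessing

1.1. Processing the raw data with MySQL

Goal: produce a table with one row per student, where the first column is the student ID and the remaining columns are course names
Step 1: create the table
1. The first problem was that some column names are MySQL keywords, so I wrapped KCMC in backticks
2. GROUP_CONCAT limits the length of its result, so the following statements temporarily raise the limit: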
```
SET GLOBAL group_concat_max_len = 4294967295;
SET SESSION group_concat_max_len = 4294967295;
```
3. Column names have a hard length limit of 64 characters, so I shortened them by translating the English names into Chinese. The following course names were changed:
  - UPDATE grade_original SET KCMC = '网格生成方法及软件简介' WHERE KCMC = 'An Introduction to Mesh Generation Methods & Software for Scientific Computing'
  - UPDATE grade_original SET KCMC = '经典论文鉴赏:电磁学顶级论文精选' WHERE KCMC = 'Appreciation of Classical Papers: The Selected Top Papers in Electromagnetism'
  - UPDATE grade_original SET KCMC = '动脉硬化的脆弱性评估:从体内成像到生物力学' WHERE KCMC = 'Atherosclerosis Vulnerability Assessment: From In Vivo Imaging To Biomechanics'
  - UPDATE grade_original SET KCMC = '计算机建模和仿真基础:方法、技术和应用' WHERE KCMC = 'Basics of Computer-Based Modelling and Simulation: Methodologies, Technologies and Applications'
  - UPDATE... read more
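
One way to find which course names need shortening before they become column names is a quick length check. This is a small Python sketch, not part of the original workflow: `too_long` and `MYSQL_IDENTIFIER_LIMIT` are names I made up, and it assumes the limit is counted in characters, as stated in the note above.

```python
MYSQL_IDENTIFIER_LIMIT = 64  # maximum column-name length, per the note above


def too_long(names, limit=MYSQL_IDENTIFIER_LIMIT):
    """Return the names that cannot be used as column names as-is."""
    return [name for name in names if len(name) > limit]


course_names = [
    "An Introduction to Mesh Generation Methods & Software for Scientific Computing",
    "网格生成方法及软件简介",
]
print(too_long(course_names))
```

Only the English title is reported; its Chinese replacement fits comfortably within the limit.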

reinforcement learning

Deep Q Learning for Tic Tac Toe March 18, 2021 12 minute read


school

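Course summary April 29, 2022 less than 1 minute read

Introduction to corporate finance
- Course overview
- Course content
Industrial science
- Course overview
- Course content
Quantum mechanics
- Course overview
- Course content
Statistical physics
- Course overview
- Course content
Signal processing
- Course overview
- Course content
Structural mechanics 2
- Course overview
- Course content
Heat transfer (Heat Transfert)
- Course overview
- Course content
Process engineering (Process Engin)
- Course overview
- Course content
Press
- Course overview
- Course content
Audiovisuel
- Course overview
- Course content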
read more

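Software methods November 30, 2021 5 minute read

Software methods

Course requirements

Learn object-oriented software development (a concept that keeps getting broader), and use Java to see how object-oriented programming is implemented in practice.

Notes

Classes and objects:
- An object is an instance of a class
- C can emulate all object-oriented constructs (for example with structs and function pointers)
- A class groups attributes and methods together
The three main features of object orientation:
- Encapsulation:
  - private, protected, public
  - These modifiers apply to both attributes and methods; typically constructors and member methods are public and attributes are private
  - In general, it hides an object's attributes and implementation details while exposing an interface of methods
  - Provides public methods
  - Improves software development efficiency
- Inheritance:
  - Subclass and superclass
  - A subclass automatically inherits the superclass's accessible attributes and methods, can add its own, and can override the superclass's methods
  - Enables code reuse (though that is not its only purpose)
- Polymorphism:
  - A superclass can have several subclasses
  - Subclasses override the superclass's methods
  - The method to call is chosen dynamically from the actual type of the object that was created
  - Every subclass instance can be treated as the superclass type; at runtime, the system automatically calls each subclass's own method
  - UML can be used to draw the relationships between classes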
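
The three features above can be seen together in a small example. The course itself uses Java; this sketch is in Python, where privacy is a naming convention (a leading underscore) rather than something the compiler enforces, so treat it as an illustration of the ideas and not of Java syntax. The class names are made up.

```python
class Animal:
    def __init__(self, name: str) -> None:
        self._name = name  # leading underscore: private by convention

    def name(self) -> str:
        """Public accessor: callers never touch _name directly."""
        return self._name

    def speak(self) -> str:
        return "..."

    def introduce(self) -> str:
        # Calls speak() on the actual runtime type of self.
        return f"{self.name()} says {self.speak()}"


class Dog(Animal):  # inherits name() and introduce()
    def speak(self) -> str:  # overrides Animal.speak
        return "Woof"


class Cat(Animal):
    def speak(self) -> str:  # overrides Animal.speak
        return "Meow"


animals = [Dog("Rex"), Cat("Tom"), Animal("Blob")]
lines = [a.introduce() for a in animals]
print(lines)
```

All three objects are handled as `Animal`, yet `introduce()` picks up each subclass's own `speak()`, which is the dynamic dispatch described under polymorphism.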
Java programming
- 100% object-oriented
  - No code exists outside a class
  - Only object-oriented programming is possible
  - Java file naming conventions... read more

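Course summary November 28, 2021 2 minute read

Probability and statistics
- Overview
- Content overview
Fluid mechanics
- Overview
- Content overview
- A4 sheet
- Report
Electromagnetic radiation waves
- Overview
- Content overview
Sensors
- Overview
- Content overview
Structural mechanics
Project management
- Overview
- Content overview
Engineering thermodynamics
Presse
- Overview
- Content overview
Audiovisual
- Overview
- Content overview
Software methods
- Overview
- Content overview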

thoughts

Starting the adventure March 24, 2021 10 minute read

In the midst of a global pandemic caused by the SARS-CoV-2 coronavirus, I decided to start blogging. I had wanted to blog for a long time and have always enjoyed writing, but many unknowns and having “no time” for it prevented me from taking it up. Things like “I don’t really know who my target audience is”, “what would my topic or topics be?”, “I don’t think I am a world-class expert in anything”, and many more kept stopping me from setting up my own blog. Now seemed as good a time as any, so with those and tons of other... read more

vision-language

BLIP December 1, 2022 less than 1 minute read


work

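Building a RACE-like dataset December 21, 2021 1 minute read

RACE

Introduction

The RACE dataset consists of reading comprehension questions from English exams for Chinese middle and high school students. It was first released in 2017 and contains 28k passages and 100k questions. It was originally released for the reading comprehension task, and its distinguishing feature is that many of its questions require reasoning.

Original RACE dataset page
Download URL
Paper: RACE: Large-scale ReAding Comprehension Dataset From Examinations

RACE dataset format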

Each passage is a JSON file.... read more

Bert

1. Introduction to BERT

2. Related work

3. Model architecture

3.1. Input representation

3.2. Pre-training tasks

4. Pre-training datasets and experiments

NLP


Paper Reading


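1. Introduction to Transformer

2. Introduction

3. Model architecture

3.1. Attention

3.2. Multi-head attention

3.3. Position-wise Feed-Forward Networks

3.4. Embedding and softmax

3.5. Positional Encoding

4. Datasets and experiments

1. Introduction to ViT

2. Introduction

3. Model

4. Experiments and datasets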

Transformer


algorithm

Introduction

Openjudge

artificial intelligence

Background

coding

Background

Adam

copyright

course

1. Introduction

2. Class notes

2.1. Chapter 1: From C to C++

2.1.1. References

2.1.2. The const keyword

courses

Course catalog

1. Materials

2. Deep learning

creativity

daily

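1. Headings

Level-1 heading

Level-2 heading

Level-3 heading

2. Paragraphs and formatting

1) Text styles

Italic

Bold

Bold italic

2) Horizontal rule

3) Strikethrough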
