Porn Data Anaylize — 上传者 分类信息分析(github)

'''
视频作者 视频分类信息分析
http://www.h4ck.org.cn
by obaby
obaby@mars
email:root@obaby.org.cn
date: 2020.09.04
'''
from pyspark.sql.functions import col
import altair as alt
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
csv = spark.read.option("header",True).csv("hdfs://localhost:9000/data2/porn_data_movie.csv")
csv.printSchema()
root
 |-- id: string (nullable = true)
 |-- create: string (nullable = true)
 |-- update: string (nullable = true)
 |-- name: string (nullable = true)
 |-- describe: string (nullable = true)
 |-- source_id: string (nullable = true)
 |-- publish_time: string (nullable = true)
 |-- play_count: string (nullable = true)
 |-- good_count: string (nullable = true)
 |-- bad_count: string (nullable = true)
 |-- link_count: string (nullable = true)
 |-- comment_count: string (nullable = true)
 |-- designation: string (nullable = true)
 |-- category_id: string (nullable = true)
 |-- porn_site_id: string (nullable = true)
 |-- uploader_id: string (nullable = true)
 |-- producer: string (nullable = true)
csv.select('name', 'describe', 'uploader_id').show()
Continue Reading

jupyter notebook 调整字体 以及matplotlib显示中文

原生的jupyter theme看起来比较蛋疼,尤其是字体和字号。为了修改这个配置可以安装 jupyter theme。

项目链接: https://github.com/dunovank/jupyter-themes 如果不喜欢英文可以参考这个链接:https://www.jianshu.com/p/6de5f6cce06d

上面的样式对应的配置命令:
jt  -f fira -fs 11 -cellw 90% -ofs 11 -dfs 11 -T -t solarizedl

除此之外matplotlib 默认不支持中文显示,主要是字体问题,可以通过下面的代码配置来让matplotlib 支持中文

from matplotlib import pyplot as plt
%matplotlib inline
font = {'family' : 'MicroSoft YaHei',
'weight' : 'bold',
'size' : 10}
plt.rc("font", **font)

实际效果,另外还可以使用altair ,altair 默认支持中文显示 https://altair-viz.github.io