


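Preface

Hello everyone! Today I'll walk through a simple collection and visual analysis of JD.com data; I hope you enjoy it. This article comes from 古月星辰, a third-year undergraduate majoring in mathematics and a Python web-scraping enthusiast.

I. Target Data

With mobile payment now widespread, e-commerce sites keep emerging. These sites carry a huge number of products, and the reviews users leave on them are more numerous still. This time we take JD.com as an example, collect the review data for a single product, and run a simple analysis on it.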

II. Page Analysis

This is the detail page for a phone. It shows the phone's specifications and the user reviews. The page URL is:

https://item.jd.com/10022971060622.html#none

Through analysis we then find the data interface that serves the review data, as shown in the figure below:

Its request URL:

https://club.jd.com/comment/productPageComments.action?callback=fetchJSON_comment98&productId=10022971060622&score=0&sortType=5&page=0&pageSize=10&isShadowSku=0&fold=1

Note these two key parameters:

1. productId: each product has its own ID
2. page: the page index of the reviews, starting from 0
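As a minimal sketch of how the two parameters fit into the request URL, the query string can be assembled from a dict. The helper name build_comment_url and the page_size argument are illustrative assumptions; the parameter names and values are copied from the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://club.jd.com/comment/productPageComments.action"


def build_comment_url(product_id, page, page_size=10):
    """Assemble the review URL for one product and one zero-based page."""
    params = {
        "callback": "fetchJSON_comment98",
        "productId": product_id,   # which product
        "score": 0,
        "sortType": 5,
        "page": page,              # which page of reviews
        "pageSize": page_size,
        "isShadowSku": 0,
        "fold": 1,
    }
    return BASE_URL + "?" + urlencode(params)


print(build_comment_url("10022971060622", 0))
```

Changing only product_id or page yields the URL for another product or another page of reviews.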

III. Parsing the Data

Send a request to the review data URL:

URL: https://club.jd.com/comment/productPageComments.action?callback=fetchJSON_comment98&productId=10022971060622&score=0&sortType=5&page=0&pageSize=10&isShadowSku=0&fold=1

Open the returned JSON in json.cn (the review data is exchanged with the page as JSON), as shown in the figure below:

The analysis shows that each request to the review URL returns ten reviews. For each review we need three pieces of data: the content, the color, and the size (content, productColor, and productSize in the code below). Note the maximum page count of 100 in the figure above (the maxPage field read in the code below), which means 100*10=1000 reviews.
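To make the structure concrete, here is a small sketch that pulls the three fields out of one parsed page. The sample_page dict is a hand-written stand-in rather than real JD.com output, and extract_rows is a hypothetical helper; only the key names (maxPage, comments, content, productColor, productSize) come from the program in this article.

```python
# Hypothetical, hand-written stand-in for one parsed response page.
sample_page = {
    "maxPage": 100,
    "comments": [
        {"content": "Great phone", "productColor": "white", "productSize": "8GB+128GB"},
        {"content": "Fast delivery", "productColor": "black", "productSize": "8GB+256GB"},
    ],
}


def extract_rows(page_data):
    """Return [content, color, size] for every review on one page."""
    rows = []
    for item in page_data.get("comments", []):
        rows.append([
            item.get("content", ""),
            item.get("productColor", ""),
            item.get("productSize", ""),
        ])
    return rows


rows = extract_rows(sample_page)
print(rows)
# Upper bound on collectable reviews: pages * reviews per page.
print(sample_page["maxPage"] * 10)
```

Using dict.get with a default keeps the loop from failing if a review lacks one of the fields, which is a design choice of this sketch and not something the original program does.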

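IV. The Program

1. Import the required libraries

import requests
import json
import time
import openpyxl  # third-party module for working with Excel files
import random

# Simulate a browser when sending requests and receiving responses.
# The listing never defines the headers that get_comments uses;
# this User-Agent value is a placeholder assumption.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}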

2. Fetch the review data

def get_comments(productId, page):
    # productId is the product ID; page is the zero-based review page
    url = 'https://club.jd.com/comment/productPageComments.action?callback=fetchJSON_comment98&productId={0}&score=0&sortType=5&page={1}&pageSize=10&isShadowSku=0&fold=1'.format(productId, page)
    resp = requests.get(url, headers=headers)
    # print(resp.text)  # show the raw response
    # Strip the JSONP wrapper: the leading "fetchJSON_comment98(" and the trailing ");"
    s1 = resp.text.replace('fetchJSON_comment98(', '')
    s = s1.replace(');', '')
    # Convert the str data into JSON data (a dict)
    # print(s, type(s))
    # print('*' * 100)
    res = json.loads(s)
    print(type(res))
    return res
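The chained replace calls above also delete any ");" that happens to occur inside a review's text. A slightly more defensive alternative, sketched here with a hypothetical helper named strip_jsonp and a hand-written sample string, anchors a regular expression on the callback name and the final closing parenthesis only.

```python
import json
import re


def strip_jsonp(text, callback="fetchJSON_comment98"):
    """Return the JSON payload inside callback(...); as a Python object."""
    # Match the callback name, capture everything up to the last ")".
    pattern = r"^\s*" + re.escape(callback) + r"\((.*)\)\s*;?\s*$"
    match = re.match(pattern, text, flags=re.DOTALL)
    if match is None:
        raise ValueError("response is not wrapped in the expected callback")
    return json.loads(match.group(1))


# Hand-written sample; the review text deliberately contains ");".
sample = 'fetchJSON_comment98({"maxPage": 100, "comments": [{"content": "ok );"}]});'
data = strip_jsonp(sample)
print(data["maxPage"], data["comments"][0]["content"])
```

Raising ValueError on an unexpected response makes it easier to notice when the site returns something other than the JSONP payload, such as an empty body.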

3. Get the maximum page count (optional)

def get_max_page(productId):
    # Call the function above to send a request to the server and get the dict data
    dic_data = get_comments(productId, 0)
    return dic_data['maxPage']

4. Extract the data

def get_info(productId):
    # Get the product's maximum review page count
    # max_page = get_max_page(productId)
    # max_page = 10
    lst = []  # stores the extracted review data
    for page in range(0, get_max_page(productId)):  # one iteration per page
        # Get the reviews on this page
        comments = get_comments(productId, page)
        # Look up the review list by key (10 reviews per page)
        comm_lst = comments['comments']
        # Walk the list and pull the content, color, and size from each review
        for item in comm_lst:  # each review is itself a dict
            content = item['content']  # review text
            color = item['productColor']  # color
            size = item['productSize']  # size
            lst.append([content, color, size])  # add this review to the list
        time.sleep(3)  # pause so the program does not run too fast and get its IP banned
    save(lst)  # call our own function to store the list
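The program imports random but never uses it. One plausible use, and purely an assumption on my part, is to vary the pause between pages instead of always sleeping exactly 3 seconds. The helper name jittered_delay and the 2-second jitter are illustrative choices.

```python
import random


def jittered_delay(base=3.0, jitter=2.0):
    """Pick a wait time in seconds between base and base + jitter."""
    return random.uniform(base, base + jitter)


delay = jittered_delay()
print(f"would sleep {delay:.2f} s before the next page")
# In the crawl loop this value would be passed to time.sleep(delay).
```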

5. Save the scraped data to Excel

def save(lst):
    wk = openpyxl.Workbook()  # create the workbook object
    sheet = wk.active  # get the active sheet
    # Walk the list and add the data to the sheet; one list entry becomes one Excel row
    for item in lst:
        sheet.append(item)
    # Save to disk
    wk.save('銷售數據.xlsx')

6. Run the program

if __name__ == '__main__':
    productId = '10029693009906'  # ID of a single product
    get_info(productId)

V. Simple Data Analysis

1. Basic setup

# Import the required libraries
import pandas as pd
import matplotlib.pyplot as plt

# These two lines fix the display of Chinese text in plt
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# No header row was written during collection, so set the column names here
data = pd.read_excel('./銷售數據.xlsx', header=None, names=['comments', 'color', 'intro'])
# data.head()

2. Phone color count comparison

x = ['白色', '黑色', '綠色', '藍色', '紅色', '紫色']  # white, black, green, blue, red, purple
y = [314, 295, 181, 173, 27, 10]
plt.bar(x, y)
plt.title('各種顏色手機數量對比')  # "Phone count by color"
plt.xlabel('顏色')  # "Color"
plt.ylabel('數量')  # "Count"
# plt.legend()  # show the legend
plt.show()
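The share held by the two most common colors follows directly from the counts plotted above. A quick arithmetic check, using English color names only for readability:

```python
colors = ["white", "black", "green", "blue", "red", "purple"]
counts = [314, 295, 181, 173, 27, 10]

total = sum(counts)                # all reviews in the chart
top_two = counts[0] + counts[1]    # white + black
share = top_two / total
print(total, top_two, f"{share:.1%}")
```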

Users mostly bought the white and black models, which together account for a little over 60% of purchases.

3. Review word cloud

1) First extract the review data

import xlrd  # note: xlrd 2.0 and later read only .xls files, so this needs xlrd < 2.0

def strs(row):
    # Convert one cell value to a string
    return str(row)

# Open the file
data = xlrd.open_workbook("./銷售數據.xlsx")
sqlfile = open("data.txt", "a")  # open in append mode
table = data.sheets()[0]  # first sheet
nrows = table.nrows  # number of rows
ncols = table.ncols  # number of columns
# The sheet has no header row, so start from row 0
for ronum in range(nrows):
    row = table.cell_value(rowx=ronum, colx=0)  # colx is the column you want to read, minus 1
    values = strs(row)  # turn the cell into a string
    sqlfile.writelines(values + "\n")  # write the string to the new file
sqlfile.close()  # close the output file

2) Word cloud

# Import the required libraries
import jieba
from PIL import Image
import numpy as np
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Load the text data and do some simple cleanup:
# remove newlines and full-width spaces
text = open("./data.txt", encoding='gbk').read()
text = text.replace('\n', "").replace("\u3000", "")
# Tokenize; the result is a list of words
text_cut = jieba.lcut(text)
# Join the tokens into one string, separated by spaces
text_cut = ' '.join(text_cut)

Note: we cannot use encoding='utf-8' here; doing so raises an error:

> 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte

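So we need to change it to gbk. The likely cause is that data.txt was written by open() without an explicit encoding, which on a Chinese-locale Windows system defaults to GBK.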
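The mismatch can be reproduced without any files from the article. The sketch below encodes a short sample string as GBK, shows that decoding those bytes as UTF-8 fails, and then writes and reads a temporary file with one explicit encoding on both sides. The sample text and the temporary path are assumptions for illustration only.

```python
import os
import tempfile

text = "你好"
gbk_bytes = text.encode("gbk")

# GBK bytes are generally not valid UTF-8, so this decode raises.
try:
    gbk_bytes.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError as err:
    decoded_as_utf8 = False
    print("utf-8 failed:", err.reason)

# Writing and reading with the same explicit encoding avoids the mismatch.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text + "\n")
with open(path, encoding="utf-8") as f:
    round_trip = f.read().strip()
print(round_trip == text)
```

Passing the same encoding when data.txt is written and when it is read makes the script behave the same way regardless of the system locale.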

word_list = jieba.cut(text)
space_word_list = ' '.join(word_list)
print(space_word_list)
# Use open from the PIL package to read the image file, then numpy's array to turn it into an array
mask_pic = np.array(Image.open("./xin.png"))
word = WordCloud(
    font_path='C:/Windows/Fonts/simfang.ttf',  # font file on the local machine
    mask=mask_pic,  # background (mask) image
    background_color='white',  # background color
    max_font_size=150,  # maximum font size
    max_words=2000,  # maximum number of words shown
    stopwords={'的'}  # stop words, which are left out of the word cloud
).generate(space_word_list)
image = word.to_image()
word.to_file('2.png')  # save the image
image.show()

The final result is shown in the figure below:

Hands-on: Collecting JD.com Sales Data with Python, with Simple Data Analysis and Visualization
