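Compiled and edited. Source: 360问答 (360 Q&A); views: 76; time: 2022-07-31 20:00:01
How to analyze data collected with the Octopus collector (八爪鱼采集器), how to check whether the collected data is correct, the Octopus data collector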
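Share interests, spread joy, broaden horizons, and leave something good behind! Dear reader, this is LearningYard新学苑. Today's article is an experience share: using a loop list in the Octopus collector (八爪鱼采集器) to enter detail pages and collect data.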
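Feature overview
The Octopus collector (八爪鱼采集器) is a general-purpose web data collector. It simulates how a person browses web pages: with a few point-and-click selections on the page it generates an automated collection workflow, which turns web page content into structured data stored in Excel, a database, or other formats. It also offers a cloud-based collection solution for large data volumes. In short, it is a one-click data collection platform.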
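Operation interface
01 Find the URL
JD.com (京东) is used as the example here.
Open the JD.com site and choose the category you want to collect. Lipstick is used as the example here.
02 Enter the URL
Copy the URL, click "自定义采集" (custom collection) in the Octopus collector, paste the product URL, and save. The web page then appears in the collector.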
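03 Set up loop pagination
Cancel automatic recognition, scroll to the bottom of the page, and click "下一页" (next page). In the pop-up window, click "循环点击下一页" (loop-click next page).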
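The loop-pagination step boils down to "collect the current page, then follow the next-page button until there is none". Here is a minimal Python sketch of that control flow. It is not how the collector is implemented: `PAGES`, `fetch_page`, and `max_pages` are hypothetical names, and an in-memory mapping stands in for real web requests.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a website: URL -> (items on the page, next-page URL).
PAGES: Dict[str, Tuple[List[str], Optional[str]]] = {
    "list?page=1": (["item A", "item B"], "list?page=2"),
    "list?page=2": (["item C"], "list?page=3"),
    "list?page=3": (["item D", "item E"], None),  # last page: no next button
}


def fetch_page(url: str) -> Tuple[List[str], Optional[str]]:
    """Return the items on a page and the URL behind its next-page button."""
    return PAGES[url]


def collect_all(start_url: str, max_pages: int = 100) -> List[str]:
    """Follow next-page links from start_url, collecting items from each page."""
    items: List[str] = []
    url: Optional[str] = start_url
    visited = 0
    # Stop when there is no next page or the safety limit is reached.
    while url is not None and visited < max_pages:
        page_items, url = fetch_page(url)
        items.extend(page_items)
        visited += 1
    return items


print(collect_all("list?page=1"))
```

The `max_pages` limit is a design choice in this sketch: it keeps the loop from running forever if a site always offers a next-page link.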
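04 Click the detail-page link
Return to the top of the page and pick any product. Click the product title, click "全部" (select all) in the pop-up window, then click "循环点击每个元素" (loop-click each element) to enter the product detail page.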
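Selecting one product title and then choosing all similar elements is, in effect, gathering every detail-page link on the list page and visiting each in turn. A rough sketch of that idea with Python's standard `html.parser` follows. The markup, the `title` class name, and the helper names are assumptions made for illustration; real JD.com pages are structured differently.

```python
from html.parser import HTMLParser
from typing import List

# Hypothetical list-page markup; real pages differ.
LIST_PAGE_HTML = """
<ul>
  <li><a class="title" href="/item/1.html">Lipstick one</a></li>
  <li><a class="title" href="/item/2.html">Lipstick two</a></li>
  <li><a class="ad" href="/promo.html">Promotion</a></li>
</ul>
"""


class TitleLinkParser(HTMLParser):
    """Collect the href of every <a> whose class attribute includes 'title'."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and "title" in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


def detail_links(html: str) -> List[str]:
    """Return the detail-page links found in a list page, in page order."""
    parser = TitleLinkParser()
    parser.feed(html)
    return parser.links


for link in detail_links(LIST_PAGE_HTML):
    # In a real run, each detail page would be opened and parsed here.
    print("visit", link)
```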
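05 Collect text
Select the product name and click "采集该元素的文本" (collect this element's text). Repeat this step to collect the product's price, number of reviews, and other parameters.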
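Collecting the text of several elements on a detail page amounts to mapping each field to a selector and reading the matching element's text. The sketch below uses the standard `xml.etree.ElementTree` module on a small, well-formed fragment. The class names, field names, and sample values are all made up for illustration, and real pages usually need a lenient HTML parser rather than an XML one.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical, well-formed detail-page fragment; class names are illustrative.
DETAIL_HTML = """
<div>
  <div class="sku-name">  Matte lipstick, shade 01 </div>
  <span class="price">129.00</span>
  <a class="comment-count">2000+</a>
</div>
"""

# Field name -> path in the limited XPath subset that ElementTree supports.
FIELD_PATHS = {
    "name": ".//div[@class='sku-name']",
    "price": ".//span[@class='price']",
    "comments": ".//a[@class='comment-count']",
}


def extract_fields(html: str) -> Dict[str, str]:
    """Read the text of each configured element; missing elements give ''."""
    root = ET.fromstring(html.strip())
    record: Dict[str, str] = {}
    for field, path in FIELD_PATHS.items():
        text = root.findtext(path, default="")
        record[field] = text.strip()
    return record


print(extract_fields(DETAIL_HTML))
```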
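06 Rename the text fields
Click a text field and change its name.
07 Adjust the parameters
Click "采集流程" (collection workflow) and open the settings of the "点击翻页" (click to turn page) step. Choose to scroll down one screen after the page loads, set the number of scrolls to six with a two-second interval between scrolls, then click "应用" (apply).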
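The scroll setting, presumably there so that content loaded only on scrolling has time to appear, is a simple "scroll, wait, repeat" loop. A small sketch under that assumption follows. `scroll_page` and its parameters are hypothetical names, and the demo swaps the real pause for a recorder so it runs instantly.

```python
import time
from typing import Callable, List


def scroll_page(
    scroll_once: Callable[[], None],
    times: int = 6,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call scroll_once `times` times, pausing interval_s seconds after each."""
    for _ in range(times):
        scroll_once()
        sleep(interval_s)
    return times


# Demo: a fake page that "loads" one more block per scroll.
loaded: List[int] = []
pauses: List[float] = []  # records the requested pauses instead of sleeping
scroll_page(lambda: loaded.append(len(loaded) + 1), sleep=pauses.append)
print(loaded, pauses)
```

With the default `sleep=time.sleep`, the same call would take about twelve seconds, matching six scrolls at two-second intervals.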
Then open the settings of the "循环翻页" (loop pagination) step, set the parameter to //a[@class="pn-next"]/EM[text()="下一页"], and click "应用" (apply).
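An XPath of the form //a[@class="pn-next"]/em[text()="下一页"] targets the next-page button: an a element whose class is exactly pn-next, containing an em element whose text is 下一页. The sketch below checks that logic on a hypothetical, well-formed fragment using the standard `xml.etree.ElementTree` module. ElementTree supports only part of XPath and is case-sensitive, so the tag is written in lowercase and the text comparison is done in Python.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical pagination fragment modeled on the XPath in the text.
PAGER_HTML = """
<span class="p-num">
  <a class="pn-prev"><em>上一页</em></a>
  <a class="curr">1</a>
  <a class="pn-next" href="/list?page=2"><em>下一页</em></a>
</span>
"""


def find_next_link(html: str) -> Optional[ET.Element]:
    """Return the <a class="pn-next"> whose <em> child reads 下一页, or None."""
    root = ET.fromstring(html.strip())
    for link in root.findall(".//a[@class='pn-next']"):
        em = link.find("em")
        if em is not None and (em.text or "").strip() == "下一页":
            return link
    return None


link = find_next_link(PAGER_HTML)
print(link.get("href") if link is not None else "no next page")
```

The attribute comparison is exact, so a class value with a stray space, such as "pn- next", would not match this selector.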
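08 Start the collection and save
Click "采集" (collect), then click "启动本地采集" (start local collection), and wait for the run to finish.
Once the data you want has been collected, click "导出数据" (export data), choose Excel as the export format, click "确定" (OK), and save the file to the location you want.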
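After exporting, it is worth checking whether the collected data looks correct. The sketch below shows one way to do a quick sanity check in Python. It assumes the data has been saved or converted to CSV, since reading .xlsx files needs a third-party library, and the column names and sample rows are invented for illustration. It flags rows with empty fields, prices that are not numbers, and exact duplicate rows.

```python
import csv
import io
from typing import Dict, List

# Hypothetical export content; column names and values are illustrative.
EXPORTED_CSV = """product_name,price,comment_count
Lipstick one,129.00,2000+
Lipstick two,,500+
Lipstick one,129.00,2000+
Lipstick three,abc,10
"""


def check_rows(rows: List[Dict[str, str]]) -> Dict[str, List[int]]:
    """Return 1-based row numbers with empty fields, bad prices, or duplicates."""
    report: Dict[str, List[int]] = {"empty_field": [], "bad_price": [], "duplicate": []}
    seen = set()
    for number, row in enumerate(rows, start=1):
        if any(not (value or "").strip() for value in row.values()):
            report["empty_field"].append(number)
        price = (row.get("price") or "").strip()
        if price:
            try:
                float(price)
            except ValueError:
                report["bad_price"].append(number)
        key = tuple(sorted(row.items()))
        if key in seen:
            report["duplicate"].append(number)
        seen.add(key)
    return report


rows = list(csv.DictReader(io.StringIO(EXPORTED_CSV)))
print(check_rows(rows))
```

For a real file, `io.StringIO(EXPORTED_CSV)` would be replaced by an opened file object, for example `open(path, newline="", encoding="utf-8")`.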
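That's all for today's sharing! If you have your own thoughts on today's article, please leave us a message. See you tomorrow, and have a happy day!
References: 百度百科 (Baidu Baike), classroom video materials from the course Introduction to Business Data Science, Google Translate
This article is original to LearningYard新学苑. Some images and text come from the internet; if anything infringes your rights, please contact us to have it removed.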