Error when using the XML package in R
Problem description
I am gathering data about different universities, and I have a question about the error that appears after executing the following code. The problem occurs when calling htmlParse().
Code:
library(RCurl)  # provides getURL()
library(XML)    # provides htmlParse()
url1 <- "http://nces.ed.gov/collegenavigator/?id=165015"
webpage1 <- getURL(url1)
doc1 <- htmlParse(webpage1)
Output:
Error in htmlParse(webpage1) : File
!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
html xmlns="http://www.w3.org/1999/xhtml" head id="ctl00_hd"meta http-equiv="Content-type" content="text/html;charset=UTF-8" /title
College Navigator - National Center for Education Statistics
/titlelink href="https://www.it1352.com/css/md0.css" type="text/css" rel="stylesheet" meta name="keywords" content="college navigator,college search,postsecondary education,postsecondary statistics,NCES,IPEDS,college locator"/meta meta name="description" content="College Navigator is a free consumer information tool designed to help students, parents, high school counselors, and others get information about over 7,000 postsecondary institutions in the United States - such as programs offered, retention and graduation rates, prices, aid available, degrees awarded, campus safety, and accreditation."meta>meta name="robots" content="index,nofollow"/metalink
I have scraped web pages with this package before and never had an issue. Does the name="robots" meta tag have anything to do with it? Any help would be greatly appreciated.
Correct answer
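A minimal sketch of one likely fix, offered as an assumption rather than a confirmed diagnosis: the error text starts with "File" followed by the page source, which suggests htmlParse() treated the downloaded string as a file name instead of as markup. Passing asText = TRUE states explicitly that the argument is document content. The page_text string below is a hypothetical stand-in for the value returned by getURL(url1), so the example runs without network access. The name="robots" meta tag is an instruction to search-engine crawlers and is parsed like any other element, so it is unlikely to be involved.

```r
library(XML)

# Hypothetical stand-in for the string returned by getURL(url1);
# the real page is much longer.
page_text <- paste0(
  "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" ",
  "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">",
  "<html><head>",
  "<title>College Navigator - National Center for Education Statistics</title>",
  "<meta name=\"robots\" content=\"index,nofollow\" />",
  "</head><body></body></html>"
)

# asText = TRUE tells htmlParse() that the argument is markup,
# not a file path or URL to open.
doc <- htmlParse(page_text, asText = TRUE)

# Extract a couple of values to confirm that parsing succeeded.
page_title <- xpathSApply(doc, "//title", xmlValue)
robots <- xpathSApply(doc, "//meta[@name='robots']", xmlGetAttr, "content")
```

Applied to the code in the question, the corresponding call would be htmlParse(webpage1, asText = TRUE).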