DuckDuckGo API does not return results

Question

Edit: I now realize the API is simply inadequate and is not even working. I would like to redirect my question: I want to auto-magically search DuckDuckGo using their "I'm feeling ducky" feature, so that I can search for "stackoverflow", for instance, and get the main page ("http://stackoverflow.com/") as my result.

I am using the duckduckgo API, and I found that when using:

r = duckduckgo.query("example")

the results do not reflect a manual search. Namely:

for result in r.results:
    print result

Results:

>>> 
>>> 

Nothing.

And looking up an index in results raises an out-of-bounds error, since the list is empty.

How am I supposed to get results for my search?

It seems the API (according to its documented examples) is supposed to answer questions and give a sort of "I'm feeling ducky" result in the form of r.answer.text

But the website is built in such a way that I cannot search it and parse the results using normal methods.

I would like to know how I am supposed to parse search results with this API, or with any other method, from this site.

Thank you.

Accepted answer

#1

If you visit the DuckDuckGo API page, you will find some notes about using the API. The first note says clearly that:

As this is a Zero-click Info API, most deep queries (non topic names) will be blank.

And here's the list of those fields:

Abstract: ""
AbstractText: ""
AbstractSource: ""
AbstractURL: ""
Image: ""
Heading: ""
Answer: ""
Redirect: ""
AnswerType: ""
Definition: ""
DefinitionSource: ""
DefinitionURL: ""
RelatedTopics: [ ]
Results: [ ]
Type: ""
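
To see which of the fields listed above actually carry data for a given query, the decoded JSON response can be filtered for non-empty values. The following is a minimal Python 3 sketch, not part of the original answer. It assumes the JSON endpoint `https://api.duckduckgo.com/` with `format=json`; the helper names `build_api_url` and `non_empty_fields` and the sample response are hypothetical and exist only for illustration.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the zero-click (Instant Answer) API.
API_ENDPOINT = "https://api.duckduckgo.com/"


def build_api_url(query):
    """Build an API URL that requests JSON output for the given query."""
    return API_ENDPOINT + "?" + urlencode({"q": query, "format": "json"})


def non_empty_fields(response):
    """Return only the fields of a decoded response that carry data."""
    return {key: value for key, value in response.items()
            if value not in ("", [], None)}


# Hypothetical response to a deep query: every field is blank.
blank = json.loads(
    '{"Abstract": "", "Answer": "", "RelatedTopics": [], "Results": [], "Type": ""}'
)

print(build_api_url("example"))
print(non_empty_fields(blank))
```

An empty dictionary from `non_empty_fields` corresponds to the situation described in the question: the API answered, but with nothing to show.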

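So, it may be a pity, but their API simply truncates a bunch of results and does not give them to you, possibly to work faster, and there seems to be nothing you can do about it except use the website itself.

So, obviously, in that case the API is not the way to go.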

As for me, I see only one way left: retrieving the raw HTML from duckduckgo.com and parsing it with, e.g., html5lib (it is worth mentioning that their HTML is well structured).

It is also worth mentioning that parsing HTML pages is not the most reliable way to scrape data, because the HTML structure can change, while an API usually stays stable until changes are publicly announced.

Here's an example of how such parsing can be achieved with BeautifulSoup:

# Python 2 with BeautifulSoup 3
from BeautifulSoup import BeautifulSoup
import urllib
import re

# Fetch the JavaScript version of the results page
site = urllib.urlopen('http://duckduckgo.com/?q=example')
data = site.read()

# Locate the zero-click topics box and the result entries inside it
parsed = BeautifulSoup(data)
topics = parsed.findAll('div', {'id': 'zero_click_topics'})[0]
results = topics.findAll('div', {'class': re.compile('results_*')})

print results[0].text

This script prints:

u'Eixample, an inner suburb of Barcelona with distinctive architecture'

The problem with querying the main page directly is that it uses JavaScript to produce the required results (as opposed to the related topics), so you can get results only from the HTML version. The HTML version has a different link:


  • http://duckduckgo.com/?q=example # JavaScript version
  • http://duckduckgo.com/html/?q=example # HTML-only version
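
The two links above differ only in the `html/` path segment, so both can be produced by one helper that also URL-encodes the query. This is a small Python 3 sketch rather than code from the original answer, and the function name `search_url` is a hypothetical choice.

```python
from urllib.parse import urlencode

BASE = "http://duckduckgo.com/"


def search_url(query, html_only=False):
    """Return the search URL for a query.

    html_only=True selects the HTML-only version; otherwise the
    JavaScript version is returned.
    """
    path = "html/" if html_only else ""
    return BASE + path + "?" + urlencode({"q": query})


print(search_url("example"))                  # JavaScript version
print(search_url("example", html_only=True))  # HTML-only version
```

Using `urlencode` keeps queries that contain spaces or other special characters valid in the resulting URL.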

Let's see what we can get:

site = urllib.urlopen('http://duckduckgo.com/html/?q=example')
data = site.read()
parsed = BeautifulSoup(data)

first_link = parsed.findAll('div', {'class': re.compile('links_main*')})[0].a['href']

The value stored in the first_link variable is a link to the first result (not a related search) that the search engine outputs:

http://www.iana.org/domains/example

To get all the links, you can iterate over the found tags (data other than links can be retrieved in a similar way):

for i in parsed.findAll('div', {'class': re.compile('links_main*')}):
    print i.a['href']

http://www.iana.org/domains/example
https://twitter.com/example
https://www.facebook.com/leadingbyexample
http://www.trythisforexample.com/
http://www.myspace.com/leadingbyexample?_escaped_fragment_=
https://www.youtube.com/watch?v=CLXt3yh2g0s
https://en.wikipedia.org/wiki/Example_(musician)
http://www.merriam-webster.com/dictionary/example
...
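
The same extraction can be approximated on Python 3 with only the standard library. The sketch below is not the answer's code: it uses `html.parser` in place of BeautifulSoup and matches any `<div>` that has a class name starting with `links_main`, which is close to, but not identical to, the regular expression used above. The class `MainLinkParser`, the function `extract_links`, and the sample markup are hypothetical; the real page markup may differ or may have changed.

```python
from html.parser import HTMLParser


class MainLinkParser(HTMLParser):
    """Collect the first href inside each <div> whose class starts with 'links_main'."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._depth = 0      # <div> nesting depth inside a matching block
        self._taken = False  # True once the current block has yielded a link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            if self._depth:
                self._depth += 1
            elif any(name.startswith("links_main") for name in classes):
                self._depth = 1
                self._taken = False
        elif tag == "a" and self._depth and not self._taken and attrs.get("href"):
            self.links.append(attrs["href"])
            self._taken = True

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1


def extract_links(html):
    """Return the first link of every matching result block, in page order."""
    parser = MainLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical markup that imitates the structure assumed above.
SAMPLE = """
<div class="links_main links_deep"><a href="http://www.iana.org/domains/example">Example</a>
  <div class="snippet">text <a href="http://ignored.example/">more</a></div>
</div>
<div class="related"><a href="http://related.example/">related</a></div>
<div class="links_main"><a href="https://en.wikipedia.org/wiki/Example">Wiki</a></div>
"""

print(extract_links(SAMPLE))
```

Because the parser returns an empty list when no matching block is found, a change in the page structure shows up as missing results rather than as an IndexError.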

Note that the HTML-only version contains only results; for related searches you must use the JavaScript version (without the html part in the URL).
