银河里的星星 (Stars in the Milky Way)

Fallen to Earth


Baidu Search Parameters and Results Page Structure (reposted)

2009-03-12 14:37:35 | Category: Technical Topics


Source: http://www.snowhack.com/blog/catalog.asp?cate=10

1. Query parameters for Baidu web search

Required parameters

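☆ wd--the query keyword(s) (Keyword)
☆ pn--which page of results to display (Page Number); in practice Baidu reads the value as the zero-based offset of the first result, so with the default rn=10, pn=10 returns the second page
☆ cl--search type (Class); cl=3 selects web search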
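The three required parameters can be combined into a request URL as in this minimal Python sketch. The endpoint http://www.baidu.com/s comes from the example URL later in this post. The helper names `build_search_url` and `page_to_pn` are hypothetical, and `page_to_pn` assumes pn is a result offset of (page - 1) × rn rather than a literal page number. Text is percent-encoded as gb2312 because, per the parameter list, that is the encoding Baidu assumes when ie is omitted.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL in this post.
BAIDU_SEARCH = "http://www.baidu.com/s"


def page_to_pn(page: int, rn: int = 10) -> int:
    """Convert a 1-based page number to a pn value.

    Assumption: pn is the zero-based offset of the first result,
    so page 1 -> 0, page 2 -> rn, and so on.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * rn


def build_search_url(wd: str, pn: int = 0, cl: int = 3) -> str:
    """Build a web-search URL from the three required parameters.

    wd: query keyword(s)
    pn: results position, passed through unchanged
    cl: search class; 3 selects web search
    """
    params = {"wd": wd, "pn": pn, "cl": cl}
    # No ie is sent, so encode text as gb2312, the documented default.
    return BAIDU_SEARCH + "?" + urlencode(params, encoding="gb2312")


if __name__ == "__main__":
    # Second page of web results for "python"
    print(build_search_url("python", pn=page_to_pn(2)))
```

Run directly, this prints `http://www.baidu.com/s?wd=python&pn=10&cl=3`.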

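Optional parameters
☆ rn--number of results shown per page (Record Number); accepts values from 10 to 100, default rn=10
☆ ie--character encoding of the query text (Input Encoding); default ie=gb2312, i.e. Simplified Chinese
☆ tn--the source site that submitted the search request
Some useful tn values:
tn=baidulocal selects Baidu's in-site search, which returns very clean results with no ads. For example, search for "快乐" ("happy") this way and see how uncluttered the results are.
tn=baiducnnic: want to put Baidu inside a frame? This parameter does it; it is a version Baidu customized for CNNIC.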

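☆ si--restricts the search to a given domain. For example, to search only within Sina, use si=sina.com.cn. This parameter only takes effect when used together with ct.

☆ ct--its value is usually a string of digits; presumably some kind of verification code for the search request.

Using si and ct together, e.g. to search for "理想" ("ideal") within sina.com.cn: http://www.baidu.com/s?q=&ct=2097152&si=sina.com.cn&ie=gb2312&cl=3&wd=理想

☆ bs--the previous search keyword (Before Search); probably used by the related-searches feature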
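The optional parameters can be layered on top in the same way. This Python sketch clamps rn to the documented 10 to 100 range, percent-encodes text in whatever encoding ie declares, and refuses si without ct, since the post reports that si only works together with ct. The function name `build_full_search_url` is hypothetical, and the ct value in the usage note is simply copied from the sina.com.cn example above; what ct actually encodes is unknown.

```python
from typing import Optional
from urllib.parse import quote_plus, urlencode

# Endpoint taken from the example URL in this post.
BAIDU_SEARCH = "http://www.baidu.com/s"


def build_full_search_url(
    wd: str,
    pn: int = 0,
    cl: int = 3,
    rn: int = 10,
    ie: str = "gb2312",
    tn: Optional[str] = None,
    si: Optional[str] = None,
    ct: Optional[str] = None,
    bs: Optional[str] = None,
) -> str:
    """Build a search URL including the optional parameters described in this post."""
    if si is not None and ct is None:
        # The post reports that si has no effect unless ct is also sent.
        raise ValueError("si requires ct")
    rn = max(10, min(100, rn))  # documented range for rn is 10..100
    params = {"wd": wd, "pn": pn, "cl": cl, "rn": rn, "ie": ie}
    for key, value in (("tn", tn), ("si", si), ("ct", ct), ("bs", bs)):
        if value is not None:  # only send parameters that were given
            params[key] = value
    # Percent-encode text in the encoding declared by ie so the two agree.
    return BAIDU_SEARCH + "?" + urlencode(params, encoding=ie, quote_via=quote_plus)
```

For example, `build_full_search_url("理想", si="sina.com.cn", ct="2097152")` rebuilds the sina.com.cn search above, with the parameters in a different order and without the empty `q=`.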

2. Structure of the Baidu search results page

From top to bottom in the page source, the sections are:

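Search box
Fixed-ranking "Hot Zone" (火爆地带) paid listings on the right
Search results
Pagination
Related searches
Bottom search box
Copyright area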

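Of these, only the search results and the pagination area contain the data we actually need. Inspecting their source code reveals a unique identifying string for each section, and cutting the page at those markers extracts the content.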
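Cutting a section out by its marker strings amounts to two substring searches. Here is a minimal Python sketch; the marker strings are hypothetical placeholders, not Baidu's actual markup, and must be replaced with the unique strings found by inspecting the live results page.

```python
from typing import Dict, Optional, Tuple

# Hypothetical (start, end) marker pairs. These are NOT Baidu's real markup;
# replace them with the unique strings found in the results-page source.
MARKERS: Dict[str, Tuple[str, str]] = {
    "results": ("<!--RESULTS-->", "<!--/RESULTS-->"),
    "pagination": ("<!--PAGER-->", "<!--/PAGER-->"),
}


def extract_between(html: str, start_marker: str, end_marker: str) -> Optional[str]:
    """Return the text between the first start_marker and the next end_marker."""
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)  # skip past the marker itself
    end = html.find(end_marker, start)
    if end == -1:
        return None
    return html[start:end]


def extract_sections(
    html: str, markers: Dict[str, Tuple[str, str]] = MARKERS
) -> Dict[str, Optional[str]]:
    """Extract every named section; None means its markers were not found."""
    return {name: extract_between(html, s, e) for name, (s, e) in markers.items()}
```

Plain string searching is enough here because the approach relies on each marker being unique in the page; a full HTML parser would only be needed if the sections had to be located by structure instead.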

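Baidu has essentially no anti-scraping measures. The main complication is that Baidu changes the source code of its results page from time to time, so you need to check the results page regularly and, whenever the code changes, update the few marker strings accordingly. Baidu is far more tolerant of scraping than Google: at the time of writing, we had not seen Baidu temporarily block a source site's IP because of frequent queries.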
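Because the markers break whenever Baidu revises its page source, one way to notice such a change, rather than spotting it by eye, is to confirm that every marker is still present before extracting. A minimal Python sketch, again with hypothetical placeholder markers and hypothetical function names:

```python
from typing import Dict, List, Tuple

# Hypothetical marker strings, kept in one place so they can be updated
# quickly when Baidu changes its results-page source.
MARKERS: Dict[str, Tuple[str, str]] = {
    "results": ("<!--RESULTS-->", "<!--/RESULTS-->"),
    "pagination": ("<!--PAGER-->", "<!--/PAGER-->"),
}


def missing_markers(
    html: str, markers: Dict[str, Tuple[str, str]] = MARKERS
) -> List[str]:
    """Return the names of sections whose start or end marker no longer appears."""
    return [
        name
        for name, (start, end) in markers.items()
        if start not in html or end not in html
    ]


def check_layout(html: str) -> None:
    """Raise if any marker is missing, signalling that MARKERS needs updating."""
    missing = missing_markers(html)
    if missing:
        raise RuntimeError(
            "page layout may have changed; update markers for: " + ", ".join(missing)
        )
```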



