银河里的星星 (Stars in the Milky Way)

落在人间 (Fallen to Earth)

Why two different BLAS implementations perform differently on Cell

2010-04-25 11:13:19 | Category: High-Performance Computing

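While timing BLAS calls on a PS3, I found that the SPE-optimized BLAS was slower than not using the SPEs at all: the Cell SDK BLAS implementation took longer than the netlib BLAS implementation.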

The two builds were compiled and run as follows.
With the Cell SDK BLAS library:
mpicc data_plot.c -I/opt/cell/sdk/usr/include/ -Wall -L .   /home/duanple/hete_show/hete_lib/src/hetelib.a -lblas -lspe2 -lm -lnuma
env BLAS_NUMSPES=5 BLAS_HUGE_FILE=/home/duanple/huge/blas_lib.bin numactl --cpunodebind=0 --membind=0 ./a.out
With the netlib implementation:
make plot
mpicc data_plot.c -I/opt/cell/sdk/usr/include/ -Wall -L .   /home/duanple/hete_show/hete_lib/src/hetelib.a /home/duanple/CBLAS/lib/LINUX/cblas_LINUX.a /home/duanple/BLAS/blas_LINUX.a
./a.out

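The original test program:

/* Despite its name, this test calls the single-precision cblas_saxpy. */
void cblas_daxpy_test(int input){
    int size = (HETE_INT)input;               /* buffer size in bytes */
    float * data = (float *)hete_malloc(size);
    int num = size/sizeof(float);             /* number of float elements */
    float temp = 1.3f;

    /* y = temp*x + y, with x and y both pointing at data */
    cblas_saxpy(num,temp,data,1,data,1);

    hete_free(data);
}

It is then timed in a loop:

for(i = 0,input = 0 ; i < test_num ; i++,input += step_size){
    start = MPI_Wtime();
    cblas_daxpy_test(input);   /* timed region includes hete_malloc and hete_free */
    end = MPI_Wtime();
    printf("%d %lf\n",i,end-start);
}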
Output with the Cell SDK BLAS (iteration index, elapsed seconds):
0 0.006680
1 0.016506
2 0.030331
3 0.045189
4 0.058386
5 0.074658
6 0.087803
7 0.087699
8 0.120216
9 0.124565
10 0.140279
11 0.155261
12 0.173850
13 0.185558
14 0.208170
15 0.234272
16 0.228106
17 0.241932
18 0.268046
19 0.267028
With the netlib BLAS, the results are:
0 0.000007
1 0.002885
2 0.004458
3 0.006170
4 0.007782
5 0.009437
6 0.011206
7 0.012816
8 0.014590
9 0.016170
10 0.017870
11 0.019489
12 0.021149
13 0.022912
14 0.024474
15 0.026168
16 0.027774
17 0.029623
18 0.031152
19 0.032826
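These results suggest that using the SPEs is worse than not using them at all.

After some thought and a look at the Cell SDK BLAS documentation, I found that the SPE implementation has a startup cost: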
Startup costs
There is a one time startup cost due to initialization and setup of memory and
SPEs within the BLAS library. This one time start-up cost is incurred only when an
application invokes an optimized BLAS routine for the first time. Subsequent
invocations of optimized BLAS routines by the same application do not incur this
cost.
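
The startup cost quoted above suggests a common benchmarking pattern: make one untimed call first, so that one-time initialization is not charged to the measurements. The following is only a sketch under stated assumptions. `lazy_saxpy` is a hypothetical stand-in with a simulated one-time setup step (it is not the Cell SDK routine), and `wall_seconds` and `time_after_warmup` are helper names introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a library routine with one-time setup.
   This is NOT the Cell SDK BLAS; init_count only records how often
   the simulated setup ran. */
static int init_count = 0;
static int initialized = 0;

static void one_time_setup(void) {
    /* A real library might start SPE threads or map memory here. */
    init_count++;
    initialized = 1;
}

/* y = a*x + y, running the setup lazily on the first call only. */
static void lazy_saxpy(size_t n, float a, const float *x, float *y) {
    if (!initialized) one_time_setup();
    for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

/* Wall-clock seconds via C11 timespec_get. */
static double wall_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Warm up with a one-element call (untimed), then time one full call.
   Note that the warm-up call also updates y[0]. */
static double time_after_warmup(size_t n, float a, const float *x, float *y) {
    assert(n > 0);
    lazy_saxpy(1, a, x, y);          /* absorbs the one-time setup */
    double start = wall_seconds();
    lazy_saxpy(n, a, x, y);          /* steady-state call being measured */
    return wall_seconds() - start;
}
```

With this pattern the setup runs once no matter how many timed calls follow, so any remaining per-call overhead has to come from somewhere else.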
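After more testing, I finally found the cause. In the implementation above, every call allocates and frees the buffer the function works on (hete_malloc, then hete_free), and both calls fall inside the timed region. This apparently forces the startup process to run on every call.
After changing the code to the following: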

    /* Allocate once, outside the timed loop. */
    float * data = (float *)hete_malloc(step_size*test_num);
    float temp = 1.3f;

    for(i = 0,input = 0 ; i < test_num ; i++,input += step_size){
        int num = input/sizeof(float);
        start = MPI_Wtime();
        cblas_saxpy(num,temp,data,1,data,1);   /* only the BLAS call is timed */
        end = MPI_Wtime();
        printf("%d %lf\n",i,end-start);
    }

    hete_free(data);
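
To see which part of the original timed region dominates, buffer management and the kernel can also be timed separately. The sketch below rests on assumptions: plain `malloc`/`free` stand in for `hete_malloc`/`hete_free`, `ref_saxpy` is a portable loop standing in for `cblas_saxpy`, and `split_timing`, `wall_seconds` and `time_alloc_and_kernel` are names introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Separate timings for buffer management and for the kernel itself. */
typedef struct {
    double alloc_free_seconds;   /* malloc time plus free time */
    double kernel_seconds;       /* saxpy time only */
    float  sample;               /* data[0] after the kernel, as a sanity check */
    int    ok;                   /* 0 if the allocation failed */
} split_timing;

/* Wall-clock seconds via C11 timespec_get. */
static double wall_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Portable stand-in for cblas_saxpy with unit strides: y = a*x + y. */
static void ref_saxpy(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

static split_timing time_alloc_and_kernel(size_t bytes, float a) {
    split_timing t = {0.0, 0.0, 0.0f, 0};
    size_t n = bytes / sizeof(float);
    assert(n > 0);

    double t0 = wall_seconds();
    float *data = malloc(n * sizeof *data);
    double t1 = wall_seconds();
    if (data == NULL) return t;

    for (size_t i = 0; i < n; i++) data[i] = 1.0f;   /* defined input, untimed */

    double t2 = wall_seconds();
    ref_saxpy(n, a, data, data);   /* x and y share one buffer, as in the post */
    double t3 = wall_seconds();
    t.sample = data[0];

    double t4 = wall_seconds();
    free(data);
    double t5 = wall_seconds();

    t.alloc_free_seconds = (t1 - t0) + (t5 - t4);
    t.kernel_seconds = t3 - t2;
    t.ok = 1;
    return t;
}
```

Calling `time_alloc_and_kernel(step_size * i, 1.3f)` for each step would give two columns instead of one, which makes it easier to tell allocation overhead apart from the cost of the BLAS call.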

Results of the modified version:
0 0.006488
1 0.015739
2 0.014540
3 0.013468
4 0.014251
5 0.014466
6 0.014591
7 0.014846
8 0.014824
9 0.014981
10 0.015195
11 0.015296
12 0.015537
13 0.015686
14 0.015734
15 0.015938
16 0.016045
17 0.016374
18 0.016369
19 0.016592
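The run time is clearly shorter. From iteration 9 onward this version is also faster than the netlib BLAS (0.014981 s versus 0.016170 s at iteration 9), and its time grows much more slowly: about 0.0002 s per step, compared with about 0.0017 s per step for netlib. For larger data sizes it should therefore perform even better.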
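
The growth rates can be checked with a least-squares fit over the numbers printed above. This is a small sketch, not part of the original test program: the arrays are copied from the two result listings, and `slope_per_step` and `first_faster` are helper names introduced here. It assumes both runs used the same step_size, so the iteration index stands for the same input size in both.

```c
#include <assert.h>
#include <stddef.h>

/* Timings in seconds, copied from the post and indexed by iteration. */
static const double cell_sdk_fixed[20] = {
    0.006488, 0.015739, 0.014540, 0.013468, 0.014251,
    0.014466, 0.014591, 0.014846, 0.014824, 0.014981,
    0.015195, 0.015296, 0.015537, 0.015686, 0.015734,
    0.015938, 0.016045, 0.016374, 0.016369, 0.016592
};
static const double netlib[20] = {
    0.000007, 0.002885, 0.004458, 0.006170, 0.007782,
    0.009437, 0.011206, 0.012816, 0.014590, 0.016170,
    0.017870, 0.019489, 0.021149, 0.022912, 0.024474,
    0.026168, 0.027774, 0.029623, 0.031152, 0.032826
};

/* Least-squares slope of y[first..last] against the iteration index,
   i.e. the average growth in seconds per step. */
static double slope_per_step(const double *y, size_t first, size_t last) {
    assert(last > first);
    size_t n = last - first + 1;
    double mean_x = 0.0, mean_y = 0.0;
    for (size_t i = first; i <= last; i++) {
        mean_x += (double)i;
        mean_y += y[i];
    }
    mean_x /= (double)n;
    mean_y /= (double)n;

    double sxy = 0.0, sxx = 0.0;
    for (size_t i = first; i <= last; i++) {
        double dx = (double)i - mean_x;
        sxy += dx * (y[i] - mean_y);
        sxx += dx * dx;
    }
    return sxy / sxx;
}

/* First iteration at which a[i] < b[i], or n if there is none. */
static size_t first_faster(const double *a, const double *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] < b[i]) return i;
    }
    return n;
}
```

Fitting iterations 3 to 19 of the modified Cell SDK run and iterations 1 to 19 of the netlib run leaves out the first few points, which do not follow the linear trend. `first_faster(cell_sdk_fixed, netlib, 20)` returns 9, the first iteration at which the Cell SDK version is ahead.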
