Intel C++ optimizer removes masm code

Question

I recently started using the Intel C++ compiler for some of my projects, while also learning MASM assembly. I kept hearing that learning assembly isn't worth it because compilers already do a good job of optimizing code, so I decided to find out once and for all which one is faster. To test this, I wrote the following C++ code:

#include <iostream>
#include <ctime>

using namespace std;

// Implemented in the MASM file below (cdecl, so the symbol is _Add).
extern "C" {
int Add(int a, int b);
}

int main(int argc, char * argv[]){
    clock_t startingTime = clock();
    for (int i = 0; i < 100; i++)
    {
        cout << "normal: " << i << endl;
        cout << 1000 + 1000 << endl;
    }
    clock_t timeTaken1 = clock() - startingTime;

    startingTime = clock();
    for (int i = 0; i < 100; i++){
        cout << "assem" << i << endl;
        cout << Add(2000, 2000) << endl;
    }
    clock_t timeTaken2 = clock() - startingTime;

    cout << "Time taken under normal addition: " << timeTaken1 << endl;
    cout << "Time taken under assembly addition: " << timeTaken2 << endl;

    cin.get();
    return 0;
}

And the following masm code:

.model flat
.386

.code

    public _Add

_Add PROC
        push ebp            ;
        mov ebp, esp        ;
        mov eax, [ebp + 8]  ;
        mov ebx, [ebp + 12] ;
        add eax, ebx        ;
        leave               ; cleanup
        ret                 ;


_Add endp
end

I am compiling this in Visual Studio with the Intel Composer plugin. When I run it in Debug mode, it works perfectly: I can see "normal 99" and "assem 99" along with the corresponding numbers. It also works fine when I pass /Od to the compiler. However, when /O2, /Ox or /O3 is specified, the normal-addition loop runs in full, but the assembly-addition loop shows only its first iteration, i.e. only assem 0 and 4000 are printed.

My guess is that the assembly code is being optimized out by the Intel compiler (the same code works fine with the VC++ compiler). I am curious to find out why this happens and how to work around it while still letting the Intel compiler optimize the C++ part.

Thanks, SbSpider

EDIT: I know this is late, but thanks for all of the replies. It turns out the problem was an error in the assembly code, not the Intel compiler ignoring the assembly code.

Solution

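Your assembly code is trashing the EBX register (as Jongware noted), and this is likely why the second loop in your C++ code executes only once. If i is stored in EBX, then Add overwriting EBX with 2000 will cause the next test of the loop condition i < 100 to fail. With /Od, i is most likely kept on the stack rather than in a register, which would explain why only the optimized builds are affected.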
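To make the failure mode concrete, here is a minimal C++ model of it. It is only a sketch: the globals `ebx` and `eax` are hypothetical stand-ins for the CPU registers (standard C++ cannot name real registers), `clobberingAdd` mirrors the question's `_Add`, and the two counting functions approximate an optimized build that keeps `i` in EBX and a /Od build that keeps `i` in memory.

```cpp
// Hypothetical register model: these globals are illustrative stand-ins,
// not real register access.
int ebx = 0;  // callee-saved: a caller may keep live values here across calls
int eax = 0;  // holds the return value, free for the callee to overwrite

// Mirrors the question's _Add: loads b into EBX without saving EBX first.
int clobberingAdd(int a, int b) {
    eax = a;     // mov eax, [ebp + 8]
    ebx = b;     // mov ebx, [ebp + 12]  <- overwrites the caller's EBX
    eax += ebx;  // add eax, ebx
    return eax;
}

// Optimized-build analogue: the loop counter i lives in EBX across each call.
int iterationsWithCounterInEbx() {
    int iterations = 0;
    for (ebx = 0; ebx < 100; ++ebx) {
        clobberingAdd(2000, 2000);
        ++iterations;
    }
    return iterations;
}

// /Od analogue: i lives in memory (a local variable), so EBX holds nothing
// the loop depends on.
int iterationsWithCounterInMemory() {
    int iterations = 0;
    for (int i = 0; i < 100; ++i) {
        clobberingAdd(2000, 2000);
        ++iterations;
    }
    return iterations;
}
```

`iterationsWithCounterInEbx()` returns 1: the first call sets the counter to 2000, the increment makes it 2001, and `2001 < 100` is false. `iterationsWithCounterInMemory()` returns 100, matching the /O2 and /Od symptoms described in the question.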

You need to either save and restore the EBX register in your assembly code (EBX, like ESI, EDI and EBP, is expected to be preserved across function calls in 32-bit x86 code), or pick another register that isn't assumed to be preserved across function calls (EAX, ECX or EDX).
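Both fixes can be sketched with the same kind of hypothetical register model (the globals `ebx` and `ecx` are illustrative stand-ins, not real register access). In the MASM source, the first fix would correspond to adding `push ebx` after `mov ebp, esp` and `pop ebx` before `leave`, which leaves the `[ebp + 8]` and `[ebp + 12]` offsets unchanged; the second would correspond to replacing `ebx` with `ecx` in the `mov` and `add` instructions.

```cpp
// Hypothetical register model: illustrative globals, not real register access.
int ebx = 0;  // callee-saved: the caller keeps its loop counter here
int ecx = 0;  // caller-saved scratch register: a callee may overwrite it freely

// Fix 1: save EBX before using it and restore it before returning.
int savingAdd(int a, int b) {
    int saved = ebx;       // push ebx
    ebx = b;               // mov ebx, [ebp + 12]
    int result = a + ebx;  // add eax, ebx
    ebx = saved;           // pop ebx (before leave)
    return result;
}

// Fix 2: use a scratch register the caller does not expect to survive a call.
int scratchAdd(int a, int b) {
    ecx = b;         // mov ecx, [ebp + 12]
    return a + ecx;  // add eax, ecx
}

// Optimized-build analogue: the loop counter lives in EBX across each call.
int iterationsWith(int (*add)(int, int)) {
    int iterations = 0;
    for (ebx = 0; ebx < 100; ++ebx) {
        add(2000, 2000);
        ++iterations;
    }
    return iterations;
}
```

With either function, `iterationsWith` runs all 100 iterations even though the counter lives in the model's `ebx`, and both functions still return 4000 for `(2000, 2000)`.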
