How member pointers are implemented under the Itanium C++ ABI

Posted on 2019-02-19 Edited on 2023-10-29 In C++

Itanium C++ ABI

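The Itanium C++ ABI is an ABI for C++. As an ABI, it gives precise rules for implementing the language, ensuring that separately compiled parts of a program can interoperate successfully. Although it was originally developed for the Itanium architecture, it is not platform-specific and can be layered portably on top of an arbitrary C ABI. As a result, it is used as the standard C++ ABI for many major operating systems on all major architectures, and it is implemented in many major C++ compilers, including GCC and Clang.

Put simply, on x64 Linux both GCC and Clang follow the Itanium C++ ABI. So this post looks at how member pointers are implemented under it.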

pointer to data member

A pointer to data member is an offset from the base address of the class object containing it, represented as a ptrdiff_t. It has the size and alignment attributes of a ptrdiff_t. A NULL pointer is represented as -1.

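A pointer to data member is implemented as the member's offset within the class object, and it can be treated as a value of type ptrdiff_t. NULL is represented as -1 rather than 0, because 0 is a valid offset: it is the offset of the first member.

Here is an example: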

#include <cstddef>
#include <iostream>

struct Test
{
    int a;
    char b;
    double c;
};

int main()
{
    int Test::*ptr2a = &Test::a;
    char Test::*ptr2b = &Test::b;
    double Test::*ptr2c = &Test::c;

    // Reinterpret each member pointer as a ptrdiff_t to see its raw value
    std::cout << *(std::ptrdiff_t*)(&ptr2a) << std::endl;
    std::cout << *(std::ptrdiff_t*)(&ptr2b) << std::endl;
    std::cout << *(std::ptrdiff_t*)(&ptr2c) << std::endl;
}

On x64 Linux the output is 0, 4, 8. Taking alignment into account, these are exactly the offsets of the three members.
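The same observation can be checked against offsetof, together with the -1 representation of NULL quoted from the ABI text. The following is only a sketch under the assumption of an Itanium C++ ABI compiler (GCC or Clang on x64 Linux); Sample, raw_value and check_sample are hypothetical names introduced for illustration, and std::memcpy is used instead of a pointer cast to read the raw bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical struct for this sketch, with the same members as Test above.
struct Sample
{
    int a;
    char b;
    double c;
};

// Copy the raw bits of a pointer to data member into a ptrdiff_t.
// Assumption: Itanium C++ ABI, where both types have the same size.
template <typename M>
std::ptrdiff_t raw_value(M Sample::*p)
{
    static_assert(sizeof(p) == sizeof(std::ptrdiff_t),
                  "expected the Itanium C++ ABI representation");
    std::ptrdiff_t raw;
    std::memcpy(&raw, &p, sizeof(raw));
    return raw;
}

// Checks that the raw value is the member offset, and that NULL is -1.
void check_sample()
{
    assert(raw_value(&Sample::a) == static_cast<std::ptrdiff_t>(offsetof(Sample, a)));
    assert(raw_value(&Sample::b) == static_cast<std::ptrdiff_t>(offsetof(Sample, b)));
    assert(raw_value(&Sample::c) == static_cast<std::ptrdiff_t>(offsetof(Sample, c)));

    double Sample::*none = nullptr;
    assert(raw_value(none) == -1);
}
```

Calling check_sample() from main should pass silently on such a compiler. Under a different ABI (MSVC, for example) the representation differs and the checks are not expected to hold.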

pointer to member function

A pointer to member function is a pair as follows:

ptr:

For a non-virtual function, this field is a simple function pointer. (Under current base Itanium psABI conventions, that is a pointer to a GP/function address pair.) For a virtual function, it is 1 plus the virtual table offset (in bytes) of the function, represented as a ptrdiff_t. The value zero represents a NULL pointer, independent of the adjustment field value below.

adj:

The required adjustment to this, represented as a ptrdiff_t.

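A pointer to member function consists of a ptr part and an adj part. For ptr there are two cases: pointing to a non-virtual function and pointing to a virtual function. adj is the adjustment applied to this, and it can be treated as a ptrdiff_t.

PS: I am not entirely sure what adj is for; my guess is that it has something to do with multiple inheritance. I will add more once I know. For now, the focus is on ptr.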
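The multiple-inheritance guess about adj can be probed with a small experiment. The sketch below assumes GCC or Clang on x64 Linux; the classes A, B, C and the helpers MemFnRepr, inspect, offset_of_B_in and check_adj are hypothetical names made up for this illustration. It copies the two words of a member function pointer out with std::memcpy and compares adj with the offset of the B subobject inside C.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct A { long a = 1; long get_a() { return a; } };
struct B { long b = 2; long get_b() { return b; } };
struct C : A, B {};   // the B subobject is laid out after A inside C

// The two-word pair from the ABI text (field names taken from the quote).
struct MemFnRepr
{
    std::ptrdiff_t ptr;
    std::ptrdiff_t adj;
};

// Copy the raw representation of a pointer to member function of C.
MemFnRepr inspect(long (C::*pmf)())
{
    static_assert(sizeof(pmf) == sizeof(MemFnRepr),
                  "expected the Itanium C++ ABI layout");
    MemFnRepr r;
    std::memcpy(&r, &pmf, sizeof(r));
    return r;
}

// Byte offset of the B subobject within a C object.
std::ptrdiff_t offset_of_B_in(C& c)
{
    return reinterpret_cast<char*>(static_cast<B*>(&c)) -
           reinterpret_cast<char*>(&c);
}

void check_adj()
{
    C c;
    long (C::*pa)() = &A::get_a;   // A is at offset 0 in C
    long (C::*pb)() = &B::get_b;   // B is at a non-zero offset in C

    assert(inspect(pa).adj == 0);
    assert(inspect(pb).adj == offset_of_B_in(c));
    assert((c.*pb)() == 2);        // the call sees the B subobject as this
}
```

If check_adj() passes, adj is the number of bytes added to the object address so that this points at the B subobject when B::get_b is called through a pointer of type long (C::*)(). Other Itanium-style targets may encode adj differently, so this is an x64 observation only.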

pointer to non-virtual function

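For a non-virtual function, the ptr part is simply the address of the function. We can use it to obtain the address of a member function, and even call the function directly: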

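#include <cstdint>
#include <iostream>

struct Test
{
    void func() {
        std::cout << this << "  Test::func() is called\n";
    }
};
int main()
{
    Test t;

    auto ptr2func = &Test::func;

    // Get the address of func (the ptr part is the first word of the pair)
    uint64_t addr = *(uint64_t*)&ptr2func;

    // Inline assembly, equivalent to the commented-out line below.
    // The callee may overwrite any caller-saved register, so they are listed as clobbers.
    asm volatile("leaq %0, %%rdi ; callq *%1"
                 : : "m"(t), "r"(addr)
                 : "rdi", "rax", "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11", "memory", "cc");
    // (t.*ptr2func)();
}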

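Here ptr2func is defined as a pointer to member function, and its ptr part, i.e. the function address, is extracted and stored in addr. The address of t is then loaded into the rdi register to serve as the this pointer: in the x64 calling convention rdi holds the first argument of a function call, and this is passed as an implicit first argument. Finally, the call instruction calls the function through the address in addr. The function prints this, with the same effect as calling (t.*ptr2func)() directly.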
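The same trick can be sketched without inline assembly by casting the extracted address to an ordinary function pointer whose first parameter is the object pointer. This leans on the same assumption as the assembly version (GCC or Clang on x64 Linux, where this is passed like a first argument); it is undefined behaviour as far as the C++ standard is concerned and is meant purely as an illustration. Counter, PlainFn and as_plain_function are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical class for this sketch.
struct Counter
{
    int value = 0;
    void bump() { ++value; }
};

// An ordinary function type that takes the object pointer explicitly.
using PlainFn = void (*)(Counter*);

// Extract the ptr part of a pointer to NON-virtual member function and
// reinterpret it as a plain function pointer.
PlainFn as_plain_function(void (Counter::*pmf)())
{
    std::uintptr_t words[2];   // words[0] = ptr, words[1] = adj
    static_assert(sizeof(pmf) == sizeof(words),
                  "expected the Itanium C++ ABI layout");
    std::memcpy(words, &pmf, sizeof(words));

    assert((words[0] & 1) == 0);   // an even ptr means a non-virtual function
    assert(words[1] == 0);         // no this adjustment in this simple case
    return reinterpret_cast<PlainFn>(words[0]);
}
```

A possible use is `Counter c; as_plain_function(&Counter::bump)(&c);`, after which c.value would be 1, just as after `(c.*(&Counter::bump))()`.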

pointer to virtual function

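For a virtual function, the ptr part is the offset of the function in the virtual table (in bytes) plus 1. The added 1 sets the lowest bit, which distinguishes this case from an ordinary function address. If ptr is 0, the value is a NULL pointer: it does not refer to any member function.

So if we know where the virtual table is (the first word of the object is the virtual table pointer), we can combine it with the offset stored in ptr to obtain the address of the function and call it: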

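#include <cstddef>
#include <cstdint>
#include <iostream>

struct Test
{
    virtual void f1() {
        std::cout << this << "  Test::f1() is called\n";
    }
    virtual void f2() {
        std::cout << this << "  Test::f2() is called\n";
    }
};

int main() {
    Test t;

    auto ptr2f1 = &Test::f1;

    // Get the address of the virtual table
    uint8_t* vtable = *(uint8_t**)(&t);
    // Get the offset of f1 in the virtual table
    std::ptrdiff_t f1_offset = *(std::ptrdiff_t*)(&ptr2f1) - 1;
    // Get the address of f1
    uint64_t f1_addr = *(uint64_t*)(vtable + f1_offset);
    // Call it; the next two statements are equivalent.
    // The callee may overwrite any caller-saved register, so they are listed as clobbers.
    asm volatile ("leaq %0, %%rdi; callq *%1"
                  : : "m" (t), "r" (f1_addr)
                  : "rdi", "rax", "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11", "memory", "cc");
    (t.*ptr2f1)();
}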

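As you can see, we first read the address of the virtual table, vtable, from the first word of the object, and then obtain f1's offset in the virtual table, f1_offset, from the ptr part of the member function pointer. Dereferencing the table entry gives the address of f1, which we then call. The rdi register holds the this pointer, as discussed earlier. The result is equivalent to (t.*ptr2f1)().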
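The lookup described above can also be written as a small helper, which makes it easy to inspect the raw ptr values for several virtual functions. This is a sketch assuming GCC or Clang on x64 Linux; Shape, PlainFn, ptr_field and resolve_virtual are hypothetical names, and calling the result through a plain function pointer is outside what the C++ standard guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical class for this sketch: two virtual functions, no data members.
struct Shape
{
    virtual int f1() { return 1; }
    virtual int f2() { return 2; }
};

using PlainFn = int (*)(Shape*);

// Return the ptr part (first word) of a pointer to member function.
std::ptrdiff_t ptr_field(int (Shape::*pmf)())
{
    std::ptrdiff_t words[2];   // words[0] = ptr, words[1] = adj
    static_assert(sizeof(pmf) == sizeof(words),
                  "expected the Itanium C++ ABI layout");
    std::memcpy(words, &pmf, sizeof(words));
    return words[0];
}

// Resolve a pointer to VIRTUAL member function by hand:
// read the vtable pointer from the object, then the entry at (ptr - 1).
PlainFn resolve_virtual(Shape& obj, int (Shape::*pmf)())
{
    std::ptrdiff_t ptr = ptr_field(pmf);
    assert((ptr & 1) == 1);   // an odd ptr means a virtual function

    char* vtable;
    std::memcpy(&vtable, &obj, sizeof(vtable));   // first word of the object

    PlainFn fn;
    std::memcpy(&fn, vtable + (ptr - 1), sizeof(fn));
    return fn;
}
```

On such a platform ptr_field(&Shape::f1) would be 1 and ptr_field(&Shape::f2) would be 1 + sizeof(void*), since the two entries sit next to each other in the virtual table, while a NULL member function pointer gives 0. `resolve_virtual(s, &Shape::f2)(&s)` then returns the same value as `(s.*(&Shape::f2))()`.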