Home 字符串字面值作为实参时,被放在哪了?
Post
Cancel

字符串字面值作为实参时,被放在哪了?

coding时突然想到的一个问题?

在写NJU os的libco时,看到给的使用示例里有这么一段代码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
#include <stdio.h>
#include "co.h"

void entry(void *arg) {
  while (1) {
    printf("%s", (const char *)arg);
    co_yield();
  }
}

int main() {
  struct co *co1 = co_start("co1", entry, "a");
  struct co *co2 = co_start("co2", entry, "b");
  co_wait(co1); // never returns
  co_wait(co2);
}

对应的函数签名为:

struct co *co_start(const char *name, void (*func)(void *), void *arg);

众所周知,函数的局部变量在函数返回后将会被销毁,因此对于在函数返回后还要使用的变量就需要使用static将其声明为静态变量或使用malloc从堆上分配空间,否则将会产生use-after-free。那么疑问随之而来: 如上例所示使用字符串字面值作为函数参数,这个字符串将会被存放在哪?

为了简化问题,重新写了一个小的demo:

struct testread {
    char *arg;
    int* value;
};

struct testread* test(void * arg, int *i);
struct testread* ttest();

int main() {
    struct testread*ret = ttest();
    printf("%s %d\n", ret->arg, *(ret->value));
}  

struct testread* ttest() {
    int i = 10;
    struct testread *ret = test("hello", &i);
    return ret;
}

struct testread* test(void * arg, int *i) {
    printf("0x%p\n",arg);
    struct testread *ret = malloc(sizeof(struct testread));
    ret->arg = (char *)arg;
    ret->value = i;
    return ret;
}

很显然,ttest函数中声明的int i是局部变量,在ttest函数返回后必然被销毁,因此若”hello”一样是局部变量,那么和i一样会产生use-after-free错误。可惜,make之后,正常通过编译,并在运行之后输出了hello和i的值。

问题接踵而来,为什么看起来似乎两个变量都没有被销毁?

我在StackOverflow上找到一个非常相似的问题: c - free() not deallocating memory? - Stack Overflow, dereferencing a pointer which refers to deallocated memory is Undefined Behavior,因此编译器的任何行为都是合理的,而因为我们的demo比较小,在函数返回后并没有新的值占用这块空间,因此指针所指向的地址中保存的数据把那个没有被擦除改写,在main函数中print依然可以得到已经被销毁的变量值。同样,里面给出了观察到use-after-free的方法:使用gcc的-fsanitize=address选项,重新编译,看到如下报错:

gcc -fsanitize=address -c blah.c -o blah.o -g
gcc blah.o -o blah
blah.o: In function `main':
/home/clf/blah/blah.c:14: undefined reference to `__asan_report_load8'
/home/clf/blah/blah.c:14: undefined reference to `__asan_report_load4'
/home/clf/blah/blah.c:14: undefined reference to `__asan_report_load8'
blah.o: In function `ttest':
/home/clf/blah/blah.c:16: undefined reference to `__asan_option_detect_stack_use_after_return'
/home/clf/blah/blah.c:16: undefined reference to `__asan_stack_malloc_0'
/home/clf/blah/blah.c:17: undefined reference to `__asan_report_store4'
blah.o: In function `test':
/home/clf/blah/blah.c:24: undefined reference to `__asan_report_store8'
/home/clf/blah/blah.c:25: undefined reference to `__asan_report_store8'
blah.o: In function `_GLOBAL__sub_D_00099_0_main':
/home/clf/blah/blah.c:27: undefined reference to `__asan_unregister_globals'
blah.o: In function `_GLOBAL__sub_I_00099_1_main':
/home/clf/blah/blah.c:27: undefined reference to `__asan_init'
/home/clf/blah/blah.c:27: undefined reference to `__asan_version_mismatch_check_v8'
/home/clf/blah/blah.c:27: undefined reference to `__asan_register_globals'
collect2: error: ld returned 1 exit status
makefile:10: recipe for target 'blah' failed
make: *** [blah] Error 1

看起来两者都是作为了局部变量,在函数返回后均被销毁。那么回到一开始的问题,字符串字面值存放到了哪里?我们使用objdump查看编译后的.o文件,ttest函数对应的汇编指令如下所示:

0000000000000044 <ttest>:
  44:	55                   	push   %rbp
  45:	48 89 e5             	mov    %rsp,%rbp
  48:	48 83 ec 30          	sub    $0x30,%rsp
  4c:	c7 45 f4 0a 00 00 00 	movl   $0xa,-0xc(%rbp)
  53:	48 8d 45 f4          	lea    -0xc(%rbp),%rax
  57:	48 89 c2             	mov    %rax,%rdx
  5a:	48 8d 0d 07 00 00 00 	lea    0x7(%rip),%rcx        # 68 <ttest+0x24>
  61:	e8 1a 00 00 00       	callq  80 <test>
  66:	48 89 45 f8          	mov    %rax,-0x8(%rbp)
  6a:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
  6e:	48 89 c1             	mov    %rax,%rcx
  71:	e8 59 00 00 00       	callq  cf <print>
  76:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
  7a:	48 83 c4 30          	add    $0x30,%rsp
  7e:	5d                   	pop    %rbp
  7f:	c3                   	retq   

在4c所示的位置,使用movl $0xa, -0xc(%rbp)将变量i初始化为10,但是并没有找到与”hello”对应的指令。全局搜索,在.rdata段中找到如下代码:

0000000000000000 <.rdata>:
   0:	25 73 20 25 64       	and    $0x64252073,%eax
   5:	0a 00                	or     (%rax),%al
   7:	68 65 6c 6c 6f       	pushq  $0x6f6c6c65
   c:	00 30                	add    %dh,(%rax)
   e:	78 25                	js     35 <.rdata+0x35>
  10:	70 0a                	jo     1c <.rdata+0x1c>

可以看到pushq 指令将”hello”压入了<.rdata>.

回到一开始的代码,对于从参数中传递的指针,如果无法确定调用者指针所指向的对象是局部变量还是全局变量,那么我认为最安全的办法是使用malloc重新从堆上分配一块合适大小的内存,并在该指针的声明周期结束时手动释放其所指向的内存空间。

This post is licensed under CC BY 4.0 by the author.

C++ tricks

x86-64 Assembly notes