MIT 6.S081 Lecture 2 Notes

The creation of a thousand forests is one acorn.

6.S081第二课,主要是TA教授一些C的知识与GDB使用。在这里仅记录一些有意思的点,

Intor to C

课程里这个介绍C知识的slides比较短小精悍,基本覆盖了大部分常见且关键的知识点。美中不足的是没有关于 spiral rule 的知识。这个处理规则在写一些复杂的 typedef 与函数指针声明时十分有用,比如 signal handler 的类型声明。

void* pointer

You can’t define a variable of type void, because… what would that mean?

But you can define a variable of type void *. You just can’t dereference it.

(You aren’t allowed to use arithmetic on void * pointers officially, but we use GCC, which supports it as an extension to C.)

关于void* 指针可以指但无法解引用的特性,这点常常在底层库中应用为各式指针类型的转换。关于void* 的指针算术标准是不提供的,具体可以看标准用数组的类比,由于void数组不可能,所以 void* 算术也不可能。但GCC开了一个口,进行扩展,将 sizeof void 当作一字节,以支持 void* 算术计算。

local static variable initialization

关于局部静态对象何时初始化的问题。C与C++明显有差异。slides的例子里,

int add_cumulative_numbers(int increase) {
  static int total_sum = 0;
  total_sum += increase;
  return total_sum;
}

total_sum will be initialized to zero at program start, and it will keep its value across calls to add_cumulative_numbers! It won’t be reinitialized.

但是在C++里,情况稍复杂一点。Effective C++系列(忘了是不是Modern)提到过,局部静态对象可以实现为仅在对应函数第一次被调用时初始化,做到类似 lazy initialization 的实现。例如learncpp中这段C++代码,

#include <iostream>

void incrementAndPrint()
{
    static int s_value{ 1 }; // static duration via static keyword.  This initializer is only executed once.
    ++s_value;
    std::cout << s_value << '\n';
} // s_value is not destroyed here, but becomes inaccessible because it goes out of scope

int main()
{
    incrementAndPrint();
    incrementAndPrint();
    incrementAndPrint();

    return 0;
}

In this program, because s_value has been declared as static, it is created at the program start.

Static local variables that are zero initialized or have a constexpr initializer can be initialized at program start. Static local variables with non-constexpr initializers are initialized the first time the variable definition is encountered (the definition is skipped on subsequent calls, so no reinitialization happens). Because s_value has constexpr initializer 1, s_value will be initialized at program start.

简单说就是虽然在程序开始时创建,但初始化的时间根据 initializer 是否为 constexpr 有区别。对于 non-constexpr 可以做到lazy initialization。这个由于比较确定,就不找标准的原文验证了。

typedef in header file

将一些类型定义在头文件中,有利编写跨平台代码。只更换头文件即可作出平台对不同类型的适应。如在32位平台,可以将xv6 的 typedef unsigned int uint32; 替换为 typedef unsigned long uint32;等。编写一些跨平台代码时,最好也用这些明确定义比特位数的类型。关于include guard就不展开说了,xv6的头文件暂时都没用到这个特性(可能头文件太少,内核太简单,没有include重复展开的问题)。

// from kernel/types.h:
typedef unsigned char uint8;
typedef unsigned short uint16;
typedef unsigned int uint32;
typedef unsigned long uint64;

Miscellaneous

一点有趣的杂项。通过字段名字而不是顺序初始化一个结构体,

struct xy_point my_point = {
  .y = -6.2,
  .x = 12.5,
};

判断一个整数是否为2的幂次,

// Check if integer is a power of two
if (my_int && !(my_int & (my_int - 1))) { /* ... */ }

对于一个大于0的整数其实 !(my_int & (my_int - 1)) 已经能够判断。因为如果二进制表示有1(大于0二进制表示必有1),my & (my_int - 1) 相当于把最低位置的1去掉, 如果为2幂次,则二进制表示只有一个1,结果为0;如果不为2幂次,则二进制表示多个1,结果为非0。考虑 my_int 为0的特殊情况,所以还要在前面加 my_int && ... 形式。

A pointer challenge

最后 slides 提供了一个来自The Ksplice Pointer Challenge的问题,检验对于C指针的理解,问题要求基于64位小端机器,int为32的假设,写出四个 printf 的内容。

int main() {
 int x[5]; // x is at 0x7fffdfbf7f00
 printf("%p\n", x);
 printf("%p\n", x+1);
 printf("%p\n", &x);
 printf("%p\n", &x+1);
 return 0;
}

答案与解释博客中均有,比较有意思的是最后 &x+1 的答案。因为 &x is a int *[5] 所以 offset 的单位是 5 * sizeof(int), 即20个字节。最终答案为 0x7fffdfbf7f14

GDB

课程slides教授了GDB的基本使用。最开始讲了执行代码时栈的变化,即函数调用时如何压栈保存寄存器(caller-saved registers),以及为函数局部变量分配的栈空间大小,这类问题属于本科课程与CSAPP都讲过,就不再赘述。我主要把这个slides的GDB使用做成 cheat sheet。

command effect
help <command-name> show usage about command
Stepping  
step runs one line of code at a time. When there is a function call, it steps into the called function
next does the same thing, except that it steps over function calls
stepi and nexti do the same thing for assembly instructions rather than lines of code
All take a numerical argument to specify repetition. Pressing the enter key repeats the previous command.  
Running  
continue runs code until a breakpoint is encountered or you interrupt it with Control-C
finish runs code until the current function returns
advance <location> runs code until the instruction pointer gets to the specified location
Breakpoints  
break <location> sets a breakpoint at the specified
location Locations can be memory addresses (“*0x7c00”) or names (“mon backtrace”, “monitor.c:71”). Modify breakpoints using delete, disable, enable.  
Conditional breakpoints  
break <location> if <condition> sets a breakpoint at the specified location, but only breaks if the condition is satisfied.
cond <number> <condition> adds a condition on an existing breakpoint.
Watchpoints  
watch <expression> will stop execution whenever the expression’s value changes.
watch -l <address> will stop execution whenever the contents of the specified memory address change.
rwatch [-l] <expression> will stop execution whenever the value of the expression is read.
Examining  
x prints the raw contents of memory in whatever format you specify (x/x for hexadecimal, x/i for assembly, etc).
print evaluates a C expression and prints the result as its proper type. It is often more useful than x.
The output from p *((struct elfhdr *) 0x10000) is much nicer than the output from x/13x 0x10000.  
More examining  
info registers prints the value of every register.
info frame prints the current stack frame.
list <location> prints the source code of the function at the specified location.
backtrace display the current call stack (can be used after a runtime error, eg. segfault).
Layouts  
layout <name> switches to the given layout. GDB has a text user interface that shows useful information like code listing, disassembly, and register contents in a curses UI.
Miscellaneous  
set var var=expr stores value of expression expr into program variable var during execution. e.g. set var g=4 store 4 into program variable g
symbol-file switches symbol files. e.g. when debugging JOS, symbol-file obj/user/<name> and symbol-file obj/kern/kernel will switch function and variable names between environments and the kernel.

P.S.

之前发布的公众号文章超链接无法跳转,以后公众号文章尝试添加博客原文链接与引用链接。

Reference

  1. MIT 6.S081 2021
  2. Intor to C
  3. Spiral rule
  4. C reference pointer arithmetic
  5. GCC Arithmetic extension
  6. Learn cpp static local variables
  7. The Ksplice Pointer Challenge
  8. GDB slides