[STM32 development with arm-gcc] Enabling the FPU on the STM32F4

Build environment

gcc: arm-none-eabi-gcc version 9.3.1
STM32F4 standard peripheral library: STM32F4xx_DSP_StdPeriph_Lib_V1.8.0

Configuration steps

When building an STM32F4 program with arm-none-eabi-gcc, enabling the floating-point unit (FPU) only requires adding the following two compiler options

-mfloat-abi=hard 
-mfpu=vfpv4-d16

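-mfloat-abi=hard tells gcc to generate FPU instructions and to pass floating-point arguments in FPU registers
-mfpu=vfpv4-d16 selects the FPU instruction set to target. Note that the Cortex-M4 FPU is the single-precision FPv4-SP-D16, for which gcc's matching option is -mfpu=fpv4-sp-d16; vfpv4-d16 additionally lets gcc emit double-precision instructions that the STM32F4 cannot execute.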

How it works

In principle, using the STM32F4's FPU takes two pieces of work

Enable the STM32F4's floating-point unit (FPU)
Make the compiler generate floating-point instructions

I. Enabling the STM32F4's FPU

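According to the FPU section of the STM32F4xx Cortex-M4 core reference manual, the FPU is disabled after reset and has to be enabled by writing a register (CPACR). The code that does this is in the function void SystemInit(void) in the standard library file system_stm32f4xx.c; the relevant excerpt is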

void SystemInit(void)
{
  /* FPU settings ------------------------------------------------------------*/
  #if (__FPU_PRESENT == 1) && (__FPU_USED == 1)
    SCB->CPACR |= ((3UL << 10*2)|(3UL << 11*2));  /* set CP10 and CP11 Full Access */
  #endif
  ...

As the excerpt shows, the FPU is enabled only when both __FPU_PRESENT and __FPU_USED are defined as 1.
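To make the CPACR write concrete, the following host-side sketch recomputes the mask used in SystemInit. The names CPACR_FULL_ACCESS, CPACR_CP10_CP11_FULL and cpacr_enable_fpu are made up for illustration and are not part of the standard library; on the target, the real write goes to SCB->CPACR.

```c
#include <assert.h>
#include <stdint.h>

/* Each coprocessor n owns a 2-bit field at bit position 2*n in CPACR;
   the value 3 (0b11) in that field means "full access". */
#define CPACR_FULL_ACCESS(n)  (3UL << ((n) * 2))

/* Hypothetical name: combined mask for CP10 and CP11, the two
   coprocessor fields that control access to the FPU. */
#define CPACR_CP10_CP11_FULL  (CPACR_FULL_ACCESS(10) | CPACR_FULL_ACCESS(11))

/* The mask covers bits 20..23. */
static_assert(CPACR_CP10_CP11_FULL == 0x00F00000UL, "unexpected CPACR mask");

/* Hypothetical helper: models the `|=` in SystemInit by returning the
   register value after the FPU enable, given the previous value. */
uint32_t cpacr_enable_fpu(uint32_t cpacr)
{
    return cpacr | (uint32_t)CPACR_CP10_CP11_FULL;
}
```

Because the update is an OR, any bits already set in the register are preserved; only the CP10 and CP11 fields are forced to full access.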

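First, __FPU_PRESENT: it is defined in stm32f4xx.h, with the value 1, meaning the core has an FPU. So we do not need to worry about this macro.
Next, __FPU_USED: it is defined in core_cm4.h by the code below (gcc is the compiler here, so only the gcc branch is analyzed)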

#elif defined ( __GNUC__ ) 
  #if defined (__VFP_FP__) && !defined(__SOFTFP__)
    #if (__FPU_PRESENT == 1)
      #define __FPU_USED       1
    #else
      #warning "Compiler generates FPU instructions for a device without an FPU (check __FPU_PRESENT)"
      #define __FPU_USED       0
    #endif
  #else
    #define __FPU_USED         0
  #endif

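The code above gives the following logic: when __GNUC__ is defined, __FPU_USED is 1 if __VFP_FP__ is defined and __SOFTFP__ is not (and __FPU_PRESENT is 1), which is what makes SystemInit enable the STM32F4's FPU.
Let us look at the three macros __GNUC__, __VFP_FP__ and __SOFTFP__ one by one.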

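__GNUC__ identifies the compiler in use; here that is arm-gcc, and gcc defines this macro automatically.
__VFP_FP__ is defined or not by gcc itself, depending on the options passed to gcc.
__SOFTFP__ is defined or not by gcc itself, depending on the options passed to gcc.
Summary: with gcc as the compiler, whether the standard library enables the FPU is decided entirely by gcc, so we do not need to change any definition in the standard library; we only need to control gcc.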
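The decision made by the gcc branch of core_cm4.h can be written out as a small truth function. This is only a sketch for reasoning about the macro logic: fpu_used and fpu_used_by_this_compiler are hypothetical names, and the arguments stand in for "is this macro defined" rather than for real preprocessor state.

```c
#include <assert.h>

/* Hypothetical helper mirroring the gcc branch of core_cm4.h.
   Each flag is 1 if the corresponding condition holds, else 0. */
int fpu_used(int vfp_fp_defined, int softfp_defined, int fpu_present)
{
    /* Precondition: arguments are plain 0/1 flags. */
    assert(fpu_present == 0 || fpu_present == 1);

    if (vfp_fp_defined && !softfp_defined) {
        /* The compiler emits FPU instructions; that is only usable
           when the device actually has an FPU. */
        return fpu_present;
    }
    /* Soft-float build: the FPU is left disabled. */
    return 0;
}

/* The same test applied to whatever compiler builds this file.
   On a typical non-ARM host __VFP_FP__ is not defined, so this yields 0. */
int fpu_used_by_this_compiler(void)
{
#if defined(__VFP_FP__) && !defined(__SOFTFP__)
    return 1;
#else
    return 0;
#endif
}
```

The only combination that returns 1 is "__VFP_FP__ defined, __SOFTFP__ not defined, FPU present", which matches the analysis above.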

II. Making gcc generate floating-point instructions

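As mentioned above, the two options
-mfloat-abi=hard -mfpu=vfpv4-d16
tell gcc to generate floating-point instructions, and gcc also defines the corresponding macros automatically based on them. So to use the STM32's hardware floating point, supplying these two options is enough.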

III. Verification

We write a simple program to check whether the generated code and macro definitions, with and without the options
-mfloat-abi=hard -mfpu=vfpv4-d16
match the analysis above.

1. Write the test routine ftest.c

void test()
{
	float a=2.1,b;
	a=2.3*3.1;
	b=a*12;
	a=b/3.3;
}

2. Without the FPU

Compile to assembly

arm-none-eabi-gcc -S -mcpu=cortex-m4   ftest.c -o ftest.s

-S tells gcc to stop after generating assembly
-mcpu=cortex-m4 tells gcc that the target core is Cortex-M4

The generated assembly is as follows

.cpu cortex-m4
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 1
	.eabi_attribute 30, 6
	.eabi_attribute 34, 1
	.eabi_attribute 18, 4
	.file	"ftest.c"
	.text
	.global	__aeabi_fmul
	.global	__aeabi_f2d
	.global	__aeabi_ddiv
	.global	__aeabi_d2f
	.align	1
	.global	test
	.arch armv7e-m
	.syntax unified
	.thumb
	.thumb_func
	.fpu softvfp
	.type	test, %function
test:
	@ args = 0, pretend = 0, frame = 8
	@ frame_needed = 1, uses_anonymous_args = 0
	push	{r7, lr}
	sub	sp, sp, #8
	add	r7, sp, #0
	ldr	r3, .L2+8
	str	r3, [r7, #4]	@ float
	ldr	r3, .L2+12
	str	r3, [r7, #4]	@ float
	ldr	r1, .L2+16
	ldr	r0, [r7, #4]	@ float
	bl	__aeabi_fmul
	mov	r3, r0
	str	r3, [r7]	@ float
	ldr	r0, [r7]	@ float
	bl	__aeabi_f2d
	adr	r3, .L2
	ldrd	r2, [r3]
	bl	__aeabi_ddiv
	mov	r2, r0
	mov	r3, r1
	mov	r0, r2
	mov	r1, r3
	bl	__aeabi_d2f
	mov	r3, r0
	str	r3, [r7, #4]	@ float
	nop
	adds	r7, r7, #8
	mov	sp, r7
	@ sp needed
	pop	{r7, pc}
.L3:
	.align	3
.L2:
	.word	1717986918
	.word	1074423398
	.word	1074161254
	.word	1088694518
	.word	1094713344
	.size	test, .-test
	.ident	"GCC: (GNU Arm Embedded Toolchain 9-2020-q2-update) 9.3.1 20200408 (release)"

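The code contains no floating-point instructions. Instead it calls functions such as __aeabi_fmul, which perform floating-point arithmetic in software on CPUs without an FPU.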
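The calls to __aeabi_f2d, __aeabi_ddiv and __aeabi_d2f in the listing are double-precision helpers. They appear because the constant 3.3 in ftest.c has no suffix and is therefore a double, so the float operand is converted to double before the division. The sketch below shows the difference on a host compiler; divide_via_double and divide_in_float are hypothetical names, and the second one is a variant that is not in the original test routine.

```c
#include <assert.h>

/* An unsuffixed constant such as 3.3 has type double, so a float
   operand is converted to double and the result is a double. */
static_assert(sizeof(1.0f / 3.3) == sizeof(double), "float / double is double");

/* With the f suffix the whole expression has type float. */
static_assert(sizeof(1.0f / 3.3f) == sizeof(float), "float / float is float");

/* Same shape as `a = b / 3.3;` in ftest.c:
   convert to double, divide in double, convert the result back. */
float divide_via_double(float b)
{
    return b / 3.3;
}

/* Hypothetical single-precision variant of the same statement. */
float divide_in_float(float b)
{
    return b / 3.3f;
}
```

With the f suffix the division would be expected to stay in single precision, so the conversions to and from double should no longer be needed. This matters on the STM32F4 because its FPU only handles single precision.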

Inspect the macro definitions
arm-none-eabi-gcc -dM -E -mcpu=cortex-m4 ftest.c -o ftest.h

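-E tells gcc to stop after preprocessing
-dM suppresses the normal preprocessed output and instead lists every #define macro; most of these relate to the architecture and GNU, or come from included headers.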

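The generated file is fairly long; readers can generate it themselves and search for __VFP_FP__ and __SOFTFP__. Both macros turn out to be defined, so by the logic in core_cm4.h __FPU_USED is 0.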

3. With the FPU

Generate the assembly file and the macro list the same way as before, this time adding the hard-float options

arm-none-eabi-gcc -S -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=vfpv4-d16  ftest.c -o ftest.s
arm-none-eabi-gcc -dM -E -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=vfpv4-d16 ftest.c -o ftest.h

The generated assembly is as follows

	.cpu cortex-m4
	.eabi_attribute 28, 1
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 1
	.eabi_attribute 30, 6
	.eabi_attribute 34, 1
	.eabi_attribute 18, 4
	.file	"ftest.c"
	.text
	.align	1
	.global	test
	.arch armv7e-m
	.syntax unified
	.thumb
	.thumb_func
	.fpu vfpv4-d16
	.type	test, %function
test:
	@ args = 0, pretend = 0, frame = 8
	@ frame_needed = 1, uses_anonymous_args = 0
	@ link register save eliminated.
	push	{r7}
	sub	sp, sp, #12
	add	r7, sp, #0
	ldr	r3, .L2+8
	str	r3, [r7, #4]	@ float
	ldr	r3, .L2+12
	str	r3, [r7, #4]	@ float
	vldr.32	s15, [r7, #4]
	vmov.f32	s14, #1.2e+1
	vmul.f32	s15, s15, s14
	vstr.32	s15, [r7]
	vldr.32	s15, [r7]
	vcvt.f64.f32	d6, s15
	vldr.64	d5, .L2
	vdiv.f64	d7, d6, d5
	vcvt.f32.f64	s15, d7
	vstr.32	s15, [r7, #4]
	nop
	adds	r7, r7, #12
	mov	sp, r7
	@ sp needed
	ldr	r7, [sp], #4
	bx	lr
.L3:
	.align	3
.L2:
	.word	1717986918
	.word	1074423398
	.word	1074161254
	.word	1088694518
	.size	test, .-test
	.ident	"GCC: (GNU Arm Embedded Toolchain 9-2020-q2-update) 9.3.1 20200408 (release)"
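
The listing now uses FPU instructions such as vmul.f32 instead of __aeabi_* calls. It also contains vcvt.f64.f32 and vdiv.f64, which are double-precision instructions; gcc emits them because -mfpu=vfpv4-d16 describes an FPU with double-precision support, whereas the Cortex-M4 FPU is single precision only. One way to see what the compiler assumes is the ACLE macro __ARM_FP, whose bits indicate the precisions supported in hardware (bit 1 half, bit 2 single, bit 3 double). The sketch below decodes such a value. The helper names are hypothetical, and the sample values 0x06 and 0x0E are what one would expect for a single-precision-only and a double-capable FPU respectively; they are not output captured from the toolchain above.

```c
#include <assert.h>

/* Bit assignments of __ARM_FP as described by ACLE. */
#define ARM_FP_HALF    0x02u
#define ARM_FP_SINGLE  0x04u
#define ARM_FP_DOUBLE  0x08u

/* Sanity check on the sample value used in the text. */
static_assert((0x0Eu & ARM_FP_DOUBLE) != 0, "0x0E includes double precision");

/* Hypothetical helpers: decode an __ARM_FP value. */
int arm_fp_has_single(unsigned arm_fp)
{
    return (arm_fp & ARM_FP_SINGLE) != 0;
}

int arm_fp_has_double(unsigned arm_fp)
{
    return (arm_fp & ARM_FP_DOUBLE) != 0;
}

/* Value reported by the compiler building this file,
   or 0 when __ARM_FP is not defined (e.g. soft-float or non-ARM). */
unsigned arm_fp_of_this_compiler(void)
{
#ifdef __ARM_FP
    return (unsigned)__ARM_FP;
#else
    return 0u;
#endif
}
```

On the target toolchain, the value can be read from the same -dM -E macro dump used earlier by searching for __ARM_FP. If the double bit is set while building for an STM32F4, switching to -mfpu=fpv4-sp-d16 is worth trying.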