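While building the message-forwarding feature of a Telegram bot in Go, I ran into a problem with how an HTML string was parsed. In short: messages sent to the Telegram API are parsed in HTML parse mode and then forwarded to a channel, and a line break has to be inserted between the author and the content. Because the HTML attribute contains double quotes, I assembled the string from backtick-quoted pieces, as shown below: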
memoURL := "https://twitter.edony.ink/m/" + fmt.Sprint(memoResponse.ID)
contentGroup := `<a href="` +
	memoURL +
	`">@memos</a> says: <pre>\n</pre>` +
	memoResponse.Content +
	`</pre>`
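In testing, the \n was printed verbatim instead of producing a line break. The cause turned out to be string escaping: backtick-quoted literals do not process escape sequences. This article summarizes the three ways Go writes character and string literals (single quotes, double quotes, and backticks) and how they differ.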
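One way to fix the snippet above is to splice an interpreted "\n" between the raw pieces. The sketch below is only an illustration: buildContent is a hypothetical helper, the example URL is made up, and the layout (a line break followed by a single pre block) is my assumption about the intended message.

```go
package main

import "fmt"

// buildContent is a hypothetical helper, not part of the original bot.
// It keeps the backtick literals for the pieces that contain double quotes
// and uses an interpreted literal for the line break.
// content is assumed to be HTML-safe already.
func buildContent(memoURL, content string) string {
	return `<a href="` + memoURL + `">@memos</a> says:` +
		"\n" + // interpreted literal: one real line feed byte
		`<pre>` + content + `</pre>`
}

func main() {
	// The raw literal holds a backslash and an 'n'; the interpreted one holds a line feed.
	fmt.Println(len(`\n`), len("\n")) // 2 1

	fmt.Println(buildContent("https://example.com/m/1", "hello"))
}
```

Mixing both literal forms in one concatenation avoids escaping every double quote while still producing a real line break.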
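Go has no single-quoted strings. Single quotes denote a single character, that is, a byte or rune value, which correspond to uint8 and int32 respectively; a single-quoted literal defaults to rune. byte is used to emphasize that the data is raw data rather than a number, while rune represents a Unicode code point. For example: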
// https://github.com/golang/go/blob/release-branch.go1.20/src/runtime/string_test.go#L309
func TestIntString(t *testing.T) {
	// Non-escaping result of intstring.
	s := ""
	for i := rune(0); i < 4; i++ {
		s += string(i+'0') + string(i+'0'+1)
	}
	if want := "01122334"; s != want {
		t.Fatalf("want '%v', got '%v'", want, s)
	}

	// Escaping result of intstring.
	var a [4]string
	for i := rune(0); i < 4; i++ {
		a[i] = string(i + '0')
	}
	s = a[0] + a[1] + a[2] + a[3]
	if want := "0123"; s != want {
		t.Fatalf("want '%v', got '%v'", want, s)
	}
}
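The runtime test above is fairly indirect, so here is a smaller sketch of the byte versus rune distinction. The helper describe and the sample string are my own, chosen only for illustration.

```go
package main

import "fmt"

// describe is a hypothetical helper: it returns the length of s in bytes
// and the number of Unicode code points it contains.
func describe(s string) (bytes, runes int) {
	return len(s), len([]rune(s))
}

func main() {
	r := 'a'         // single-quoted literal: the variable defaults to rune (int32)
	var b byte = 'a' // the same constant also fits in a byte (uint8)
	fmt.Printf("%T %T\n", r, b) // int32 uint8

	// Rune arithmetic works on code points; string(rune) encodes the result as UTF-8.
	fmt.Println(string(r + 1)) // b

	// ASCII letters take one byte each, these two Chinese characters take three each.
	nb, nr := describe("go语言")
	fmt.Println(nb, nr) // 8 4
}
```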
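A double-quoted string is a string in the usual sense. Under the hood it is a pointer to a byte array plus a length: individual bytes can be accessed by index, len() returns the string's length in bytes, and escape sequences are interpreted. The underlying implementation is as follows: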
// https://github.com/golang/go/blob/release-branch.go1.20/src/runtime/string.go#L232
type stringStruct struct {
	str unsafe.Pointer
	len int
}

func stringStructOf(sp *string) *stringStruct {
	return (*stringStruct)(unsafe.Pointer(sp))
}

// This is exported via linkname to assembly in syscall (for Plan9).
//
//go:linkname gostring
func gostring(p *byte) string {
	l := findnull(p)
	if l == 0 {
		return ""
	}
	s, b := rawstring(l)
	memmove(unsafe.Pointer(&b[0]), unsafe.Pointer(p), uintptr(l))
	return s
}
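To make the indexing, len(), and escape behaviour of double-quoted strings concrete, here is a minimal sketch. The constant name interpreted is mine and is used only for this illustration.

```go
package main

import "fmt"

// interpreted is an illustrative double-quoted literal:
// \t and \n are each decoded to a single byte.
const interpreted = "a\tb\n"

func main() {
	fmt.Println(len(interpreted))         // 4: 'a', tab, 'b', line feed
	fmt.Println(interpreted[1] == '\t')   // true: indexing yields a byte
	fmt.Printf("%q\n", interpreted)       // "a\tb\n": %q re-escapes for display

	// Strings are immutable, so assigning to an index does not compile:
	// interpreted[0] = 'x'
}
```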
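Backticks denote raw string literals, which support no escape sequences. A raw string literal is exactly what you typed: a line break in the source stays a line break, and a \n stays the two characters \n. Note that the runtime's rawstring helper quoted below is not specific to backtick literals, despite its name: it allocates uninitialized storage for a new string at run time, whereas the contents of a literal are fixed at compile time. The related runtime code is as follows: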
// rawstring allocates storage for a new string. The returned
// string and byte slice both refer to the same storage.
// The storage is not zeroed. Callers should use
// b to set the string contents and then drop b.
func rawstring(size int) (s string, b []byte) {
	p := mallocgc(uintptr(size), nil, false)
	return unsafe.String((*byte)(p), size), unsafe.Slice((*byte)(p), size)
}

// rawbyteslice allocates a new byte slice. The byte slice is not zeroed.
func rawbyteslice(size int) (b []byte) {
	cap := roundupsize(uintptr(size))
	p := mallocgc(cap, nil, false)
	if cap != uintptr(size) {
		memclrNoHeapPointers(add(p, uintptr(size)), cap-uintptr(size))
	}

	*(*slice)(unsafe.Pointer(&b)) = slice{p, size, int(cap)}
	return
}

// rawruneslice allocates a new rune slice. The rune slice is not zeroed.
func rawruneslice(size int) (b []rune) {
	if uintptr(size) > maxAlloc/4 {
		throw("out of memory")
	}
	mem := roundupsize(uintptr(size) * 4)
	p := mallocgc(mem, nil, false)
	if mem != uintptr(size)*4 {
		memclrNoHeapPointers(add(p, uintptr(size)*4), mem-uintptr(size)*4)
	}

	*(*slice)(unsafe.Pointer(&b)) = slice{p, size, int(mem / 4)}
	return
}
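As a closing illustration of the backtick behaviour described above, the sketch below writes the same text once as a raw literal and once as an interpreted literal. The constant names rawText and sameText are mine, picked for this example.

```go
package main

import "fmt"

// rawText spans two source lines and contains a literal backslash followed by 'n'.
const rawText = `line one\n
line two`

// sameText spells out the same bytes with an interpreted literal:
// \\ is one backslash, \n is one line feed.
const sameText = "line one\\n\nline two"

func main() {
	fmt.Println(rawText == sameText) // true
	fmt.Println(len(rawText))        // 19
	fmt.Printf("%q\n", rawText)      // "line one\\n\nline two"
}
```

The raw form is the easier one to read when the text contains many backslashes or double quotes, which is why it suits HTML fragments; a real line break inside it has to be typed as an actual line break or concatenated in from an interpreted literal.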