Suppose the LLM will generate the whole text like:
Larson graduated from Webster^[1][2]^ and Tooele^[3]^...
A possible output sequence my program will receive maybe
- Larson
- graduated from
- Webster^
- [
- 1][2
- ]^ and
- Tooele
- ^[
- 3
- ]^
- …
I want to filter all the citations as soon as getting string against the regex \^(\[\d+])+\^
The process is as follows:
The final output:
Larson graduated from Webster and Tooele...
The question is how to partially match string with regex like Webster^
,Webster^1][2
?
Notice:
Don’t manually enumerate all the regular expressions like
\^$|\^\[$|\^\[\d+$|\^(\[\d+])+$|\^(\[\d+])*\[$|\^(\[\d+])*\[\d+$
It is error-prone and difficult to maintain.