I have a weird issue in production that I need to debug and fix. I use Go's HTTP client with an otherwise default transport. Locally everything works great, but in production my service uses about 10x the CPU for similar usage (roughly 300 TLS handshakes per second). I profiled it with Go's pprof and found that a major share of the CPU time is spent in a crypto library.
Production Windows server:
Time: Sep 11, 2025 at 5:10pm (UTC)
Duration: 30.12s, Total samples = 50.85s (168.81%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 35370ms, 69.56% of 50850ms total
Dropped 761 nodes (cum <= 254.25ms)
Showing top 10 nodes out of 255
      flat  flat%   sum%        cum   cum%
   10990ms 21.61% 21.61%    11350ms 22.32%  crypto/internal/fips140/nistec/fiat.p384Mul
   10940ms 21.51% 43.13%    11160ms 21.95%  runtime.cgocall
    4510ms  8.87% 52.00%     4510ms  8.87%  runtime.stdcall2
    2790ms  5.49% 57.48%     2840ms  5.59%  crypto/internal/fips140/nistec/fiat.p384Square
    1410ms  2.77% 60.26%     1410ms  2.77%  runtime.stdcall0
    1160ms  2.28% 62.54%     1160ms  2.28%  crypto/internal/fips140/nistec/fiat.p384CmovznzU64 (inline)
    1130ms  2.22% 64.76%     1570ms  3.09%  crypto/internal/fips140/nistec/fiat.p384Add
     990ms  1.95% 66.71%      990ms  1.95%  runtime.stdcall1
     840ms  1.65% 68.36%      840ms  1.65%  crypto/internal/fips140/sha512.blockAVX2
     610ms  1.20% 69.56%      610ms  1.20%  crypto/internal/fips140/bigmod.addMulVVW2048
On my local Windows setup I do not see the fiat library being used.
Sample code for creating the HTTP client:
httpClient: &http.Client{
	Timeout: time.Duration(httpTimeoutInSeconds) * time.Second,
	Transport: &http.Transport{
		TLSClientConfig: &tls.Config{
			InsecureSkipVerify: true, // Skip certificate verification for health checks
		},
	},
},
I have verified that the production server also supports hardware crypto acceleration features, but for some reason the Go runtime falls back to the slower fiat library for crypto, while locally it might be using the Windows CNG library.
fmt.Println("AES:", cpu.X86.HasAES)
fmt.Println("AVX2:", cpu.X86.HasAVX2)
fmt.Println("BMI2:", cpu.X86.HasBMI2)
fmt.Println("PCLMULQDQ:", cpu.X86.HasPCLMULQDQ)
The above prints true for all four, both locally and in production. How do I go about debugging this?