I have a user activity log and query system for an ISP with a very high intake of log events (5,000-10,000/second). Each NAT session log entry needs to be related to a RADIUS session record via a common InternalIP field. In both logs, each session produces two events, Start and Stop. Roughly 24 hours of data from 3,000 users comes to about 20 million records, and the volume is expected to grow.
My solution consists of two parsing-and-persisting agents, one per log type, written in Go with a PostgreSQL backend. I am experiencing several issues on both sides. Parsing and storage can't keep up with the data rate even after buffering syslog events in memory. To save space I have to merge each session's Start and Stop events into a single record, and I identify the NAT session's user from the other log, currently implemented with a database trigger. The buffers consume system RAM until the process is eventually killed (OOM). Writes to PostgreSQL are slow because of the per-row user identification and the indexes on the table.
To revisit my approach, I am looking for suggestions on how to improve performance. Whatever approach I take, I need to identify the NAT user from the RADIUS session logs before persisting the data to the database.
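One direction I am considering, sketched below, is moving the user identification out of the database trigger and into the agent: keep an in-memory map of the currently open RADIUS session per InternalIP, so each incoming NAT event can be attributed before it is written. All names (`Correlator`, `RadiusStart`, etc.) are hypothetical, and a real version would also need timestamps and handling for out-of-order or late events.

```go
package main

import "fmt"

// Correlator tracks the currently open RADIUS session per internal IP so a
// NAT event can be attributed to a subscriber in memory, before persisting,
// instead of via a per-row database trigger.
type Correlator struct {
	open map[string]string // InternalIP -> username of the open session
}

func NewCorrelator() *Correlator {
	return &Correlator{open: make(map[string]string)}
}

// RadiusStart records that user now holds ip (from a RADIUS Start event).
func (c *Correlator) RadiusStart(ip, user string) {
	c.open[ip] = user
}

// RadiusStop releases ip so it can be reassigned to another subscriber.
func (c *Correlator) RadiusStop(ip string) {
	delete(c.open, ip)
}

// Lookup returns the subscriber currently holding ip, if any.
func (c *Correlator) Lookup(ip string) (string, bool) {
	user, ok := c.open[ip]
	return user, ok
}

func main() {
	c := NewCorrelator()
	c.RadiusStart("10.0.0.5", "alice")
	user, ok := c.Lookup("10.0.0.5")
	fmt.Println(user, ok) // alice true
	c.RadiusStop("10.0.0.5")
	_, ok = c.Lookup("10.0.0.5")
	fmt.Println(ok) // false
}
```

This would let the NAT agent write fully-attributed rows in batches, with no trigger firing per insert; the open-session map for ~3,000 users should stay small.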