Latency is the delay, or time lag, between when a request or operation is initiated within an application and when a response is received. It is the time taken for data to travel from the source to the destination, including any processing time along the way.
Monitoring latency helps ensure that a system or service responds quickly and doesn't keep users waiting, which is especially crucial for applications where real-time responsiveness matters, such as online gaming and online transactions.
When measuring latency, p99 refers to the 99th percentile of the latency distribution. For instance, if a service is required to have a p99 latency of 100 milliseconds, at least 99% of requests must complete in 100 milliseconds or less, while the remaining 1% may take longer.
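The percentile definition above can be sketched in a few lines of Python. This is a minimal illustration using the nearest-rank method (the smallest observed sample such that at least p% of all samples are less than or equal to it); the function name `percentile` and the sample data are hypothetical, and production systems typically compute percentiles from histograms or sketches rather than sorting raw samples.

```python
import math

def percentile(latencies_ms, p):
    """Nearest-rank percentile: the smallest sample such that
    at least p% of all samples are less than or equal to it."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# 100 simulated request latencies: 1 ms through 100 ms
samples = list(range(1, 101))
print(percentile(samples, 99))  # -> 99: 99% of requests finished in 99 ms or less
```

With this definition, a p99 of 100 ms over a window of traffic means you can sort that window's latencies and the value 99% of the way up the list is at most 100 ms.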