Building Cloud-Native Payment Systems: Lessons Learned
After years working on high-throughput payment microservices at BBVA, here are the key architecture decisions that made the difference between scalable and fragile systems.
Building Cloud-Native Payment Systems: Lessons Learned
Working on payment systems at scale forces you to think differently about reliability, consistency, and failure modes. After several years building and maintaining cloud-native payment infrastructure at BBVA Technology, I want to share some of the hard-won lessons.
The Fallacy of Synchronous Everything
When I first started in payments, every service call was synchronous. A payment request would fan out to fraud detection, balance checks, authorization networks — all in a chain. The problem? A 200ms timeout in any single service cascades to a 2-second failure for the customer.
The shift to event-driven architecture changed everything. Key insight: not every step in a payment flow needs to happen before you confirm to the customer. Fraud scoring, ledger updates, notification dispatch — many of these can be asynchronous.
// Before: synchronous chain
PaymentResult result = fraudService.check(payment)
.then(balanceService::validate)
.then(authNetwork::authorize)
.then(ledgerService::record);
// After: command pattern with async steps
commandBus.dispatch(new InitiatePaymentCommand(payment));
// Return immediate acknowledgment, process async
Idempotency is Non-Negotiable
Networks fail. Retries happen. Without idempotency keys, you double-charge customers. Every payment endpoint must be idempotent — the same request, submitted multiple times, must produce the same result.
We store idempotency keys in Redis with a TTL of 24 hours. The implementation is straightforward but the discipline to apply it consistently is not.
Circuit Breakers Save Your Night
Third-party authorization networks go down. When they do, without circuit breakers, your entire payment service backs up with pending requests and thread pools exhaust.
Using Resilience4j with Spring Boot:
@CircuitBreaker(name = "authNetwork", fallbackMethod = "authNetworkFallback")
public AuthResult authorize(Payment payment) {
return authorizationNetworkClient.authorize(payment);
}
public AuthResult authNetworkFallback(Payment payment, Exception ex) {
// Queue for async retry or use secondary network
paymentQueue.enqueue(payment);
return AuthResult.pending();
}
What’s Next
In future posts I’ll cover distributed tracing with OpenTelemetry, our approach to saga orchestration for multi-step transactions, and how we use Elasticsearch for payment analytics at scale.
The world of payment systems is endlessly fascinating — high stakes, strict correctness requirements, and enormous scale. I hope these lessons are useful if you’re building in this space.