Introduction
Startup engineering is all about weighing the return on investment that various efforts will yield in order to decide where to spend your precious development time. Depending on the maturity of your startup and the complexity of your notification use cases, some parts of this guide may be more useful to you than others. This guide aims to give you the information you need to make the best architectural choices based on your particular maturity and requirements.
What Makes A Great User Notification System?
There are three core components to a great notification system. The first is that it provides a fantastic user experience to the end user. This essentially boils down to delivering the right message, to the right user, with the right frequency, via the right channel, at the right time. A notification system that accomplishes this will delight your users and make your product more successful.
The second part to a great notification system is that it is easy to use and empowering to your internal team. For technical users that means having great developer tooling and documentation. For non-technical users that means having an intuitive design and editing interface and good visibility as well as useful data and analytics.
The third part to a great notification system is ensuring that it is well crafted from a technical perspective. That means building a system that is reliable, scalable, and built with deliverability top of mind.
Notification Systems Are Really Hard
The explosion in communication APIs has made the engineering work required to send a single notification fairly trivial. Need to send an email? Make a call to Mailgun. Need to send an SMS? Make a call to Twilio. Need to send a push notification? Make a call to OneSignal. This explosion in communication APIs also means that users are more inundated with notifications than ever, and that your notification user experience can be a real differentiator or detractor for your product.
So what does a great notification system look like? Luckily, many companies that were founded as startups in the cloud era, and are now large public companies with plenty of resources to perfect their notifications, have shared their system architecture publicly. LinkedIn, Airbnb, Slack, and others have shared their approach to building out their product notifications, and they all come down to the same common goal: deliver the right message, through the right channel, to the right users, at the right time. Building a system that accomplishes this goal requires careful consideration of your end user experience requirements, internal user experience requirements, and technical requirements.
End User Experience Requirements
Right User
Ensuring that your notifications are going out to the right user is table stakes. Ensuring that your user events are properly wired up with the relevant data associated with them is the most important piece to getting this right.
Example:
If a new user is added to the account of a B2B SaaS app, should the billing admin be notified? How about the other admins? You need to ensure that your events are properly wired so that this single sign-up event can trigger any relevant notifications and include the relevant data in each notification.
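One way to picture this wiring is a routing table that maps each event to the roles that should hear about it. The sketch below is a minimal illustration under assumptions: the event name `user.added`, the role names, and the in-memory `ROUTES` and `ACCOUNT_ROLES` tables are all hypothetical stand-ins for whatever your application actually stores.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    name: str          # hypothetical event name, e.g. "user.added"
    account_id: str
    payload: dict = field(default_factory=dict)  # data the notification will render


# Hypothetical routing table: event name -> roles that should be notified.
ROUTES = {
    "user.added": ["billing_admin", "admin"],
}

# Hypothetical account directory: account -> role -> user ids.
ACCOUNT_ROLES = {
    "acct_1": {"billing_admin": ["u_bill"], "admin": ["u_bill", "u_ops"]},
}


def resolve_recipients(event: Event) -> list[str]:
    """Return the de-duplicated user ids to notify for this event, in order."""
    roles = ACCOUNT_ROLES.get(event.account_id, {})
    recipients: list[str] = []
    for role in ROUTES.get(event.name, []):
        for user_id in roles.get(role, []):
            if user_id not in recipients:  # a user holding two roles is notified once
                recipients.append(user_id)
    return recipients


event = Event("user.added", "acct_1", {"new_user_email": "new@example.com"})
print(resolve_recipients(event))  # ['u_bill', 'u_ops']
```

Keeping the routing in data rather than scattered through application code makes it easier to answer "who gets notified when X happens?" as the number of events grows.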
Right Message
The user needs to be in control of ensuring they only get the notifications that they care about and through their preferred channel(s). It’s your job as the developer to provide sane defaults to make this job easy on the user.
Example:
If your application has 25 different possible notifications, don’t ask the user to toggle each one of those on/off plus set channel preferences. Rather, you should group them into logical groups and then allow the user to decide when and how they want to receive each group of notifications.
Right Frequency
Notifications that are too frequent annoy users and hurt engagement. However, missing an essential notification can potentially ruin the utility of a product for a user.
Example:
When you get five messages while you are away on Slack, Slack does not send you five notifications. Rather, it batches those five notifications into a single ‘While you were away’ notification. Your system should have the ability to batch notifications in order to avoid over-notifying users, and should have access to all relevant events in order to make sure your users don’t miss anything important.
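The batching idea can be sketched in a few lines. This is an illustrative sketch only, not a description of how Slack implements it: the dict keys `user_id` and `text` and the digest wording are assumptions chosen for the example.

```python
from collections import defaultdict


def batch_by_user(pending: list[dict]) -> list[dict]:
    """Collapse pending notifications into one digest per user.

    Each item is a dict with hypothetical keys "user_id" and "text".
    A user with a single pending item gets that item's text unchanged.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for item in pending:
        grouped[item["user_id"]].append(item["text"])

    digests = []
    for user_id, texts in grouped.items():
        if len(texts) == 1:
            body = texts[0]
        else:
            body = f"While you were away: {len(texts)} new messages"
        # Keep the original items so nothing important is lost in the digest.
        digests.append({"user_id": user_id, "body": body, "items": texts})
    return digests
```

In practice the batch would be flushed on a timer or when the user becomes active again; that trigger is left out here.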
Right Channel
The channel requirements for each product will differ depending on its users’ preferred communication methods. No matter what channels you need to support, your notification system needs the flexibility to respect your users’ preferences and deliver each type of notification to the channel of choice for each individual user. You should also design your system with the flexibility to support other channels in the future that you may not need today.
Example:
Your users may want to receive certain notifications over email only, others over email + Slack, and still others over Slack only. You should make sure this is easy to customize while providing sane defaults to start out.
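A minimal sketch of group-level preferences with sane defaults might look like the following. The notification types, group names, and default channels are hypothetical; the point is that users override whole groups, and anything they have not touched falls back to a default you chose.

```python
# Hypothetical grouping: each notification type belongs to one group.
GROUP_OF = {
    "invoice_paid": "billing",
    "payment_failed": "billing",
    "comment_added": "activity",
    "mention": "activity",
}

# Defaults the developer chooses for each group.
DEFAULT_CHANNELS = {
    "billing": {"email"},
    "activity": {"email", "slack"},
}


def channels_for(notification_type: str, user_prefs: dict[str, set[str]]) -> set[str]:
    """Channels to use: the user's choice for the group if set, else the default.

    An empty set means the user has muted the group.
    """
    group = GROUP_OF[notification_type]
    if group in user_prefs:
        return set(user_prefs[group])
    return set(DEFAULT_CHANNELS[group])


prefs = {"activity": {"slack"}}  # this user overrode one group only
print(channels_for("mention", prefs))       # {'slack'}
print(channels_for("invoice_paid", prefs))  # {'email'}
```

Adding a new channel later only means allowing a new value in these sets, which is the kind of flexibility the section argues for.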
Right Time
Timing is perhaps the most important aspect of delivering effective notifications. A well-timed notification delights users, while a poorly timed one annoys or angers them. Your notification system needs the flexibility to adjust sends based on the time of day where your user is, the actions your user has taken, and the relevant job your user is trying to achieve.
Example:
A B2B IT management application sends an admin an email when a user requests permission to create a new Slack channel. If there is no response within 2 business days, it follows up with a Slack notification to that admin during business hours. If another 2 business days pass with no response, it sends another email to both the admin and the requester notifying them that the request has expired without being granted.
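The timeline in this example can be sketched as a small schedule builder. This is a simplified sketch under stated assumptions: business days are taken to be Monday through Friday, holidays and time zones are ignored, business hours are not enforced, and the function and recipient names are hypothetical. A real system would also cancel the remaining steps as soon as the admin responds.

```python
from datetime import datetime, timedelta


def add_business_days(start: datetime, days: int) -> datetime:
    """Advance `start` by `days` weekdays, skipping Saturday and Sunday."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def escalation_plan(requested_at: datetime) -> list[tuple[datetime, str, list[str]]]:
    """Return (send_time, channel, recipients) steps for the example flow."""
    follow_up = add_business_days(requested_at, 2)
    expiry = add_business_days(follow_up, 2)
    return [
        (requested_at, "email", ["admin"]),
        (follow_up, "slack", ["admin"]),
        (expiry, "email", ["admin", "requester"]),
    ]


# A request made on Thursday 2024-01-04 is followed up on Monday 2024-01-08
# and expires on Wednesday 2024-01-10.
for step in escalation_plan(datetime(2024, 1, 4, 10, 0)):
    print(step)
```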
Internal User Experience Requirements
API First
The nature of application-to-human communications (aka notifications) is that they will need to be triggered from code. For that reason, it’s important that as many of your system’s features as possible are available via an API.
Example:
While today your only notification use case may be a simple email, as your product grows more sophisticated additional notification use cases will almost certainly surface. A good API is the best way to ensure you will be able to execute on use cases you have not yet defined.
Great Documentation
It’s important that both your engineering team and non-technical users are able to effectively and efficiently use your notification system at any point in the user journey. Making sure all of the common use cases are properly documented is essential to accomplishing this.
Example:
At some point in the development of your application you will come across a notification use case that you did not originally think of. Maybe a new notification channel will emerge as important, or a new feature will emerge that requires a sophisticated notification flow. An API that is well documented is the best way to guarantee that your product team will figure out new ways to delight your users and deliver on use cases that you have not yet thought of.
Data/Analytics
Your internal team needs the ability to view relevant data associated with your application’s notifications. They need to be able to answer questions around open rate, engagement rate, success rate, and other success criteria for different notifications across channels.
Example:
Notifications are an essential piece to effectively engaging your users. Having good data and analytics will allow you to invest more heavily in the channels and notifications that are working while fixing or dropping the ones that aren’t.
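As a rough illustration of the kind of question this data should answer, the sketch below computes delivery and open rates per channel from a flat list of tracking events. The event shape (`channel` and `status` keys, with statuses `sent`, `delivered`, and `opened`) is an assumption made for the example; real providers report these signals in different ways and with varying reliability.

```python
from collections import defaultdict


def rates_by_channel(events: list[dict]) -> dict[str, dict[str, float]]:
    """Compute delivery and open rates per channel.

    Each event is a dict with hypothetical keys "channel" and "status",
    where status is one of "sent", "delivered", or "opened".
    """
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for event in events:
        counts[event["channel"]][event["status"]] += 1

    report = {}
    for channel, c in counts.items():
        sent = c["sent"]
        delivered = c["delivered"]
        report[channel] = {
            # Guard against division by zero when a channel has no sends yet.
            "delivery_rate": delivered / sent if sent else 0.0,
            "open_rate": c["opened"] / delivered if delivered else 0.0,
        }
    return report
```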
Robust Logs
Your team needs to ensure that deliverability is being achieved, because a message that does not make it to its destination is sure to fail in its goal. Logs need to not only serve the purpose of verifying deliverability, but should also make it easy to diagnose any deliverability issues quickly and efficiently.
Editing & Designing Notifications
As engineers we would prefer if all work was done via APIs, but actually designing the look and feel for notifications across various channels is better suited for a web-based design tool. Not only does this allow for more granular control of the design, but it also allows non-technical team members to easily edit messages without needing to involve engineering.
Example:
If you have a UI for building message templates that render properly across channels such as email, SMS, and push, you will dramatically increase your team’s agility (you no longer need to deploy backend code to make copy changes to notifications) and remove a big burden from your engineering team (responding to requests from marketing/content to change or update notifications).
Provider Abstraction
Abstracting the underlying email/SMS/push providers has several benefits. First, it allows your developers to always interface with a consistent API, regardless of whether the underlying vendor changes at some point. Second, it creates the flexibility to use different providers for different use cases.
Example:
Some SMS providers are dramatically cheaper in certain regions than others and certain email providers have much better deliverability in certain regions than others. With provider abstraction you could utilize the best provider for the job rather than relying on a single provider for all your users/use cases.
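A provider abstraction can be as small as one shared interface plus a router that picks a provider per message. The sketch below is illustrative only: `SmsProvider`, `FakeProvider`, and `SmsRouter` are hypothetical names, and `FakeProvider` records sends in memory rather than calling any real vendor API.

```python
from typing import Protocol


class SmsProvider(Protocol):
    """The one interface application code depends on."""

    name: str

    def send(self, to: str, body: str) -> str: ...


class FakeProvider:
    """Stand-in for a real vendor client; records sends instead of calling an API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sent: list[tuple[str, str]] = []

    def send(self, to: str, body: str) -> str:
        self.sent.append((to, body))
        return f"{self.name}:{len(self.sent)}"  # fake message id


class SmsRouter:
    """Picks a provider by region and falls back to a default."""

    def __init__(self, default: SmsProvider, by_region: dict[str, SmsProvider]) -> None:
        self.default = default
        self.by_region = by_region

    def send(self, to: str, body: str, region: str) -> str:
        provider = self.by_region.get(region, self.default)
        return provider.send(to, body)


router = SmsRouter(FakeProvider("global"), {"IN": FakeProvider("regional")})
print(router.send("+15550100", "Your code is 1234", region="US"))  # global:1
print(router.send("+915550100", "Your code is 1234", region="IN"))  # regional:1
```

Because callers only see `SmsRouter.send`, swapping a vendor or adding a cheaper regional one is a configuration change rather than an application change.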
Centralization
As your product and your engineering team grow, there will inevitably be different product teams with different focuses. Allowing every team to use the same notification system will not only help with your product velocity but will also create a more consistent experience for your users across the different areas of your product.
Example:
If your backend engineering team is using one system to create notifications and your growth engineering team is using another, it will be nearly impossible to avoid a disjointed UX, as well as the tech debt associated with maintaining different systems.
Technical Requirements
Reliability
The first thing to recognize when it comes to the reliability of a notification system is that, by definition, you will be using third-party services (Twilio, Mailgun, Postmark, etc.) whose uptime and reliability you cannot control. At some point one of these services will be down, and how your system handles that downtime is central to its reliability.
There are three core factors that must be thought through from a technical perspective when designing the system for reliability.
Idempotency: You need to ensure that each notification will only be delivered once. Redundant notifications are a great way to annoy and lose users.
Failover: What is your backup plan? Do you have a different provider you can fall back on if one is down? Or maybe an alternate delivery channel (e.g. fall back to push or SMS when email is down)? This will likely depend on each individual message type and will require some sophisticated mapping.
Timeliness: Some messages are only useful if delivered within minutes, others within hours, and others within days. You need to decide, on a per-message basis, at which point it actually makes more sense to drop the message than to deliver it.
Example:
Say your email notification provider went down for half an hour. Some users who requested a password reset during this time hit the request link multiple times because they never received the reset email. When your email system is back up, you don’t want to send them a bunch of redundant password reset emails. You only want to send them the most recent one and drop the others.
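The password reset scenario combines two of the factors above, idempotency and timeliness, and can be sketched as a filter applied to the backlog once the provider recovers. Everything named here is an assumption for illustration: the `Queued` record, the one-hour time-to-live, and the idea of a `sent_keys` set that the caller updates after each successful send. Failover to another provider or channel is not covered.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Queued:
    idempotency_key: str  # unique per logical notification
    user_id: str
    kind: str             # e.g. "password_reset"
    created_at: datetime


# Hypothetical per-type time-to-live: past this age, dropping beats delivering.
MAX_AGE = {"password_reset": timedelta(hours=1)}
# Types where only the newest queued message per user is worth sending.
LATEST_ONLY = {"password_reset"}


def select_for_delivery(
    backlog: list[Queued], now: datetime, sent_keys: set[str]
) -> list[Queued]:
    """Pick which backlog messages to send after an outage."""
    latest: dict[tuple[str, str], Queued] = {}
    keep: list[Queued] = []
    for msg in sorted(backlog, key=lambda m: m.created_at):
        ttl = MAX_AGE.get(msg.kind)
        if ttl is not None and now - msg.created_at > ttl:
            continue  # timeliness: too old to be useful
        if msg.kind in LATEST_ONLY:
            latest[(msg.user_id, msg.kind)] = msg  # newer replaces older
        else:
            keep.append(msg)
    candidates = keep + list(latest.values())
    # Idempotency: never send a key that was already delivered.
    return [m for m in candidates if m.idempotency_key not in sent_keys]
```

Filtering on `sent_keys` last means that if the newest reset was already delivered, older ones are not resurrected in its place.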
Deliverability
When it comes to notification systems, deliverability is essential. A message that does not make it to its intended destination will almost certainly fail in achieving its goal and can have devastating consequences for the customer. Optimizing and tracking deliverability differ dramatically by channel.
| Channel | Optimization | Tracking |
| --- | --- | --- |
| Email | - SPF, DKIM, and DMARC need to be in place to avoid spam filters - Cross-client email testing/preview | - Can track open rate and click-through - Deliverability data is dependent on the email server you are sending to - Constantly shifting landscape; what data is available is changing all the time |
| SMS | - If you are sending at scale you need to use multiple numbers to ensure you’re not blocked by the provider - Regional pricing varies greatly, and vendor support for various regions varies as well | - Carriers may silently block your delivery, so tracking is tricky; tracking click-through on links is your best bet - It’s a good idea to build link tracking on your own domain; otherwise your deliverability can suffer from the use of URL shortening services like Bitly |
| Mobile Push | - Typically, as your volume of mobile push notifications goes up, deliverability will go down as users revoke push permissions - Best optimization tactic is to make each notification as useful and timely as possible - You can optimize the process of asking for permission but not actual deliverability - iOS is introducing message batching in Fall 2021 which will change the UX of iPhone push notifications (based on ML) | - You can track engagement as a proxy but there is no way to track real deliverability |
| In App | - Basic web development user experience; ensure you test on different devices and browsers - Many 3rd party notification services are blocked by ad blocking services | - Can track the same way you’d track any other engagement on your website |
| Slack/Chat | - Authentication and permissioning are complex; it’s best to adhere to each chat platform’s best practices to ensure deliverability | - Varies by platform - Slack provides data about whether the user was online when the notification was delivered, plus click-throughs with hyperlinks or webhook-enabled buttons (this requires server-side tracking infrastructure) |
Scalability
Very few products have a steady message volume. More likely, you will have very spiky periods paired with relatively quiet ones. This means that queueing and processing messages will be especially important. Below are some best practices to keep in mind when designing your system for scalability.
Ensure your queue infrastructure can handle your expected maximum volume (this will prevent a lot of other downstream issues)
Latency will be determined by how much time passes between the event that triggers the notification and when that event is processed to deliver the message. The listeners that are processing your queue will be the primary bottleneck here, though your downstream provider will also constrain your throughput and latency.
Windowing is essential to ensure you are taking advantage of the volume your provider allots without going over, which can result in dropped messages. One strategy for working around provider throughput limits is to have parallel provider accounts or parallel providers. For example, if you use one Twilio account for password resets and a separate Twilio account for transactional notifications, you can ensure that exceeding your limits for transactional notifications will not affect the deliverability of password resets. Overall, the more providers and channels you have at your disposal, the less likely you are to saturate your deliverability windows.
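One common way to implement windowing is a sliding-window limiter in front of each provider account. The sketch below is a minimal single-process illustration with assumed numbers: the class name, the limits, and the account labels are hypothetical, and real provider limits should come from the provider's own documentation. The clock is passed in explicitly so the behavior is easy to test.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` sends in any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._sent_at: deque[float] = deque()

    def try_acquire(self, now: float) -> bool:
        # Forget sends that have aged out of the window.
        while self._sent_at and now - self._sent_at[0] >= self.window_seconds:
            self._sent_at.popleft()
        if len(self._sent_at) < self.limit:
            self._sent_at.append(now)
            return True
        return False  # caller should leave the message queued and retry later


# One limiter per provider account keeps a spike in one stream from
# consuming the allowance of another.
limiters = {
    "password_resets": SlidingWindowLimiter(limit=2, window_seconds=1.0),
    "transactional": SlidingWindowLimiter(limit=2, window_seconds=1.0),
}
```

A worker would call `try_acquire` before each send and, on `False`, leave the message on the queue rather than dropping it or pushing past the provider's limit.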