Ross Sandford

Wednesday 25 May 2022

Correlating traces and logs with Elastic Cloud

Although application logging is of great value, it tends to give only a small window into how services are behaving. It would be great to see a service's performance, dependencies and other key metrics alongside its logs in one view.

I have been working with Elastic Cloud to achieve this, using a .NET 6 web API example solution. This solution uses an Azure Event Hub to store logs, which Elastic Cloud pulls from. I'm also sending trace data separately, and ensuring the trace.id and transaction.id fields exist on every log so we can correlate traces with logs to provide a single view of transactions in Elastic Cloud APM. It is the trace.id and transaction.id fields that are the key to this.

NuGet packages required

Elastic.Apm.NetCoreAll - gives us auto instrumentation of trace data into Elastic Cloud's APM server.

Microsoft.Azure.EventHubs (deprecated, but a dependency of the UKHO package) - for connecting to and working with Azure Event Hubs.

UKHO.Logging.EventHubLogProvider - provides a logging sink for Microsoft.Extensions.Logging.Abstractions. Logs are sent to Azure Event Hub as a JSON message. Using this so my example would be relevant to where I work!

Configuration values

I'm using Azure Key Vault and an appsettings.json config file in this PoC to pull in configuration settings required in this example app.

Configuring the logging set up - Program.cs

A single line in Program.cs specifies that we want to send trace information to Elastic Cloud's APM server. That's all we need to register the service in Elastic APM and record trace information against each method.
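As a minimal sketch of that line in a .NET 6 Program.cs, assuming the UseAllElasticApm extension method from Elastic.Apm.NetCoreAll and agent settings (server URL, secret token, service name) held under an ElasticApm configuration section. The controller setup around it is illustrative rather than taken from the example solution:

```csharp
using Elastic.Apm.NetCoreAll;

var builder = WebApplication.CreateBuilder(args);

// Illustrative only: a plain controller-based web API.
builder.Services.AddControllers();

var app = builder.Build();

// Enables the Elastic APM agent and its auto-instrumentation.
// Assumption: agent settings such as ElasticApm:ServerUrl and
// ElasticApm:SecretToken are supplied through configuration.
app.UseAllElasticApm(builder.Configuration);

app.MapControllers();
app.Run();
```

Registering the agent before other middleware means a transaction is already active by the time later middleware and controllers run.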

Implementation

Elastic APM relies on transaction.id and trace.id fields existing on log messages to correlate trace data with logs.

As we are not automatically sending trace.id and transaction.id values with each of our logs, we have to use manual log correlation to add them as properties on our log messages.

We can use the Elastic.Apm.Agent.Tracer.CurrentTransaction property (from the Elastic.Apm.NetCoreAll NuGet package mentioned before) anywhere in code to access the currently active transaction, and add the trace.id and transaction.id properties to our logs.
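A minimal sketch of this manual correlation follows. ForecastService is a hypothetical class, and the sketch assumes the logging provider surfaces message-template values as properties on the log event, which may differ between providers:

```csharp
using Elastic.Apm;
using Microsoft.Extensions.Logging;

// Hypothetical service, used only to illustrate the pattern.
public class ForecastService
{
    private readonly ILogger<ForecastService> _logger;

    public ForecastService(ILogger<ForecastService> logger) => _logger = logger;

    public void GetForecast()
    {
        // Null when no transaction is active, e.g. outside a request.
        var transaction = Agent.Tracer.CurrentTransaction;

        // The named placeholders become structured values on the log entry.
        _logger.LogInformation(
            "Fetching forecast {trace.id} {transaction.id}",
            transaction?.TraceId,
            transaction?.Id);
    }
}
```

The null-conditional access keeps the log call safe when code runs outside an active transaction; in that case the two values are simply logged as null.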

Now, as part of the JSON that makes up the "message" field on our log document, we will have trace.id and transaction.id.

However, before we can correlate these logs to our APM traces, we have to do further processing in Elastic using Filebeat's processors.

Configure Event Hub integration in Fleet

We will assume the integration to pull logs from an Azure Event Hub is already set up in Elastic Cloud via Fleet and Elastic Agent, as described here. We need to add two processors to this configuration to pull out the trace.id and transaction.id values into dedicated fields. These are set under the advanced options for the integration, using YAML.

decode_json_fields

This is the first processor we will use. It decodes fields containing JSON strings and replaces the strings with valid JSON objects. Add this to the Processors field under advanced settings for the Event Hub integration.

So we're instructing the decode_json_fields processor to decode the message field and replace the strings with valid JSON objects. We could add a new field name to the "target" option, but we're leaving this blank so the processor decodes and updates the message field.
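A sketch of what this processor entry can look like, using the standard decode_json_fields options. One point to check against your own data: per the Beats documentation, omitting target replaces the source field in place, while an explicit empty string merges the decoded keys into the root of the event. The field names listed later in this post (Properties.trace.id, with no message. prefix) suggest the latter, but that is an assumption:

```yaml
# Entry for the integration's Processors field (a YAML list).
- decode_json_fields:
    # Field(s) containing the JSON string to decode.
    fields: ["message"]
    # Assumption: empty string, so decoded keys land at the event root.
    target: ""
```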

rename

Our message field now has valid JSON objects, but given how nested transaction.id and trace.id were inside the message field, they are named as:

- Properties.transaction.id

- Properties.trace.id

We can use the rename processor to rename trace.id and transaction.id to what we want.
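A sketch of the rename entry, using the two source field names listed above and the standard rename processor options. The ignore_missing and fail_on_error values are assumptions, chosen so that logs written outside a transaction still pass through:

```yaml
- rename:
    fields:
      - from: "Properties.transaction.id"
        to: "transaction.id"
      - from: "Properties.trace.id"
        to: "trace.id"
    # Assumption: do not fail events that lack these properties.
    ignore_missing: true
    fail_on_error: false
```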

That's all the processors we need, so save the configuration changes to the agent policy.

View in Kibana

Now in Kibana, when we view a log that has been ingested after our Event Hub integration config changes, we will see the trace.id and transaction.id pulled out into dedicated fields.

This means that Elastic APM can now correlate APM traces with our logs.

View in APM

When we view a transaction that contains logging in APM, we can now see all the trace information and the log messages in one view :)

Logging middleware

As a further demonstration, this project also contains logging middleware to log each request made against the API, where we are adding the transaction.id and trace.id to the log message each time.
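A minimal sketch of such middleware follows. RequestLoggingMiddleware is a hypothetical name, and the fields logged here may differ from those in the actual project:

```csharp
using System.Threading.Tasks;
using Elastic.Apm;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;

// Hypothetical middleware that logs every request with correlation ids.
public class RequestLoggingMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ILogger<RequestLoggingMiddleware> _logger;

    public RequestLoggingMiddleware(
        RequestDelegate next,
        ILogger<RequestLoggingMiddleware> logger)
    {
        _next = next;
        _logger = logger;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        // Active only if the APM agent was registered earlier in the pipeline.
        var transaction = Agent.Tracer.CurrentTransaction;

        _logger.LogInformation(
            "Request {Method} {Path} {trace.id} {transaction.id}",
            context.Request.Method,
            context.Request.Path,
            transaction?.TraceId,
            transaction?.Id);

        // Hand over to the next component in the pipeline.
        await _next(context);
    }
}
```

It would be registered in Program.cs with app.UseMiddleware<RequestLoggingMiddleware>(), placed after the APM agent registration so that a transaction already exists when the request is logged.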

Saturday 30 January 2021

It is mine! Oh yes, it is mine.

As soon as I started playing the guitar at age 16, I was aware that there was one model of guitar that held a certain mystique; a place at the top of the rankings for all six string wielders to seek.

This is the Gibson Les Paul. Played by such pioneers as Jimmy Page, Slash and Billy Gibbons, it is seen as the pinnacle of the electric guitar. You can find other links to furnish your no doubt burning desire to know more about the history of the instrument - I'm just delighted to have joined the club of owners, and to be able to experience the top craftsmanship of this fine guitar.

If you are interested, mine is a 2005 cherry sunburst model, and it sounds fantastic 👍😍

Friday 11 September 2020

Delegate to build trust and investment

As a lead, it can be all too easy to want to do as much as possible yourself, feeling that unless you are constantly going above and beyond or demonstrating your worth at all times, you must be failing.

It is common to see a new lead take on much responsibility to prove to themselves and their team that they are the correct person for the role.

There are several pitfalls to this, aside from the obvious stress and potential burnout that could arise.

Ask yourself: How is this approach benefitting the team?

The short-term view would be something like: “I’m helping the team by taking less valuable work off them, so they can focus on achieving the valuable stuff”. Well alright, but a red flag is raised here – you have decided the value of the work, without consulting the team. Don’t assume that what you consider low value work (such as configuring that nasty third party application that no one else wants to touch) will also be considered low value by every member of your team.

The important thing here is to give the team the opportunity to either volunteer to pick up that work or agree that it is indeed low value. This does several things:

Demonstrates that you respect your team members' opinions
Gives the opportunity for team members to take on work that they otherwise wouldn’t have had the chance to
Prevents the view that as lead you are deciding who works on what

Delegate even more!

We can build on those 3 benefits further by identifying more opportunities for delegation. Research around this will surely uncover quotes advising to “Delegate until it is uncomfortable”. Why?

As leaders we have an opportunity to bolster team members’ confidence and self-worth by giving them work. This can feel unnatural, as a lead may instinctively want to know more than anyone else, seeing that as part of their role. Ultimately, that instinct will hold team members back in several ways. A lead should empower the team and instil trust. By delegating work to a team member, you are saying “I trust you to go and get this done”, but it also leads to something more, which is building investment.

Build investment

Once you ask the team to go and do something, they are immediately invested in that area of work. As an example, we’ll say we’ve delegated how to implement a logging solution in a new project. What would happen if the lead did this instead?

The lead would know more about the logging solution than anyone else
The lead would be handing down the solution, rather than the team discovering it
There is a chance of a “The lead always gets to decide everything” attitude developing

We want the team to decide on the best way forward, not a single person. Now going forward, the team is the authority on that logging solution, not just the lead. All expertise around it that future changes may rely on is now sitting with the team, and not a single person. Additionally, the team feels empowered about it, and that they own it and want it to succeed.

If it does end up that one person completes a piece of work that the team would benefit from the knowledge of, handover sessions should be actively encouraged.

The team should own and be invested in everything as much as possible. It should not just be up to the lead.

If you can achieve this, you'll quickly see colleagues become keener to help if things go wrong, and a lot more involved in decision making. This is good for everyone; use all the brains, sure, but you'll also help to foster a more open, inclusive team where trust abounds.

Sunday 3 February 2019

Interviewing - my approach, and what I get from doing them

Introduction

I have been on the interview panel for technical roles a lot down the years, and recently have been interviewing for lead developers. This process always evokes a whole range of thoughts for me, so I wanted to explore them further with this post.

What I do

You always hear of the importance of first impressions when it comes to interviews. As the interviewer, I try to greet the candidate warmly, look them in the eye and smile. I want them to be at ease as much as they can be; you'll only see the best of someone if they are relaxed enough to answer and demonstrate as they want to.
I make sure I am engaged at all times and listen intently to their answers. I encourage with positive speech and never react to an answer as if they are completely wrong - even if they are. I don't want them to panic and lose their composure, as this will result in rushed, less detailed answers.
You often hear of the good cop/bad cop set up of an interview panel, and I am definitely the good cop - I couldn't do it any other way.
I will occasionally lead the candidate to an answer if I feel they are most of the way there anyway, as this builds rapport. Again, I want them to feel at ease as much as possible, so if they align with me more than the other members of the panel, that is OK - especially if the candidate has applied for a role in my team.

The benefits to me

I always come away from interviews feeling a little lucky and, to a certain extent, validated. This is usually because of where the candidate is coming from, both organisationally and personally. At their organisation, candidates are often forced to maintain the status quo rather than enjoy up-to-date technology and practices. For example, a candidate may state that they have tried to bring TDD into their team, only for their tech lead to state "that takes more time than it is worth", or they've pleaded to no avail to bring in a source control system other than SourceSafe. How else can I feel when faced with these examples, other than sympathetic and fortunate?
Personally, then, the candidate is frustrated and desperate to find a way out. The purpose of the interview in this case then includes trying to pick out a diamond in the rough - generally a developer that is putting in work to improve themselves, despite being professionally hindered. Such work would include personal projects such as web sites or apps, but also personal blogs and evidence that they follow industry resources and training videos or courses.
For me, I feel lucky then to be in the position of keeping my team up to date with newer technologies and giving opportunity to those who wish to pursue it. I'm also pleased with the best practices we have in place and am validated on this by candidates wishing to utilise the very same.
So learning from this, I can take away a pointer as to how to be a good lead. That is to encourage my team to go away and bring back best practices they wish to explore and new technologies that may solve a problem for us. I believe my team feel they can come to me with either and be given a fair shot. It is important for developers to be given room to do so.

Makes you realise what is important

In a lead role it is difficult to stay in the weeds of the codebase, but this isn't an absolutely vital aspect for me. As a lead you want to be operating at a higher level - understanding a change's impact on the wider solution and ensuring approaches are discussed and planned before being executed. It can feel unsettling to be away from the code sometimes, but a good team with good devs you can trust makes this easier! You have to find a balance: know the codebase well enough to advise on a potential change, without needing to know every line.
How do interviews fit into this? For me, they help me re-centre on what is important in my role. From candidates we learn what they want to be doing and how they want to work. You are getting a free insight into the mind of a dev outside of your organisation in terms of what they are looking for in a team and from a lead. I then try to facilitate these conditions for my team. A happy dev is someone who is listened to and respected. Make all opinions welcome and valuable.

Lead role - the importance of technical skill

How important is technical ability when interviewing for a lead role? It's important. Experience is the key here really. A good lead will have faced a vast array of different technical challenges in their career. Therefore, I'd say a lead needs 5+ years' experience in Software Development before stepping up. I must also stress that in the interview you must get confidence in the candidate's technical ability. I deal in .NET, so have low level questions to tease out that knowledge. I also like to see enthusiasm for the field and evidence of the candidate staying up to date. How can they encourage their team to go looking for ideas if they have no interest themselves? If the technical ability is lacking, the applicant must be unsuccessful. They will not have the necessary respect of the developers if they can't remember the L in SOLID 😉

Successful candidates

Successful candidates that join my team remain in my team. I believe this is because I create an environment that is open, encouraging and professional. I want to see my team approaching me with ideas on new technology and best practice. I also want to feel challenged by them to improve my understanding of a certain area. I give them room to improve themselves and try things. Keep it failsafe!
I strive to get the balance right, favouring what will keep my team happy over what the organisation wants. Creative freedom over bureaucratic governance. Trying new things over keeping it safe and dull. Filling out pull requests over filling out forms!

tl;dr

Keep the candidate at ease, build rapport
Learn from the candidate as to what they want from a team and a lead
In a lead role, technical skill is important, as is enthusiasm for the field
Within your team, engender an open, failsafe and encouraging environment

Friday 12 January 2018

Sprint Planning - Velocity Planning?

Towards the end of our sprint planning session today we were in a position of committing to just over our current velocity. Our Scrum Master rightly wanted to check if the team felt comfortable with doing that, and was met with a resounding No. We have recently lost a developer from the team, and therefore felt uneasy about committing to more than, or equal to, our current velocity.

The stance from our Product Owner, however, was that we should be committing to velocity, as recent training courses they had attended stated this was the thing to do; if we don't do this, what is the point in having that statistic?

This led to a tricky stand-off between the PO and the development team.

Our PO was absolutely of the mind that we should be planning to velocity. Of course, if we hadn't lost a developer then I'm sure this would have been accepted, but the development team was stating their unease in doing this, and wanted to commit to slightly lower than velocity, given the reduced capacity and the high complexity of the user stories in scope. Plus, we had a 3-day time-boxed spike and 2 bugs (not included in our velocity) to contend with.

This has made me want to learn more about velocity planning, and has led me to two points:

Should you use velocity to plan a sprint if the team has decreased in size, and
Velocity v capacity planning

So, point 1. From what I have read, velocity planning is an excellent gauge of what a team can achieve in a sprint, if that team is mature and has a constant team size. As Mountain Goat Software states:

Velocity-driven sprint planning is based on the premise that the amount of work a team will do in the coming sprint is roughly equal to what they’ve done in prior sprints. This is a very valid premise.
This does assume, of course, things such as a constant team size, the team is working on similar work from sprint to sprint, consistent sprint lengths, and so on.

And as this article on Scrum Alliance states:

Velocity relies on team consistency in order to be most valuable. If your agile team changes, use common sense in planning future iterations. If 20% of your team is unavailable for a couple iterations, then reduce planned velocity by 20% or so.

So these posts indicate the development team was correct to push back on using velocity planning in this instance. I can certainly see the benefit of using velocity planning in the future, but only once the team has re-stabilised and gone through several sprints (4-5 is the recommended number).

Onto point 2, velocity v capacity planning.

Let's get the easy bit out of the way: you can only implement velocity planning if you have a team that has been around long enough to have an established velocity. A worry, though, with velocity planning is that the development team can become "story point accountants". Capacity planning makes sense as you can build your sprint around who is available, but it doesn't factor in complexity as well.

This article on Software Engineering Stack Exchange tackles the velocity v capacity planning question. I like the takeaway:

My advice would be to use your story-point velocity that you have established to determine which stories to take on, and then build out the true capacity for the upcoming sprint based on resource availability.

That makes sense, right?

Aside from the two points above, one angle that I hadn't considered was the Sprint Goal. If this had been set up front in the planning session, perhaps it would have given our PO more buy-in for over-committing, as we may have included the story that we moved out of the sprint if it was an integral part of the goal?

Oh, and in case you wanted to know what happened in our case, the agreement was made to commit to just under velocity, with the next story to pull in agreed to be the one that was not committed to. So the development team's view was taken over the PO's.

From my research as illustrated above, I'm pleased that it was.

Saturday 2 December 2017

DevOps at Sprint Planning

Today marked a real success for the team as we start to make our way to a more DevOps way of working. Our "DevOps Resource" (read: representative of ISS / Sys Admin) sat in with us at our sprint planning meeting.

This allows Shaun to gain the context of each user story that we are going to commit to, and promotes buy-in, involvement and a feeling of being part of what the development team is doing.

Importantly, as this is sprint planning, Shaun has an upfront view of these stories, so we will avoid the usual rigmarole of emailing / Skyping as and when we need ISS involvement and the inevitable frustration that arises when a) we can't get hold of them, or b) they do not action our request quickly enough.

As we embark on the latest sprint, we're seeing these ISS tasks either get completed by the first day, or soon after. This has resulted in an improved relationship between the two parties and I believe a sense of empowerment for Shaun.

Long may that continue!

Wednesday 29 November 2017

Post #1

First post

Here's the very first post to my new blog. I am a lead engineer, and aim to post my thoughts and adventures from my day to day working life.

Check back soon to see what's been going on!