lttng-tools: Backport ptest fix

Add a backport and a dependency from upstream to help address one of the lttng-tools
ptest relayd hangs we've been seeing on the autobuilder.

(From OE-Core rev: c8f845a8f391fa5f3f69a987b3977abdb4959db8)

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
This commit is contained in:
Richard Purdie
2021-12-13 22:59:23 +00:00
parent 878fbbe0e1
commit 7edb45c15c
3 changed files with 333 additions and 0 deletions

View File

@@ -0,0 +1,218 @@
Upstream-Status: Backport
From 87250ba19aec78f36e301494a03f5678fcb6fbb4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=A9r=C3=A9mie=20Galarneau?=
<jeremie.galarneau@efficios.com>
Date: Mon, 1 Nov 2021 15:43:55 -0400
Subject: [PATCH] Fix: relayd: live: mishandled initial null trace chunk
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Observed issue
==============
As reported in #1323 (https://bugs.lttng.org/issues/1323), crashes of
the relay daemon are observed when running the user space clear tests.
The crash occurs with the following stack trace:
#0 0x000055fbb861d6ae in urcu_ref_get_unless_zero (ref=0x28) at /usr/local/include/urcu/ref.h:85
#1 lttng_trace_chunk_get (chunk=0x0) at trace-chunk.c:1836
#2 0x000055fbb86051e2 in make_viewer_streams (relay_session=relay_session@entry=0x7f6ea002d540, viewer_session=<optimized out>, seek_t=seek_t@entry=LTTNG_VIEWER_SEEK_BEGINNING, nb_total=nb_total@entry=0x7f6ea9607b00, nb_unsent=nb_unsent@entry=0x7f6ea9607aec, nb_created=nb_created@entry=0x7f6ea9607ae8, closed=<optimized out>) at live.c:405
#3 0x000055fbb86061d9 in viewer_get_new_streams (conn=0x7f6e94000fc0) at live.c:1155
#4 process_control (conn=0x7f6e94000fc0, recv_hdr=0x7f6ea9607af0) at live.c:2353
#5 thread_worker (data=<optimized out>) at live.c:2515
#6 0x00007f6eae86a609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7 0x00007f6eae78f293 in clone () from /lib/x86_64-linux-gnu/libc.so.6
The race window during which this occurs seems very small as it can take
hours to reproduce this crash. However, a minimal reproducer could be
identified, as stated in the bug report.
Essentially, the same crash can be reproduced by attaching a live viewer
to a session that has seen events being produced, been stopped and been
cleared.
Cause
=====
The crash occurs as an attempt is made to take a reference to a viewer
sessions trace chunk as viewer streams are created. The crux of the
problem is that the code doesnt expect a viewer sessions trace chunk
to be NULL.
The viewer sessions current trace chunk is initially set, when a viewer
attaches to the viewer session, to a copy the corresponding
relay_sessions current trace chunk.
A live session always attempts to "catch-up" to the newest available
trace chunk. This means that when a viewer reaches the end of a trace
chunk, the viewer session may not transition to the "next" one: it jumps
to the most recent trace chunk available (the one being produced by the
relay_session). Hence, if the producer performs multiple rotations
before a viewer completes the consumption of a trace chunk, it will skip
over those "intermediary" trace chunks.
A viewer session updates its current trace chunk when:
1) new viewer streams are created,
2) a new index is requested,
3) metadata is requested.
Hence, as a general principle, the viewer session will reference the
most recent trace chunk available _even if its streams do not point to
it_. It indicates which trace chunk viewer streams should transition to
when the end of their current trace chunk is reached.
The live code properly handles transitions to a null chunk. This can be
verified by attaching a viewer to a live session, stopping the session,
clearing it (thus entering a null trace chunk), and resuming tracing.
The only issue is that the case where the first trace chunk of a viewer
session is "null" (no active trace chunk) is mishandled in two places:
1) in make_viewer_streams(), where the crash is observed,
2) in viewer_get_metadata().
Solution
========
In make_viewer_streams(), it is assumed that a viewer session will have
a non-null trace chunk whenever a rotation is not ongoing. This is
reflected by the fact that a reference is always acquired on the viewer
sessions trace chunk.
That code is one of the three places that can cause a viewer sessions
trace chunk to be updated. We still want to update the viewer session to
the most recently seen trace chunk (null, in this case). However, there
is no reference to acquire and the trace chunk to use for the creation
of the viewer stream is NULL. This is properly handled by
viewer_stream_create().
The second site to change is viewer_get_metadata() which doesnt handle
a viewer metadata stream not having an active trace chunk at all.
Thankfully, the protocol allows us to express this condition by
returning the LTTNG_VIEWER_NO_NEW_METADATA status code when a viewer
metadata stream doesnt have an open file and doesnt have a current
trace chunk.
Surprisingly, this bug didnt trigger in the case where a transition to
a null chunk occurred _after_ attaching to a viewer session.
This is because viewers will typically ask for metadata as a result of an
LTTNG_VIEWER_FLAG_NEW_METADATA reply to the GET_NEXT_INDEX command. When
a session is stopped and all data was consumed, this command returns
that no new data is available, causing the viewers to wait and ask again
later.
However, when attaching, babeltrace2 (at least, and probably babeltrace 1.x)
always asks for an initial segment of metadata before asking for an
index.
Known drawbacks
===============
None.
Fixes: #1323
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I516fca60755e6897f6b7170c12d706ef57ad61a5
---
src/bin/lttng-relayd/live.c | 47 ++++++++++++++++++++++++-----------
src/bin/lttng-relayd/stream.h | 5 ++++
2 files changed, 38 insertions(+), 14 deletions(-)
Index: lttng-tools-2.13.1/src/bin/lttng-relayd/live.c
===================================================================
--- lttng-tools-2.13.1.orig/src/bin/lttng-relayd/live.c
+++ lttng-tools-2.13.1/src/bin/lttng-relayd/live.c
@@ -384,8 +384,6 @@ static int make_viewer_streams(struct re
goto error_unlock;
}
} else {
- bool reference_acquired;
-
/*
* Transition the viewer session into the newest trace chunk available.
*/
@@ -402,11 +400,26 @@ static int make_viewer_streams(struct re
}
}
- reference_acquired = lttng_trace_chunk_get(
- viewer_session->current_trace_chunk);
- assert(reference_acquired);
- viewer_stream_trace_chunk =
- viewer_session->current_trace_chunk;
+ if (relay_stream->trace_chunk) {
+ /*
+ * If the corresponding relay
+ * stream's trace chunk is set,
+ * the viewer stream will be
+ * created under it.
+ *
+ * Note that a relay stream can
+ * have a NULL output trace
+ * chunk (for instance, after a
+ * clear against a stopped
+ * session).
+ */
+ const bool reference_acquired = lttng_trace_chunk_get(
+ viewer_session->current_trace_chunk);
+
+ assert(reference_acquired);
+ viewer_stream_trace_chunk =
+ viewer_session->current_trace_chunk;
+ }
}
viewer_stream = viewer_stream_create(
@@ -2016,8 +2029,9 @@ int viewer_get_metadata(struct relay_con
}
}
- if (conn->viewer_session->current_trace_chunk !=
- vstream->stream_file.trace_chunk) {
+ if (conn->viewer_session->current_trace_chunk &&
+ conn->viewer_session->current_trace_chunk !=
+ vstream->stream_file.trace_chunk) {
bool acquired_reference;
DBG("Viewer session and viewer stream chunk differ: "
@@ -2034,11 +2048,16 @@ int viewer_get_metadata(struct relay_con
len = vstream->stream->metadata_received - vstream->metadata_sent;
- /*
- * Either this is the first time the metadata file is read, or a
- * rotation of the corresponding relay stream has occurred.
- */
- if (!vstream->stream_file.handle && len > 0) {
+ if (!vstream->stream_file.trace_chunk) {
+ reply.status = htobe32(LTTNG_VIEWER_NO_NEW_METADATA);
+ len = 0;
+ goto send_reply;
+ } else if (vstream->stream_file.trace_chunk &&
+ !vstream->stream_file.handle && len > 0) {
+ /*
+ * Either this is the first time the metadata file is read, or a
+ * rotation of the corresponding relay stream has occurred.
+ */
struct fs_handle *fs_handle;
char file_path[LTTNG_PATH_MAX];
enum lttng_trace_chunk_status status;
Index: lttng-tools-2.13.1/src/bin/lttng-relayd/stream.h
===================================================================
--- lttng-tools-2.13.1.orig/src/bin/lttng-relayd/stream.h
+++ lttng-tools-2.13.1/src/bin/lttng-relayd/stream.h
@@ -174,6 +174,11 @@ struct relay_stream {
/*
* The trace chunk to which the file currently being produced (if any)
* belongs.
+ *
+ * Note that a relay stream can have no output trace chunk. For
+ * instance, after a session stop followed by a session clear,
+ * streams will not have an output trace chunk until the session
+ * is resumed.
*/
struct lttng_trace_chunk *trace_chunk;
LTTNG_OPTIONAL(struct relay_stream_rotation) ongoing_rotation;

View File

@@ -0,0 +1,113 @@
Upstream-Status: Backport
From 8f0646a03fbf31c19b85ec367dc2c3db56e6dbf7 Mon Sep 17 00:00:00 2001
From: Francis Deslauriers <francis.deslauriers@efficios.com>
Date: Mon, 25 Oct 2021 11:32:24 -0400
Subject: [PATCH] Typo: occurences -> occurrences
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I719e26febd639f3b047b6aa6361fc6734088e871
---
configure.ac | 2 +-
src/bin/lttng-relayd/live.c | 2 +-
src/bin/lttng-sessiond/event-notifier-error-accounting.c | 2 +-
src/bin/lttng-sessiond/ust-app.c | 2 +-
tests/utils/utils.sh | 8 ++++----
5 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/configure.ac b/configure.ac
index 12cc7a17e..27148c105 100644
--- a/configure.ac
+++ b/configure.ac
@@ -253,7 +253,7 @@ AS_IF([test "x$libtool_fixup" = "xyes"],
[
libtool_m4="$srcdir/m4/libtool.m4"
libtool_flag_pattern=".*link_all_deplibs\s*,\s*\$1\s*)"
- AC_MSG_CHECKING([for occurence(s) of link_all_deplibs = no in $libtool_m4])
+ AC_MSG_CHECKING([for occurrence(s) of link_all_deplibs = no in $libtool_m4])
libtool_flag_pattern_count=$($GREP -c "$libtool_flag_pattern\s*=\s*no" $libtool_m4)
AS_IF([test $libtool_flag_pattern_count -ne 0],
[
diff --git a/src/bin/lttng-relayd/live.c b/src/bin/lttng-relayd/live.c
index 13078026b..42b0d947e 100644
--- a/src/bin/lttng-relayd/live.c
+++ b/src/bin/lttng-relayd/live.c
@@ -2036,7 +2036,7 @@ int viewer_get_metadata(struct relay_connection *conn)
/*
* Either this is the first time the metadata file is read, or a
- * rotation of the corresponding relay stream has occured.
+ * rotation of the corresponding relay stream has occurred.
*/
if (!vstream->stream_file.handle && len > 0) {
struct fs_handle *fs_handle;
diff --git a/src/bin/lttng-sessiond/event-notifier-error-accounting.c b/src/bin/lttng-sessiond/event-notifier-error-accounting.c
index d3e3692f5..1488d801c 100644
--- a/src/bin/lttng-sessiond/event-notifier-error-accounting.c
+++ b/src/bin/lttng-sessiond/event-notifier-error-accounting.c
@@ -488,7 +488,7 @@ struct ust_error_accounting_entry *ust_error_accounting_entry_create(
lttng_ust_ctl_destroy_counter(daemon_counter);
error_create_daemon_counter:
error_shm_alloc:
- /* Error occured before per-cpu SHMs were handed-off to ustctl. */
+ /* Error occurred before per-cpu SHMs were handed-off to ustctl. */
if (cpu_counter_fds) {
for (i = 0; i < entry->nr_counter_cpu_fds; i++) {
if (cpu_counter_fds[i] < 0) {
diff --git a/src/bin/lttng-sessiond/ust-app.c b/src/bin/lttng-sessiond/ust-app.c
index b18988560..28c63e70c 100644
--- a/src/bin/lttng-sessiond/ust-app.c
+++ b/src/bin/lttng-sessiond/ust-app.c
@@ -1342,7 +1342,7 @@ static struct ust_app_event_notifier_rule *alloc_ust_app_event_notifier_rule(
case LTTNG_EVENT_RULE_GENERATE_EXCLUSIONS_STATUS_NONE:
break;
default:
- /* Error occured. */
+ /* Error occurred. */
ERR("Failed to generate exclusions from trigger while allocating an event notifier rule");
goto error_put_trigger;
}
diff --git a/tests/utils/utils.sh b/tests/utils/utils.sh
index e463e4fe3..42d99444f 100644
--- a/tests/utils/utils.sh
+++ b/tests/utils/utils.sh
@@ -1921,7 +1921,7 @@ function validate_trace
pass "Validate trace for event $i, $traced events"
else
fail "Validate trace for event $i"
- diag "Found $traced occurences of $i"
+ diag "Found $traced occurrences of $i"
fi
done
ret=$?
@@ -1949,7 +1949,7 @@ function validate_trace_count
pass "Validate trace for event $i, $traced events"
else
fail "Validate trace for event $i"
- diag "Found $traced occurences of $i"
+ diag "Found $traced occurrences of $i"
fi
cnt=$(($cnt + $traced))
done
@@ -1979,7 +1979,7 @@ function validate_trace_count_range_incl_min_excl_max
pass "Validate trace for event $i, $traced events"
else
fail "Validate trace for event $i"
- diag "Found $traced occurences of $i"
+ diag "Found $traced occurrences of $i"
fi
cnt=$(($cnt + $traced))
done
@@ -2013,7 +2013,7 @@ function validate_trace_exp()
pass "Validate trace for expression '${event_exp}', $traced events"
else
fail "Validate trace for expression '${event_exp}'"
- diag "Found $traced occurences of '${event_exp}'"
+ diag "Found $traced occurrences of '${event_exp}'"
fi
ret=$?
return $ret

View File

@@ -37,6 +37,8 @@ SRC_URI = "https://lttng.org/files/lttng-tools/lttng-tools-${PV}.tar.bz2 \
file://lttng-sessiond.service \
file://determinism.patch \
file://0001-src-common-correct-header-location.patch \
file://8f0646a03fbf31c19b85ec367dc2c3db56e6dbf7.patch \
file://87250ba19aec78f36e301494a03f5678fcb6fbb4.patch \
"
SRC_URI[sha256sum] = "cfe6df7da831fc07fd07ce46b442c2ec1074c167af73f3a1b1d2fba0c453c8b5"