Currently, PrometheusScrapeEndpoint uses a StringWriter to prepare the whole response before sending it over the wire, which is causing us trouble. We would like to introduce some form of streaming instead. A simplified sketch of the current behaviour follows below.
FYI @jkschneider
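To make the starting point concrete, this is roughly the pattern in question (a simplified sketch, not the actual Spring Boot source; the class name and the registry parameter are illustrative):

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.exporter.common.TextFormat;

import java.io.IOException;
import java.io.StringWriter;

class CurrentApproachSketch {

    // Build the entire scrape response in memory before handing it to the
    // web layer. For a large registry, the StringWriter's backing buffer and
    // the resulting String are single, very large allocations.
    static String scrape(CollectorRegistry registry) throws IOException {
        StringWriter writer = new StringWriter();
        TextFormat.write004(writer, registry.metricFamilySamples());
        return writer.toString();
    }

}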
Our issue: We are monitoring quite a large number of metrics, which leads to relatively big scrape responses. We would not care about that too much, but we started to face unpredictable behaviour with JVM heap allocation and the G1 garbage collector. We were receiving alerts that our JVM heap usage was at almost 100 % before it suddenly dropped back to normal. Once we started to dig into it, we noticed that our old-generation heap regions were fully occupied, largely by byte/string buffers of some kind, and heap dump analysis showed references back to Prometheus. We also noticed from the GC logs that, once the issue started to appear, the garbage collection reason was very often "G1 humongous allocation". Another clue was that the issue recurred frequently and at regular intervals. All of this led us to the conclusion that scraping, which happens frequently and at those same intervals, must be generating huge heap objects: objects larger than 50 % of the configured region size, which is why G1 allocates them directly into old-generation regions and marks them as "humongous". A small sketch reproducing this follows below.
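To illustrate the mechanism, here is a minimal, self-contained sketch (our own example, not Spring Boot code) that triggers a humongous allocation; run it with, for example, java -Xms512m -Xmx512m -XX:G1HeapRegionSize=2m -Xlog:gc* HumongousDemo and look for "humongous" in the GC log:

public class HumongousDemo {

    public static void main(String[] args) {
        // With 2 MB G1 regions, any single allocation larger than 1 MB
        // (50 % of the region size) is classed as humongous and is placed
        // directly into old-generation regions, bypassing the young gen.
        byte[] humongous = new byte[2 * 1024 * 1024];
        System.out.println("Allocated " + humongous.length + " bytes");
    }

}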
Comment From: ITman1
Just realized that this will work only for Spring MVC and not for Spring WebFlux, because of StreamingResponseBody. I cannot find any interface in Spring that would allow returning a stream while depending only on the Spring Web module. I can see that there is a Resource interface, but it is not usable out of the box for this: it would require either creating some kind of chunked byte-array output stream and then converting it to an input stream, or involving threading, circular buffers, etc. 😞 If you know of an interface we could use in place of StreamingResponseBody, please help; otherwise reject this MR and maybe reopen it as a feature request (a sketch of the MVC-only approach follows below). Thanks
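For reference, this is roughly what the MVC-only approach looks like (a hypothetical controller for illustration; the mapping path and class name are ours). StreamingResponseBody lives in spring-webmvc, which is why it cannot be used from a WebFlux application:

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.exporter.common.TextFormat;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBody;

import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

@RestController
public class StreamingScrapeController {

    // The body is written directly to the servlet output stream, so no full
    // copy of the scrape output is ever held on the heap.
    @GetMapping(value = "/prometheus", produces = TextFormat.CONTENT_TYPE_004)
    public StreamingResponseBody scrape() {
        return (outputStream) -> {
            try (Writer writer = new OutputStreamWriter(outputStream, StandardCharsets.UTF_8)) {
                TextFormat.write004(writer, CollectorRegistry.defaultRegistry.metricFamilySamples());
            }
        };
    }

}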
Comment From: wilkinsona
We have a somewhat similar problem for the heap dump endpoint. In that case, we write the heap dump to a temporary file and then return a Resource that points to that file. That's one option here. Anything else would require some enhancements to the endpoint infrastructure for MVC, WebFlux, and Jersey.
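A minimal sketch of that temp-file approach applied to the scrape response (the class and method names here are illustrative, not actual Spring Boot API):

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.exporter.common.TextFormat;
import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;

import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;

class TempFileScrapeSketch {

    // Write the scrape output to a temp file and hand back a Resource; the
    // web layer then streams the file without buffering it on the heap.
    static Resource scrapeToResource(CollectorRegistry registry) throws IOException {
        Path file = Files.createTempFile("prometheus-scrape", ".txt");
        file.toFile().deleteOnExit(); // best-effort cleanup
        try (Writer writer = Files.newBufferedWriter(file)) {
            TextFormat.write004(writer, registry.metricFamilySamples());
        }
        return new FileSystemResource(file.toFile());
    }

}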
Comment From: ITman1
I think the problem with the heap dump endpoint is a little different. Heap dumps are naturally big, so there is a danger that they will not fit into memory; using temp files is quite a fair solution there. A Prometheus scrape response is not that big, but it is big enough that G1 puts it into humongous heap regions. Plus, heap dumps are usually taken on demand, not every 10-15 seconds like the Prometheus scrape endpoint, so temp files are again fine for that case. I just played with the Resource interface and created this monster. What do you think about it? Is it too hacky, or could we use it instead of the current solution? If yes, I will update this pull request. If not, please reject this pull request and we will keep the fix on our side, not the spring-boot side. Thank you again.
package xyz;

import io.prometheus.client.Collector;
import io.prometheus.client.exporter.common.TextFormat;
import org.springframework.core.io.AbstractResource;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public class PrometheusScrapeResource extends AbstractResource {

    private static final int CHUNK_SIZE = 8192;

    // The scrape output, buffered as a list of fixed-size chunks rather than
    // one contiguous array, so that no single allocation is humongous.
    private final List<byte[]> chunks = new ArrayList<>();

    private int chunkOffset;

    PrometheusScrapeResource(Enumeration<Collector.MetricFamilySamples> mfs) {
        this.chunks.add(new byte[CHUNK_SIZE]);
        this.chunkOffset = 0;
        try (Writer writer = new OutputStreamWriter(new PrometheusScrapeOutputStream(),
                StandardCharsets.UTF_8)) {
            TextFormat.write004(writer, mfs);
        }
        catch (IOException ex) {
            throw new IllegalStateException("Writing metrics failed", ex);
        }
    }

    @Override
    public String getDescription() {
        return "prometheus scrape resource";
    }

    @Override
    public InputStream getInputStream() {
        return new PrometheusScrapeInputStream();
    }

    // Fills the chunk list, growing it one fixed-size chunk at a time.
    private class PrometheusScrapeOutputStream extends OutputStream {

        @Override
        public void write(int b) {
            if (chunkOffset >= CHUNK_SIZE) {
                chunks.add(new byte[CHUNK_SIZE]);
                chunkOffset = 0;
            }
            chunks.get(chunks.size() - 1)[chunkOffset++] = (byte) b;
        }

        @Override
        public void write(byte[] bytes, int firstByteOffset, int bytesToWrite) {
            int remainingBytesToWrite = bytesToWrite;
            // Fill the current chunk, allocating new ones until the remainder fits.
            while (remainingBytesToWrite > CHUNK_SIZE - chunkOffset) {
                int freeBytesInChunk = CHUNK_SIZE - chunkOffset;
                System.arraycopy(bytes, firstByteOffset + bytesToWrite - remainingBytesToWrite,
                        chunks.get(chunks.size() - 1), chunkOffset, freeBytesInChunk);
                remainingBytesToWrite -= freeBytesInChunk;
                chunks.add(new byte[CHUNK_SIZE]);
                chunkOffset = 0;
            }
            System.arraycopy(bytes, firstByteOffset + bytesToWrite - remainingBytesToWrite,
                    chunks.get(chunks.size() - 1), chunkOffset, remainingBytesToWrite);
            chunkOffset += remainingBytesToWrite;
        }

    }

    // Replays the buffered chunks; each call to getInputStream() gets an
    // independent read position.
    private class PrometheusScrapeInputStream extends InputStream {

        final int totalBytes = (chunks.size() - 1) * CHUNK_SIZE + chunkOffset;

        int streamPosition = 0;

        @Override
        public int read() {
            // Mask with 0xFF so that negative byte values are not mistaken for EOF.
            return (totalBytes > streamPosition)
                    ? chunks.get(streamPosition / CHUNK_SIZE)[streamPosition++ % CHUNK_SIZE] & 0xFF
                    : -1;
        }

        @Override
        public int read(byte[] bytes, int firstByteOffset, int bytesToRead) {
            int remainingBytesToRead = Math.min(bytesToRead, totalBytes - streamPosition);
            int bytesAlreadyRead = 0;
            if (bytesToRead == 0) {
                return 0;
            }
            else if (remainingBytesToRead == 0) {
                return -1;
            }
            // Copy across chunk boundaries until the remainder fits in one chunk.
            while (remainingBytesToRead > CHUNK_SIZE - (streamPosition % CHUNK_SIZE)) {
                int remainingBytesInChunk = CHUNK_SIZE - (streamPosition % CHUNK_SIZE);
                System.arraycopy(chunks.get(streamPosition / CHUNK_SIZE), streamPosition % CHUNK_SIZE,
                        bytes, firstByteOffset + bytesAlreadyRead, remainingBytesInChunk);
                remainingBytesToRead -= remainingBytesInChunk;
                bytesAlreadyRead += remainingBytesInChunk;
                streamPosition += remainingBytesInChunk;
            }
            System.arraycopy(chunks.get(streamPosition / CHUNK_SIZE), streamPosition % CHUNK_SIZE,
                    bytes, firstByteOffset + bytesAlreadyRead, remainingBytesToRead);
            streamPosition += remainingBytesToRead;
            return bytesAlreadyRead + remainingBytesToRead;
        }

    }

}
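Hypothetical usage, for illustration only (assuming the controller lives in the same package xyz, since the resource's constructor is package-private):

package xyz;

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.exporter.common.TextFormat;
import org.springframework.core.io.Resource;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChunkedScrapeController {

    // Each request renders the metrics into a fresh chunked buffer; no single
    // allocation exceeds CHUNK_SIZE, so G1 never sees a humongous object.
    @GetMapping(value = "/prometheus", produces = TextFormat.CONTENT_TYPE_004)
    public Resource scrape() {
        return new PrometheusScrapeResource(CollectorRegistry.defaultRegistry.metricFamilySamples());
    }

}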
Comment From: wilkinsona
Thanks, @ITman1. That is a bit of a monster, isn't it. I'm afraid it feels a bit too hacky for us to adopt in Spring Boot. I'd prefer to add some streaming response support to the Actuator's endpoint infrastructure instead. I've opened https://github.com/spring-projects/spring-boot/issues/21308 to track that. Thank you anyway for your proposal here and for bringing the requirement to our attention.