Oct 31, 2018 - Finding and Fixing a Bug in JRuby

In a Rails application I work on we have a class in that provides a kind of null/nil value. It can be used in both the context of arithmetic operations, and in a string context. It is pretty similar to the example below:

With this we can do things like the following without needing to check for nil values:

irb(main):010:0> values = [BigDecimal('12.3'), BigDecimal('45.6'), NullValue.new]
irb(main):011:0> values.reduce(:+).to_f
=> 58.9

The way the coerce method is implemented, it should work with any operation a Numeric supports.

However, it turned out that we’d never used this NullValue in subtraction previously. When a developer in the team introduced code using this operation, it worked just fine using MRI/CRuby, however when it was deployed to our continuous deployment server that used JRuby, it suddenly failed.

I decided to track down what was going on.

The issue was easy to reproduce:

irb(main):001:0> RUBY_VERSION
=> "2.5.0"
irb(main):002:0> JRUBY_VERSION
=> "9.2.0.0"
irb(main):003:0> load './null_value.rb'
=> true
irb(main):004:0> BigDecimal(1) - NullValue.new
Traceback (most recent call last):
        7: from /Users/seanm/.rbenv/versions/jruby-9.2.0.0/bin/jirb:13:in `<main>'
        6: from org/jruby/RubyKernel.java:1180:in `catch'
        5: from org/jruby/RubyKernel.java:1180:in `catch'
        4: from org/jruby/RubyKernel.java:1418:in `loop'
        3: from org/jruby/RubyKernel.java:1037:in `eval'
        2: from (irb):4:in `<eval>'
        1: from org/jruby/ext/bigdecimal/RubyBigDecimal.java:1134:in `-'
TypeError (NullValue can't be coerced into BigDecimal)

I tried a few different versions of JRuby to check if it was a recent regression, however those versions (9.1.17.0 and 1.7.27) had the same issue.

The stack trace gives a place to start. Examining line 1134 of org/jruby/ext/bigdecimal/RubyBigDecimal.java

@JRubyMethod(name = "-", required = 1)
public IRubyObject op_minus(ThreadContext context, IRubyObject b) {
    return subInternal(context, getVpValueWithPrec(context, b, true), b, 0);
}

left me scratching my head a bit. Without a bit of context nothing stands out. I looked at the implementation of subInternal()

private IRubyObject subInternal(ThreadContext context, RubyBigDecimal val, IRubyObject b, int prec) {
    if (val == null) return callCoerced(context, sites(context).op_minus, b);
    RubyBigDecimal res = subSpecialCases(context, val);
    return res != null ? res : new RubyBigDecimal(context.runtime, value.subtract(val.value)).setResult(prec);
}

That line containing callCoerced() looked like the path to investigate. It is only called if val is null. From the calling location in op_minus, that val is the result of getVpValueWithPrec(context, b, true).

To try to get some context of how this is used, I looked for similar calls to getVpValueWithPrec. There’s a method op_plus which appears to be similar to op_minus.

@JRubyMethod(name = "+")
public IRubyObject op_plus(ThreadContext context, IRubyObject b) {
    return addInternal(context, getVpValueWithPrec(context, b, false), b, vpPrecLimit(context.runtime));
}

The difference in this case is that the third parameter to getVpValueWithPrec is false instead of true. Time to see what that does.

private RubyBigDecimal getVpValueWithPrec(ThreadContext context, IRubyObject value, boolean must) {
    if (value instanceof RubyFloat) {
        // Unimportant details omitted
    }
    if (value instanceof RubyRational) {
        // Unimportant details omitted
    }

    return getVpValue(context, value, must);
}

At this point we see the third parameter is named must. I know that our value isn’t a RubyFloat or RubyRational (it’s a NullValue), so this method falls through to call getVpValue, passing through the must value.

private static RubyBigDecimal getVpValue(ThreadContext context, IRubyObject value, boolean must) {
    switch (((RubyBasicObject) value).getNativeClassIndex()) {
        case BIGDECIMAL:
            ...
        case FIXNUM:
            ...
        // Unimportant details omitted
    }
    return cannotBeCoerced(context, value, must);
}

Nearly there. Again it falls through the other type checks and ends up at the call cannotBeCoerced with the must value passed in.

private static RubyBigDecimal cannotBeCoerced(ThreadContext context, IRubyObject value, boolean must) {
    if (must) {
        throw context.runtime.newTypeError(
            errMessageType(context, value) + " can't be coerced into BigDecimal"
        );
    }
    return null;
}

This method contains the text of the error message received from the failed subtraction that started this investigation. So if must is true, then the error is raised. Recall that the op_plus has the must value as false.

Returning all the way back up the call stack to op_minus, we see that if must is changed to false, then getVpValue and subsequently getVpValueWithPrec will return null. Thus the val passed to subInternal will be null.

private IRubyObject subInternal(ThreadContext context, RubyBigDecimal val, IRubyObject b, int prec) {
    if (val == null) return callCoerced(context, sites(context).op_minus, b);
    // Unimportant details omitted
}

With val being null, the callCoerced method would actually be called in the first line of subInternal, instead of an exception being raised.

The fix is to change that must value, as seen in this diff:

diff --git a/core/src/main/java/org/jruby/ext/bigdecimal/RubyBigDecimal.java b/core/src/main/java/org/jruby/ext/bigdecimal/RubyBigDecimal.java
index 12b7788b2d..5c3373b7ee 100644
--- a/core/src/main/java/org/jruby/ext/bigdecimal/RubyBigDecimal.java
+++ b/core/src/main/java/org/jruby/ext/bigdecimal/RubyBigDecimal.java
@@ -1131,7 +1131,7 @@ public class RubyBigDecimal extends RubyNumeric {

     @JRubyMethod(name = "-", required = 1)
     public IRubyObject op_minus(ThreadContext context, IRubyObject b) {
-        return subInternal(context, getVpValueWithPrec(context, b, true), b, 0);
+        return subInternal(context, getVpValueWithPrec(context, b, false), b, 0);
     }

That addresses the exception that was being raised.

However, while I was looking around trying to understand how callCoerced worked, I noticed that there was a difference in the way it is called by methods similar to subInternal.

subInternal calls it as follows:

return callCoerced(context, sites(context).op_minus, b);

whereas addInternal calls it with four parameters:

return callCoerced(context, sites(context).op_plus, b, true);

Most of the other invokations of callCoerced used the 4 parameter version, with that last parameter set to true.

The implementation of the 3 parameter form of callCoerced method is:

public final IRubyObject callCoerced(ThreadContext context, CallSite site, IRubyObject other) {
    return callCoerced(context, site, other, false);
}

Of interest is how it calls the 4 parameter version of the method with a value of false. So I decided to check what that does.

The 4 parameter version of callCoerced looks like:

protected final IRubyObject callCoerced(ThreadContext context, CallSite site, IRubyObject other, boolean err) {
    IRubyObject[] args = getCoerced(context, other, err);
    if (args == null) return context.nil;
    IRubyObject car = args[0];
    return site.call(context, car, car, args[1]);
}

That 4th parameter is err and that gets passed to getCoerced. The interesting part is below:

    try {
        result = sites(context).coerce.call(context, other, other, this);
    }
    catch (RaiseException e) { // e.g. NoMethodError: undefined method `coerce'

        if (error) {
            throw runtime.newTypeError(str(runtime, types(runtime, other.getMetaClass()), " can't be coerced into ", types(runtime, getMetaClass())));
        }
        return null;
    }

If there is no coerce method provided, an exception is raised. This exception is swallowed unless the error parameter (which is the err parameter from callCoerced) is true.

In the case of the NullValue example from the beginning of this post we can remove the coerce method (and the method_missing one) to see what happens.

Firstly addition:

irb(main):001:0> load './null_value.rb'
=> true
irb(main):002:0> BigDecimal(1) + NullValue.new
Traceback (most recent call last):
        7: from bin/jirb:12:in `<main>'
        6: from org/jruby/RubyKernel.java:1181:in `catch'
        5: from org/jruby/RubyKernel.java:1181:in `catch'
        4: from org/jruby/RubyKernel.java:1415:in `loop'
        3: from org/jruby/RubyKernel.java:1043:in `eval'
        2: from (irb):2:in `evaluate'
        1: from org/jruby/ext/bigdecimal/RubyBigDecimal.java:1039:in `+'
TypeError (NullValue can't be coerced into BigDecimal)

This fails with an exception as expected, because there is no coerce method provided.

However, looking at subtraction:

irb(main):003:0> BigDecimal(1) - NullValue.new
=> nil

This fails quietly.

To make it consistent, and correct, the invokation of callCoerced in subInternal needs to change to the 4 parameter version with an err value of true, just like addInternal.

The full PR for these fixes can be found at https://github.com/jruby/jruby/pull/5391.

As a final thought, it’s often interesting how much effort can go into what ultimately is a tiny code change.

Aug 20, 2018 - Parsing and Comparing Version Numbers

A recent bug hunt led me to discover the issue was caused by some well-meaning attempts at backward compatibility, but a not-so-great choice of how to evaluate the version of Ruby in use

unless RUBY_VERSION =~ /^1\.9/
  # supply a method not available before Ruby 1.9.x via monkey-patch
end

So as Ruby 2.x became prevalent, the above code failed the check and supplied a method that was actually incompatible with Ruby versions >= 1.9.x.

I’ve also seen versions treated as decimal numbers and then compared, e.g.

this_version = Float('1.9')
other_version = Float(other_version_string)

if this_version > other_version
  # ...
end

That seems like it should work but it has two failings.

If we think of the first number as the “major version” number, and the second as the “minor version” number, what’s the next minor version after 1.9?

Normally it would be 1.10. So how would that go in our numerical comparision?

if Float('1.9') > Float('1.10')
  # ...
end

Decimal 1.10 is actually 1.1 (dropping the insignificant 0), and is therefore less than 1.9.

The other problem is if you wanted to have semantic versioning with three version indicator values- “major version”, “minor version”, and “patch version”. E.g. 1.9.3. We can’t use a decimal here.

The solution is not to treat a version number as anything like a decimal. It is three distinct whole number values, grouped together, needing a comparison at each level.

If you are using Ruby, you can use the Gem::Version library, it does the right thing.

Aug 15, 2018 - In-Memory Image Processing with Pipes

At RubyConfAU 2018 presenter Michael Morris demonstrated Kartalytics, which captured in-race data of their games of Mario Kart 8 in real-time.

It’s both an amusing and clever project, and after the conference I was eager to look into the detail of how they were capturing the images.

In short, they use an HDMI to Ethernet device which sends a video stream as UDP packets to a Raspberry Pi. They use ffmpeg to capture screenshots from the video stream at a rate of 5 frames per second, writing these images to the SD card on the Raspberry Pi. Another process iterates over these images, reading them off the SD card and processing them to extract in-game information.

What they did, writing the files to the SD and then reading them again with a separate process, is exactly what I would have done with a quick side project. However that block device I/O can have a penalty.

So I decided to try some experiments to bypass the SD card as a little side project.

Experiment 1 - extracting images from a video stream

You can get ffmpeg to extract images from a video file using a command like

ffmpeg -i video.mpg thumb%04d.jpg

This creates files like thumb0001.jpg, thumb0002.jpg etc.

The first thing I needed to find out was if there was a way to get ffmpeg to write the images to a pipe.

Apparently you can, using:

ffmpeg -i video.mpg -f image2pipe pipe:1

This forces the output format to be images streamed to a pipe, and specifies we want it sent to pipe:1, which is stdout.

That means we can use IO.popen in Ruby to read this stream.

IO.popen(ffmpeg_command) do |data|
  # process data stream
end

To distinguish one image from another in the stream, I had to do a little research on the JPEG format. From that I discovered that the JPEG/JFIF format starts with the bytes 0xFF 0xD8 and ends with the bytes 0xFF 0xD9.

With all that, I could write a quick script to read the stream of images and distinguish them from each other:

Experiment 2 - processing the images from a video stream

In Kartalytics they use the rmagick library, a Ruby wrapper for ImageMagick, to process the images. I wanted to see if I could process straight from the stream.

This is possible with the Image.from_blob function. So with a small change we can take the raw image data and create an ImageMagick Image object ready for processing.

image = Magick::Image.from_blob(img_bytes.pack('C*')).first

The full script is:

Conclusion

So the experiment worked, and turned out to be not nearly as difficult as I thought. The fun part being to learn a little more about the JPEG/JFIF format and to play with pipes, made possible because ffmpeg is so versatile with its formats.