Movie poster for Sharknado, a 2013 science-fiction disaster film featuring a tornado filled with sharks.

Bézier polynomial curve regression

Sharks. I never saw that coming.

June 09, 2026 · John Peach

Ask NotebookLM

Introduction

How often have you said to yourself, “Gosh, I sure wish I knew an easy way to approximate a C1C^1 curve with an nnth-degree Bézier polynomial!” Several times a day would be my guess. Well, there is a way! In fact, just a little bit of linear algebra, orthogonal distance regression, and the Akaike criterion is about all you need. The linear algebra gives us a starting guess, orthogonal distance regression refines the control points by letting the curve slide its closest points onto the data, and the Akaike criterion tells us how many control points are actually worth keeping.

What is this C1C^1 stuff? It’s a way to describe how smooth a function or curve is. Smoothness depends on continuity and differentiability. It’s often possible to look at the plot of a function and see that it’s at least C1C^1. If the function is continuous and doesn’t have any sharp bends anywhere, you’ve probably got a C1C^1 or higher function. The functions we’ll study will be like this, that is, without corners and are continuous everywhere. The CnC^n background section near the end of the article gives the precise definitions, and we’ll confirm that the fitted Bézier result inherits this smoothness.

Why Bézier polynomials? The advantage is that using only a few control points, the entire curve can be defined. Instead of using hundreds of points along the curve, you can generate the same curve from a handful of control points, and the Akaike criterion will tell how many of these control points are needed.

Bézier curves are derived from Bernstein polynomials, but were not applied until Paul de Casteljau derived a numerically stable method now called de Casteljau’s algorithm which he used for computer aided design at the French automaker Citroën. Pierre Bézier independently discovered the polynomials and used them to design car bodies at Renault. We used Bézier curves in Yacht Design with Mathematics. My interest in the inverse problem that is, recovering control points from curve data, started with a 2002 paper by Borges and Pastva.

Bézier curves are now applied in illustrator programs like Adobe Illustrator and Inkscape, in font design, in many Computer-Aided Design (CAD) software programs, and for robotics path planning. By manipulating the locations of the control points, you can quickly modify the shape of the resulting curve. The more difficult inverse problem, which we’ll solve here, is to reconstruct the locations of the control points from a given curve.

Program Outline

The Julia file BezierCurveFit.jl fits Bézier polynomials to a set of points Q=(qx,qy)Q = (q_x,q_y) describing the curve to be fit. The polynomials are parameterized by t[0,1]t \in [0,1], so that qx(i)=fx(ti)q_x(i) = f_x(t_i) for some polynomial fxf_x at tit_i. Similarly, qy(i)=fy(ti)q_y(i) = f_y(t_i). Since we don’t know the values for tt, our first task is to estimate them from the measured points in QQ but, as we’ll see, if the tit_i aren’t known, we won’t get a good fit. The real trick is to stop treating tt as known. Instead of guessing tt and then solving for the control points, we let an optimizer adjust both the control points and the tt values at the same time. That joint optimization is called orthogonal distance regression (ODR) Orthogonal distance regression minimizes the perpendicular (shortest) distance from each data point to the fitted curve, rather than the vertical distance minimized by ordinary least squares. When data points can have errors in both xx and yy — as digitized curve points do — perpendicular distance is the geometrically correct thing to minimize., and it’s what turns a mediocre fit into a good one.

The curve points in QQ can be found by multiplying a matrix of coefficients, AA, and the control points PP, Q=APQ = AP. The inverse also works, so P=A1QP = A^{-1}Q, (using some special tricks described below). To create AA, we need to know how many control points there are, and we need to generate the Vandermonde matrix from tt.

If we knew tt exactly, we could perfectly generate AA, and we’d know where the control points PP were. Errors in our estimate of tt cause fairly large errors when we calculate the estimated control points, Pest=A1QP_{est} = A^{-1}Q, and consequently errors in locations of the estimated curve points, Qest=APestQ_{est} = AP_{est}. The points in QestQ_{est} are interpolated to the nearest approximation of points in QQ.

Using some optimization techniques, we can iterate on the estimates of the locations of the control points until the error between QestQ_{est} and QQ is as small as is computationally possible. Critically, we let the parameter values tt move during this iteration too. Each tit_i slides until B(ti)B(t_i) arrives on the point of the curve closest to the measured QiQ_i. Minimizing that perpendicular (geometric) distance, rather than the distance at a fixed, and possibly wrong tt, is the defining feature of orthogonal distance regression.

In most cases, the curve points QQ weren’t generated using Bézier polynomials, so the approximating polynomials in this process depend on the number and locations of the control points PP. Starting with a small number of control points, we can use this procedure to get QestQ_{est}, then increase the number of control points to generate a new QestQ_{est}, repeating until the fit is good enough. “Good enough” is exactly the judgment the Akaike criterion The Akaike Information Criterion (AIC) was introduced by statistician Hirotugu Akaike in 1974. It balances model fit against complexity by adding a penalty proportional to the number of free parameters — the mathematical formalization of Occam’s Razor. Lower AIC is better; what matters is differences in AIC between models, not the absolute value. automates. More control points always reduce the residual error, so we need a penalty that punishes extra parameters and stops us from overfitting.

The outline for the curve fitting function is:

The first half of this article builds up the linear algebra behind bernstein_matrix and init_control_points; the second half explains the orthogonal distance regression in fit_bezier_odr, the degree selection in select_degree_robust, and the reprojection in monotone_reproject.

Bézier Polynomials

In the article Yacht Design with Mathematics we used Bézier polynomials to approximate the curves of a boat hull. The Bézier polynomial is defined as

B(t)=i=0n(ni)(1t)nitiPiB(t) = \sum_{i=0}^n \binom{n}{i}(1-t)^{n-i}t^i P_i

The PiP_i array represents the locations of n+1n+1 control points and can have any (x,y)(x,y) coordinates you like. Notice that the index starts at i=0i=0 which is why there are n+1n+1 rather than nn control points. (In the code sections later, we follow the Julia convention and call the degree mm, so the curve has m+1m+1 control points — the math here uses nn to keep the standard textbook notation.) The parameter tt goes between 00 and 11, and the symbol

(ni)=n!(ni)!i!\binom{n}{i} = \frac{n!}{(n-i)!i!}

is the binomial coefficient, and k!=k(k1)(k2)1k! = k \cdot (k-1) \cdot (k-2) \cdots 1 is kk factorial. The binomial coefficient tells how many ways you can choose ii things from a set of nn, and it also gives the coefficients in the expansion of (x+y)n(x+y)^n. You can check the expansion coefficients by typing

expand((x+y)^4)

in Wolfram Alpha

expand (x+y)4x4+4x3y+6x2y2+4xy3+y4\begin{aligned} \text{expand } & (x+y)^4 \\ & x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4 \end{aligned}

and checking that the same coefficients appear in the vector,

n = 4
[binomial(n,k) for k = 0:n]'
[1, 4, 6, 4, 1]

when you run it in Julia. You could have gotten each coefficient by typing binomial(4,0), binomial(4,1), …, binomial(4,4), but the notation above is a shortcut to generate them all at once, called list comprehension.

Another way to get the coefficients is to use Pascal’s Triangle. Each row represents the nnth set of expansion coefficients. The zeroth row is the single 11 at the top, so the expansion for (x+y)4(x+y)^4 is the fifth row down, 1,4,6,4,11,4,6,4,1. If you want to explore more about Pascal’s Triangle, read Zaphod Beeblebrox’s Brain and the Fifty-ninth row of Pascal’s Triangle by Andrew Granville.

Pascal's Triangle showing rows 0 through 7, with each entry equal to the sum of the two entries above it.
Figure 1.

Pascal’s Triangle.

The location of the points PiP_i controls the shape of the curve. Here’s a curve created with five control points that looks a little like a shark fin drawn with GeoGebra. Notice that the control points are not in order. Place five points anywhere on the canvas, and then type

curve[x(A)*(1−t)^4+4*x(B)*(1−t)^3*t+6*x(C)*(1−t)^2*t^2+4*x(D)*(1−t)*t^3+x(E)*t^4,y(A)*(1−t)^4+4*y(B)*(1−t)^3*t+6*y(C)*(1−t)^2*t^2+4*y(D)*(1−t)*t^3+y(E)*t^4,t,0,1]

in the Input: dialog box.

A Bézier curve shaped like a shark fin, drawn in GeoGebra using five labeled control points A through E.
Figure 2.

Bézier-generated shark fin in GeoGebra.

You can find any point on the curve by varying tt between 00 and 11, and using the formula for B(t)B(t) above. You can do this in a spreadsheet:

Spreadsheet showing columns for index i, binomial coefficients, powers of t and (1-t), control point coordinates, and the resulting Bézier curve point Q.
Figure 3.

Spreadsheet calculations for the Bézier curve.

In the first column of the spreadsheet I entered the point labels and in the next column is the index, ii, running from 00 to 44. Below the main table, I entered two labeled cells, nn and tt. The value for nn should be one less than the number of points since indexing begins at zero. You can put any value for tt so long as it’s in the range [0,1][0,1]. The third column is the binomial expansion coefficient for nn and ii. Excel uses =combin(n,a2).

The third and fourth columns, “D” and “E” contain the powers of tit^i and (1t)(ni)(1-t)^{(n-i)}. In the next two columns, put in the x,yx,y coordinates for each of the five points. Right click on point A and click on “Object Properties” to open a window on the right side of GeoGebra. Under the “Basic” tab you should see the name of the point and the value. Copy the values for the xx and yy coordinates into the columns labelled P_x and P_y in the spreadsheet. If you’ve put point A at the origin, GeoGebra will have Definition: Intersect(xAxis, yAxis), but just enter zeros.

Any number raised to the 0th0^{th} power is 11, but Excel incorrectly assumes the answer is undefined. If this happens with your spreadsheet, replace the cells with 11‘s. In the example, this would be in cells D2 and E6. Or, you could use the open-source LibreOffice suite which correctly calculates zero powers.

Using the formula,

B(t)=i=0n(ni)(1t)nitiPiB(t) = \sum_{i=0}^n \binom{n}{i}(1-t)^{n-i}t^i P_i

we can calculate Q_x and Q_y for each point, which are the products of columns C,D,E, and F for Q_x and C,D,E, and G for Q_y. Then, sum the columns to get the coordinates for QQ. In GeoGebra, add a new point and rename it QQ. Right click on “Q” and enter the calculated coordinates in the Value box. It should fall somewhere on the shark fin plot. If you change the value of tt in the spreadsheet, you’ll get a new position for QQ somewhere on the plot.

After choosing the locations for control points A,B,C,DA,B,C,D, and EE you can generate the shark fin curve by varying the value of tt between 00 and 11.

Bézier Linear Algebra

Let’s look at B(t)B(t) again,

B(t)=i=0n(ni)(1t)nitiPi.B(t) = \sum_{i=0}^n \binom{n}{i}(1-t)^{n-i}t^i P_i.

Suppose we have 55 control points, {P0,P1,P2,P3,P4}\{ P_0, P_1, P_2, P_3,P_4 \}, so n=4n = 4 since the index begins at zero. If we write out the terms of B(t)B(t) we get

B(t)=i=04(4i)(1t)4itiPi=(40)(1t)4P0+(41)(1t)3tP1+(42)(1t)2t2P2+(43)(1t)t3P3+(44)t4P4=1×(t44t3+6t24t+1)P0+4×(t3+3t23t+1)tP1+6×(t22t+1)t2P2+4×(t+1)t3P3+1×t4P4=1×(t44t3+6t24t+1)P0+4×(t4+3t33t2+t)P1+6×(t42t3+t2)P2+4×(t4+t3)P3+1×t4P4=(t44t3+6t24t+1)P0+(4t4+12t312t2+4t)P1+(6t412t3+6t2)P2+(4t4+4t3)P3+t4P4=[t44t36t24t14t412t312t24t06t412t36t2004t44t3000t40000][P0P1P2P3P4]=[P0P1P2P3P4][1464141212406126004400010000][t4t3t2t1]=[P0P1P2P3P4]([1464113310121001100010000][14641])[t4t3t2t1]=[P0P1P2P3P4]([1464113310121001100010000][1111111111111111111111111][14641])[t4t3t2t1]=P(RSR[1,:]T)T\begin{aligned} B(t) &= \sum_{i=0}^4 \binom{4}{i}(1-t)^{4-i}t^i P_i \\ &= \binom{4}{0} (1-t)^4 P_0 + \binom{4}{1} (1-t)^3 t P_1 + \binom{4}{2} (1-t)^2 t^2 P_2 + \binom{4}{3} (1-t) t^3 P_3 + \binom{4}{4} t^4 P_4\\ &= 1 \times (t^4 -4t^3 + 6t^2 -4t + 1) P_0 + 4 \times (-t^3 +3t^2-3t + 1)t P_1 + 6 \times (t^2 - 2t + 1)t^2 P_2 + 4 \times(-t+1)t^3 P_3 + 1 \times t^4 P_4 \\ &= 1 \times (t^4 -4t^3 + 6t^2 -4t + 1) P_0 + 4 \times (-t^4 +3t^3-3t^2 + t) P_1 + 6 \times (t^4 - 2t^3 + t^2) P_2 + 4 \times(-t^4+t^3) P_3 + 1 \times t^4 P_4 \\ &= (t^4 -4t^3 + 6t^2 -4t + 1) P_0 + (-4t^4 +12t^3-12t^2 + 4t) P_1 + (6t^4 - 12t^3 + 6t^2) P_2 + (-4t^4+4t^3) P_3 + t^4 P_4 \\ &= \begin{bmatrix} t^4 & -4t^3 & 6t^2 & -4t & 1 \\ -4t^4 & 12t^3 & -12t^2 & 4t & 0 \\ 6t^4 & -12t^3 & 6t^2 & 0 & 0 \\ -4t^4 & 4t^3 & 0 & 0 & 0 \\ t^4 & 0 & 0 & 0 & 0 \\ \end{bmatrix} \begin{bmatrix} P_0 \\ P_1 \\ P_2 \\ P_3 \\ P_4 \end{bmatrix} \\ &= \begin{bmatrix} P_0 & P_1 & P_2 & P_3 & P_4 \end{bmatrix} \begin{bmatrix} 1 & -4 & 6 & -4 & 1 \\ -4 & 12 & -12 & 4 & 0 \\ 6 & -12 & 6 & 0 & 0 \\ -4 & 4 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ \end{bmatrix} \begin{bmatrix} t^4 \\ t^3 \\ t^2 \\ t \\ 1 \end{bmatrix} \\ &= \begin{bmatrix} P_0 & P_1 & P_2 & P_3 & P_4 \end{bmatrix} \left( \begin{bmatrix} 1 & -4 & 6 & -4 & 1 \\ -1 & 3 & -3 & 1 & 0 \\ 1 & -2 & 1 & 0 & 0 \\ -1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ \end{bmatrix} \odot \begin{bmatrix} 1 \\ 4 \\ 6 \\ 4 \\ 1 \end{bmatrix} \right) \begin{bmatrix} t^4 \\ t^3 \\ t^2 \\ t \\ 1 \end{bmatrix} \\ &= \begin{bmatrix} P_0 & P_1 & P_2 & P_3 & P_4 \end{bmatrix} \left( \begin{bmatrix} 1 & 4 & 6 & 4 & 1 \\ 1 & 3 & 3 & 1 & 0 \\ 1 & 2 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ \end{bmatrix} \circ \begin{bmatrix} 1 & -1 & 1 & -1 & 1 \\ -1 & 1 & -1 & 1 & -1 \\ 1 & -1 & 1 & -1 & 1 \\ -1 & 1 & -1 & 1 & -1 \\ 1 & -1 & 1 & -1 & 1 \\ \end{bmatrix} \odot \begin{bmatrix} 1 \\ 4 \\ 6 \\ 4 \\ 1 \end{bmatrix} \right) \begin{bmatrix} t^4 \\ t^3 \\ t^2 \\ t \\ 1 \end{bmatrix} \\ &= P \left( R \circ S \odot R[1,:]^T \right) T \\ \end{aligned}

Each row of the matrix RR is binomial expansion of nk,  k=0,,n1n-k, \; k = 0, \dots, n-1. Both \circ and \odot are used here for element-wise (Hadamard) multiplication of two matrices with the same dimensions — the distinction in what follows is purely structural: \circ multiplies two full matrices entry-by-entry, while \odot broadcasts a column vector across all columns of a matrix (the Julia dot-broadcast operation). The sign matrix SS with alternating ±1\pm1‘s changes every other sign of the values in the matrix RR. The binomial expansion [14641]T\begin{bmatrix} 1 & 4 & 6 & 4 & 1 \end{bmatrix}^T multiplies each column of RSR \circ S by the appropriate coefficient, and is the first row of RR, denoted by R[1,:]TR[1,:]^T. Multiplying the coefficient matrices together gives

C=[1464113310121001100010000][1111111111111111111111111][14641]=[1464141212406126004400010000]C = \begin{bmatrix} 1 & 4 & 6 & 4 & 1 \\ 1 & 3 & 3 & 1 & 0 \\ 1 & 2 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ \end{bmatrix} \circ \begin{bmatrix} 1 & -1 & 1 & -1 & 1 \\ -1 & 1 & -1 & 1 & -1 \\ 1 & -1 & 1 & -1 & 1 \\ -1 & 1 & -1 & 1 & -1 \\ 1 & -1 & 1 & -1 & 1 \\ \end{bmatrix} \odot \begin{bmatrix} 1 \\ 4 \\ 6 \\ 4 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & -4 & 6 & -4 & 1 \\ -4 & 12 & -12 & 4 & 0 \\ 6 & -12 & 6 & 0 & 0 \\ -4 & 4 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix}

Since we know how many control points there are, we can quickly create the coefficient matrix CC. Multiply the control points array PP by the coefficient matrix to get

CP=[1464141212406126004400010000][000.752.03.069771.289531.80562.149511.255210]=[9.451438.8608620.6148417.123689.4186216.262823.08.00.00.0].\begin{aligned} CP = \begin{bmatrix} 1 & -4 & 6 & -4 & 1 \\ -4 & 12 & -12 & 4 & 0 \\ 6 & -12 & 6 & 0 & 0 \\ -4 & 4 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 0.75 & 2.0 \\ 3.06977 & 1.28953 \\ 1.8056 & 2.14951 \\ 1.25521 & 0 \\ \end{bmatrix} = \begin{bmatrix} 9.45143 & -8.86086 \\ -20.61484 & 17.12368 \\ 9.41862 & -16.26282 \\ 3.0 & 8.0 \\ 0.0 & 0.0 \\ \end{bmatrix}. \end{aligned}

The last step is to create the vector of tt values between 00 and 11 using the Julia range function:

t = range(0, 1, num = nPts)

Setting nPts = 100 gives 100 points along the curve. For each value of tt we need [t4t3t2t1]\begin{bmatrix} t^4 & t^3 & t^2 & t & 1 \end{bmatrix}, which is a Vandermonde matrix, but flipped left and right. For any input vector vv, with exponents up to nn, this code returns the Vandermonde matrix, TT, which has size nPts×(n+1)nPts \times (n+1):

VM = Matrix{Float64}(undef, length(v), n + 1);
for k in 0:n
    VM[:, k + 1] = v .^ (n - k)
end

The Bézier curve points can be found by multiplying B=T×C×PB = T \times C \times P, which will be nPts×2nPts \times 2. The first column contains the xx-coordinates and the second the yy-coordinates. To check that the math is correct, I plotted the Bézier curve of the shark fin in Julia and then overlaid it onto the GeoGebra curve. The red dashed line is the curve generated by GeoGebra, and the solid blue is the Julia curve.

Plot showing the shark-fin Bézier curve generated by GeoGebra (red dashed) overlaid with the same curve computed in Julia (solid blue), confirming agreement.
Figure 4.

Original shark-fin curve with GeoGebra overlay.

But, suppose you had the plot already, and wanted to know where the control points should be placed to reproduce the curve? We’d like to be able to reverse the process to get the control points.

Curve point coordinates

To get the coordinates of the curve points in GeoGebra first turn off extraneous points. On the left side of the window, under Algebra is a list of the control points with small dots to the left of the label. Click on each one to hide the point in the graph. Save the graph as an image using File \rightarrow Export \rightarrow Graphics View as Picture (png, eps).

Now, go to the WebPlotDigitizer site and click on Launch Now! Under File select Load Image(s) and Browse to find your image of the curve. The default type is 2D (X,Y) Plot which is right for our plot. Click Align Axes at the bottom and Proceed on the next pop-up window. Click on a known xx-coordinate somewhere on the left side and then a second known coordinate to the right. Repeat this for two known yy-coordinates. You can use the origin for both if you like. If you need to adjust points, click on the point and use the arrow keys to move it. When the points are in the correct positions, click on Complete! The next dialog asks for the coordinates of these points.

Under Automatic Extraction, click on the blue square to the right of Foreground Color, then choose Color Picker and click on a point of the curve, then Done. Click Run and you should see points that WebPlotDigitizer has selected along the curve. If any points don’t seem to be on the curve, click Adjust Point (S) and use the arrow keys to move them back onto the curve.

On the left side under Dataset click View Data, Sort by: Nearest Neighbor and then Download .CSV. The Nearest Neighbor sorting option puts the points in order along the length of the curve. Name the file something like SharkFinCurve.csv and save it. Open the file with a text editor such as Notepad++. Bézier polynomials go through the first and last control points exactly, so if WebPlotDigitizer didn’t pick up these points, add them in to the list of curve points. Do this by entering t=0t=0 and t=1t=1 in the spreadsheet and copying the summed values for Q_x and Q_y for each.

WebPlotDigitizer interface showing the shark-fin curve image with digitized data points overlaid along the curve.
Figure 5.

Extracting curve points using WebPlotDigitizer.

Finding the Control Points

The equation for the Bézier polynomial is the sum of coefficients times the control point coordinates. We could write it as

B(t)=i=0n(ni)(1t)nitiPi=(n0)(1t)nP0+(n1)(1t)n1tP1++(nn)tnPn.\begin{aligned} B(t) &= \sum_{i=0}^n \binom{n}{i}(1-t)^{n-i}t^i P_i \\ &= \binom{n}{0}(1-t)^nP_0 + \binom{n}{1}(1-t)^{n-1}tP_1 + \cdots + \binom{n}{n}t^nP_n. \end{aligned}

A more compact way to write it would be as the multiplication of two vectors (multiply corresponding entries, sum the whole thing),

B(t)=[(n0)(1t)n(n1)(1t)n1t(nn)tn][P0P1Pn].B(t) = \begin{bmatrix} \binom{n}{0}(1-t)^n & \binom{n}{1}(1-t)^{n-1}t & \cdots & \binom{n}{n}t^n \end{bmatrix} \begin{bmatrix} P_0 \\ P_1 \\ \vdots \\ P_n \end{bmatrix}.

If we use t=0.2t = 0.2 the result is the point Q=[0.826948,1.072299]Q = [0.826948, 1.072299]. We could do this multiplication for many different values of tt along the curve and get the matrix multiplication,

[(n0)(1t0)n(n1)(1t0)n1t0(nn)t0n(n0)(1t1)n(n1)(1t1)n1t1(nn)t1n(n0)(1tk)n(n1)(1tk)n1tk(nn)tkn][P0P1Pn]=[Q0Q1Qk].\begin{bmatrix} \binom{n}{0}(1-t_0)^n & \binom{n}{1}(1-t_0)^{n-1}t_0 & \cdots & \binom{n}{n}t_0^n \\ \binom{n}{0}(1-t_1)^n & \binom{n}{1}(1-t_1)^{n-1}t_1 & \cdots & \binom{n}{n}t_1^n \\ \vdots & \vdots & \vdots & \vdots \\ \binom{n}{0}(1-t_k)^n & \binom{n}{1}(1-t_k)^{n-1}t_k & \cdots & \binom{n}{n}t_k^n \\ \end{bmatrix} \begin{bmatrix} P_0 \\ P_1 \\ \vdots \\ P_n \end{bmatrix} = \begin{bmatrix} Q_0 \\ Q_1 \\ \vdots \\ Q_k \end{bmatrix}.

In the example, there are five control points PiP_i and WebPlotDigitizer generated k=84k = 84 steps along the curve. The matrix on the left is 8484 rows by 55 columns, and the vector of QQ‘s is 8484 points long.

Compactifying a bit more, we can write this system of equations as AP=QA P = Q where AA is the matrix of coefficients on the left, PP is the vector of control points, and QQ is the vector of measured points along the curve. We need to find a vector PP that makes the difference between the estimated points and the measured points as small as possible. That is, we want to minimize the error APQ\lVert AP-Q \rVert, but it’s slightly more convenient to minimize the square, (APQ)2(AP-Q)^2 which gives another solution.

If we expand the square we get

(APQ)2=(APQ)T(APQ)=(AP)T(AP)(AP)TQQT(AP)+QTQ=PT(ATA)P2PTATQ+Q2\begin{aligned} (AP-Q)^2 &= (AP-Q)^T(AP-Q) \\ &= (AP)^T(AP) - (AP)^TQ - Q^T(AP) + Q^TQ \\ &= P^T (A^TA) P - 2P^TA^TQ + \|Q\|^2 \end{aligned}

recalling that (AB)T=BTAT(AB)^T = B^T A^T for compatible matrices. To minimize the error we can take the derivative, and set the result to zero,

δδP(APQ)2=δδPPT(ATA)P2PTATQ+Q2=2(ATA)P2ATQ.\begin{aligned} \frac{\delta}{\delta P}(AP-Q)^2 &= \frac{\delta}{\delta P} P^T (A^TA) P - 2P^TA^TQ + \|Q\|^2 \\ &= 2(A^TA)P - 2A^TQ. \end{aligned}

The solution for PP is simply,

P=(ATA)1(ATQ).P = (A^TA)^{-1}(A^TQ).

In Julia, the numerically stable way to evaluate this least-squares solution is the backslash operator, which avoids forming and inverting ATAA^TA explicitly:

P = bernstein_matrix(t, m) \ Q

so you just need to get AA and then the calculation is easy. There’s a minor flaw with this method, though. The QQ points along the curve seem to be uniformly spaced, but that doesn’t mean the values of tit_i are. We don’t know what tit_i should be for each QiQ_i, but we know that t0=0t_0 = 0 and tn=1t_n = 1, so we could calculate the cumulative distance along the curve and normalize by the total length to get approximate values for each ti.t_i. Keep this flaw in mind — it’s the whole reason we’ll need orthogonal distance regression. The least-squares solution here isn’t our final answer; it’s a warm start that gets the optimizer into the right neighborhood.

Building the A matrix

The “Bézier Linear Algebra” section above expanded the basis into monomials — a coefficient matrix CC times a Vandermonde matrix in tt — which is great for seeing where the binomial coefficients come from. In code, though, it’s both shorter and numerically safer to evaluate the Bernstein basis directly from its definition,

Aij=(mj)tij(1ti)mj,i=1,,n,j=0,,m,A_{ij} = \binom{m}{j}\,t_i^{\,j}\,(1-t_i)^{\,m-j}, \qquad i = 1,\dots,n,\quad j = 0,\dots,m,

where mm is the polynomial degree (the same thing the math above called nn), so there are m+1m+1 control points. This is exactly bernstein_matrix in BezierCurveFit.jl:

function bernstein_matrix(t::AbstractVector{Float64}, m::Int)::Matrix{Float64}
    n = length(t)
    B = zeros(Float64, n, m + 1)
    for j = 0:m
        c = binomial(m, j)
        for i = 1:n
            B[i, j + 1] = c * (t[i]^j) * ((1.0 - t[i])^(m - j))
        end
    end
    return B
end

From here on I’ll switch to the code’s convention: m is the degree, and the curve has m + 1 control points. The returned matrix is n×(m+1)n \times (m+1) — one row per measured point, one column per control point — and evaluating the curve is then just the matrix product bernstein_matrix(t, m) * P, which is what the helper bezier_eval does.

We still don’t know the true values for tt, though. The simplest guess is to space them evenly between 00 and 11, but a better guess measures the distances between the points in QQ, the array returned by WebPlotDigitizer:

function chord_length_param(Q::Matrix{Float64})::Vector{Float64}
    n = size(Q, 1)
    d = [norm(Q[i+1, :] - Q[i, :]) for i in 1:(n-1)]
    s = cumsum([0.0; d])
    return s ./ s[end]
end

This is the chord-length parameterization The chord-length parameterization assigns tt values proportional to the cumulative straight-line (chord) distances between successive data points, as a proxy for arc length. It distributes parameter values roughly uniformly along the curve, avoiding the clustering that would result from spacing tt evenly when the data points themselves are unevenly spaced.. Each d[i] is the Euclidean distance between successive points in QQ,

di=(Qi(x)Qi1(x))2+(Qi(y)Qi1(y))2.d_i = \sqrt{(Q_i(x) - Q_{i-1}(x))^2 + (Q_i(y) - Q_{i-1}(y))^2}.

The cumulative sum s is the running arc length with a 00 prepended, and dividing by the total length s[end] normalizes everything into [0,1][0,1] with t1=0t_1 = 0 and tn=1t_n = 1.

The result is:

Plot of the estimated parameter t versus point index along the shark-fin curve, showing the chord-length parameterization.
Figure 6.

Estimated values of parameter tt.

There’s very little difference between even spacing and chord length for densely sampled curves. Using the basis matrix at these estimated tt values, we can take a first stab at the control points with an ordinary least-squares solution, and reconstruct the curve from them:

A     = bernstein_matrix(t, m)   # n × (m+1)
P_est = A \ Q                    # least-squares solve, (m+1) × 2
Q_est = A * P_est

Julia’s backslash operator \ solves the overdetermined system AP=QA P = Q in the least-squares sense. Plotting the estimated curve points gives a result that is not very shark-fin-ish:

Reconstructed Bézier curve from the initial least-squares solve, shown diverging noticeably from the original shark-fin shape.
Figure 7.

The reconstructed curve doesn’t match the original.

The problem is that even small errors in the estimates of tt for each of the measured points in QQ can result in large errors in the estimated control points PestP_{est}.

P_xP_yP_x (est)P_y (est)
000.1299550.076279
0.752-0.456520.5199
3.069771.289533.1000842.820746
1.80562.149511.6717380.991036
1.2552101.184665-0.01976

But, notice that the blue curve generated from the estimated control points PP overlays the fit generated by GeoGebra, which uses a different tt vector from the one our least-squares solve used. The implication is that we should be able to match the original shark-fin curve if we can somehow maneuver the control points into place and, just as importantly, let the tt values move too.

A better warm start with endpoint-constrained least squares

Before handing the problem to an optimizer, we can do better than the raw least-squares solution. A Bézier curve passes exactly through its first and last control points, and we already know where the curve starts and ends — those are just the first and last measured points, Q1Q_1 and QnQ_n. So we can fix P0=Q1P_0 = Q_1 and Pm=QnP_m = Q_n and solve the least-squares problem for only the interior control points. The function init_control_points does this:

function init_control_points(Q::Matrix{Float64}, t::Vector{Float64}, m::Int)::Matrix{Float64}
    if m == 1
        return [Q[1, :]'; Q[end, :]']
    end

    A       = bernstein_matrix(t, m)          # (n, m+1)
    A_fixed = A[:, [1, m+1]]                  # endpoint basis columns
    A_free  = A[:, 2:m]                       # interior basis columns (n, m-1)
    rhs     = Q - A_fixed * Q[[1,end], :]     # subtract fixed contributions
    P_free  = A_free \ rhs                    # (m-1, 2), overdetermined LS

    return vcat(Q[[1],:], P_free, Q[[end],:])
end

The trick is to move the known endpoint contributions to the right-hand side. We split the basis matrix into its two endpoint columns A_fixed and its interior columns A_free. The endpoints contribute A_fixed * [Q₁; Qₙ] to every fitted point, so we subtract that off and solve A_free \ rhs for the interior points alone. Stacking the fixed endpoints back around the solved interior points gives an (m+1)×2(m+1)\times 2 control-point matrix that already honors the endpoints and sits close to the right answer. This is the warm start, and now we’ll refine it.

Orthogonal Distance Regression

Ordinary least squares minimizes

i=1nA(ti)PQi2\sum_{i=1}^{n} \big\lVert A(t_i)\,P - Q_i \big\rVert^2

with the tit_i held fixed at whatever we guessed. The trouble, as the table above showed, is that a wrong tit_i drags the control points to the wrong place. Orthogonal distance regression (ODR) fixes this by minimizing over the control points and the parameter values at the same time,

minP,  {ti}  i=1nB(ti;P)Qi2,\min_{P,\;\{t_i\}} \; \sum_{i=1}^{n} \big\lVert B(t_i;P) - Q_i \big\rVert^2,

subject to the endpoint constraints P0=Q1P_0 = Q_1, Pm=QnP_m = Q_n, t1=0t_1 = 0, and tn=1t_n = 1. Because each interior tit_i is now free to move, at the optimum B(ti)B(t_i) lands on the point of the curve closest to QiQ_i. The residual we minimize is therefore the perpendicular, or geometric, distance from each data point to the curve which is exactly the closest-point problem we’ll work out by hand in the next section. ODR does it for us, for every point, on every iteration.

The Julia package Odrpack.jl wraps the Fortran ODRPACK95 solver, whose engine is a trust-region Levenberg–Marquardt minimizer designed for exactly this problem (Zwolak, Boggs & Watson, Algorithm 869, ACM TOMS 33(4), 2007). It minimizes the general weighted form

minβ,  δ  i=1n[wy,if(xi+δi;β)yi2+wx,iδi2].\min_{\beta,\;\delta}\; \sum_{i=1}^{n}\Big[ w_{y,i}\big\lVert f(x_i+\delta_i;\beta) - y_i\big\rVert^2 + w_{x,i}\,\delta_i^2 \Big].

Here β\beta holds the model parameters (our flattened control points), xx is the explanatory variable (our parameter vector tt), yy is the response (our data QQ), and δi\delta_i is the adjustment the solver makes to each xix_i. The weight wx,iw_{x,i} penalizes moving tt; driving it toward zero makes tt effectively free and recovers pure geometric distance. Setting it to exactly zero is numerically degenerate in ODRPACK95, so the code uses a tiny value, wx=1010w_x = 10^{-10} making the parameters effectively free, while keeping the solver well-conditioned.

ODRPACK calls a model function with the signature f!(x, beta, y) that writes the curve points into y in place. Ours just reshapes beta into control points and evaluates the curve:

function bezier_fcn!(x::AbstractVector{Float64},
                     beta::Vector{Float64},
                     y::Matrix{Float64})
    y .= bezier_eval(x, beta_to_P(beta))
    return nothing
end

The control points live in beta as an interleaved vector [P0x,P0y,P1x,P1y,][P_{0x}, P_{0y}, P_{1x}, P_{1y}, \dots], so we convert back and forth with a reshape:

beta_to_P(beta) = Matrix(reshape(beta, 2, :)')   # interleaved vector → (p × 2)
P_to_beta(P)    = vec(P')                          # (p × 2) → interleaved vector

Putting the warm start, the constraints, and the model together gives fit_bezier_odr:

function fit_bezier_odr(Q::Matrix{Float64}, m::Int;
                        tol::Float64=1e-10,
                        reproject::Bool=false)
    n = size(Q, 1)



    t_init = chord_length_param(Q)
    P_init = init_control_points(Q, t_init, m)
    beta_init = P_to_beta(P_init)



    fix_b = falses(length(beta_init))
    fix_b[1:2] .= true       # fix P₀ = Q[1,:]
    fix_b[end-1:end] .= true # fix Pₘ = Q[end,:]



    fix_x_arr = falses(n)
    fix_x_arr[1] = true      # fix t₁ = 0
    fix_x_arr[end] = true    # fix tₙ = 1



    result = odr_fit(
        bezier_fcn!,
        t_init,
        Q,
        beta_init;
        fix_beta=fix_b,
        fix_x=fix_x_arr,
        jac_beta! = bezier_jac_beta!,
        jac_x! = bezier_jac_x!,
        weight_x = fill(1e-10, n)
    )



    P_fit = beta_to_P(result.beta)
    t_fit = vec(result.xplusd)



    if reproject
        t_fit = monotone_reproject(Q, P_fit; tol=tol)
    end



    return P_fit, t_fit, result
end

The fix_beta mask freezes the first and last control points; fix_x freezes the first and last tt values at 00 and 11. Everything else, the interior control points and every interior ti,t_i, is free to move. The optimized parameter values come back as result.xplusd (the initial x plus the fitted adjustments δ\delta).

Analytic Jacobians

We could let ODRPACK estimate the derivatives by finite differences, but the Bernstein basis has analytic derivatives, so supplying them is faster and more accurate. The curve is linear in the control points, so the Jacobian with respect to β\beta is just the basis itself. Each coordinate of the output depends only on the matching coordinate of each control point, which is why the entries land on alternating slots:

function bezier_jac_beta!(x, beta, jb)
    m = (length(beta) ÷ 2) - 1
    n = length(x)
    B = bernstein_matrix(x, m)
    fill!(jb, 0.0)
    for j in 1:(m + 1)
        for i in 1:n
            jb[i, 2*j - 1, 1] = B[i, j]
            jb[i, 2*j, 2]     = B[i, j]
        end
    end
    return nothing
end

The Jacobian with respect to tt needs the derivative of the curve, which uses the derivative of the Bernstein basis,

dBm,jdt=m(Bm1,j1(t)Bm1,j(t)),\frac{dB_{m,j}}{dt} = m\big(B_{m-1,j-1}(t) - B_{m-1,j}(t)\big),

with the convention Bm1,10B_{m-1,-1} \equiv 0 and Bm1,m0B_{m-1,m} \equiv 0. That recurrence is bernstein_deriv, and bezier_jac_x! multiplies it by the control points to get dB/dtdB/dt at each point. With both Jacobians supplied, the solver recovers control points that put the curve right on top of the data, because the optimizer now has the freedom to slide the parameter values into place.

Shark-fin data points (circles) with the ODR-fitted degree-4 Bézier curve overlaid. The curve passes through all points with a maximum residual of 0.0025.
Figure 8.

ODR-fitted Bézier curve (m=4m=4) overlaid on the shark-fin data. The maximum geometric residual is 0.0025 — roughly 58× smaller than the degree-3 fit.

Choosing the Degree

We’ve been assuming we know how many control points to use, but in practice we won’t. Adding control points (raising the degree mm) can only lower the residual error, and a high enough degree will fit the curve through every measured point even if it’s noisy. This is called overfitting, and looking at the residual error alone will never tell you that the curve has been over fit. We need a score that rewards a good fit but penalizes extra parameters.

Degree selection with AIC

The Akaike Information Criterion (AIC) gives this score for good fits,

AIC=2k2ln(L^),\text{AIC} = 2k - 2\ln(\hat{L}),

where kk is the number of free parameters and L^\hat{L} is the maximized likelihood. For a model with Gaussian residuals and unknown variance, the log-likelihood term reduces to nln(SSE/n)n\ln(\text{SSE}/n) plus a constant (after profiling out the variance), where SSE is the sum of squared residuals. The constant is the same for every degree, so it drops out of the comparison, leaving the form select_degree_aic uses:

AIC(m)=nln ⁣(max(SSE(m),ε)n)+4(m1).\text{AIC}(m) = n\,\ln\!\left(\frac{\max(\text{SSE}(m),\,\varepsilon)}{n}\right) + 4(m-1).

A degree-mm curve has m+1m+1 control points, but the two endpoints are fixed, leaving m1m-1 free interior control points. Each has an xx and a yy coordinate, so there are 2(m1)2(m-1) free parameters, and the penalty is 2k=4(m1)2k = 4(m-1), which is the final term above. We pick the degree with the lowest AIC:

function select_degree_aic(Q::Matrix{Float64};
                           m_min::Int=2, m_max::Int=8,
                           tol::Float64=1e-10)
    n = size(Q, 1)
    m_range = m_min : min(m_max, n-2)
    aic = Dict{Int,Float64}()
    best_aic = Inf
    m_best = m_min
    P_best = zeros(0, 2)
    t_best = zeros(0)



    for m in m_range
        P_fit, t_fit, _ = fit_bezier_odr(Q, m; tol=tol)
        SSE = sum((bezier_eval(t_fit, P_fit) .- Q).^2)
        aic[m] = n * log(max(SSE, eps()) / n) + 4*(m-1)
        if aic[m] < best_aic
            best_aic = aic[m]
            m_best = m
            P_best = P_fit
            t_best = t_fit
        end
    end
    return m_best, P_best, t_best, aic
end

The loop fits every degree from 22 up to min(8,n2)\min(8, n-2) by ODR, computes the SSE at the fitted parameters, scores it, and keeps the best. The max(SSE, eps()) guard keeps the logarithm finite if a fit is essentially perfect.

Note the specific form of the penalty: a degree-mm curve has m+1m+1 control points, but the two endpoints are fixed, leaving m1m-1 free interior control points. Each has an xx and a yy coordinate, so there are k=2(m1)k = 2(m-1) free parameters, and substituting into the standard 2k2k penalty gives 4(m1)4(m-1), the final term in the formula above.

There are three further caveats worth keeping in mind:

The function select_degree_aic works well when the data is noisy, because noise creates a floor on SSE that the penalty can eventually overcome, and the AIC minimum selects a sensible degree. For the shark-fin curve with n=30n=30 points and Gaussian noise σ=0.05\sigma=0.05, the AIC minimum is m=4m^*=4, which is exactly the true degree.

When AIC misleads: the SSE floor problem

On clean, digitized data the story is different. The shark-fin CSV (SharkFinCurve.csv, n=84n=84 points, no measurement noise) produces the following table when select_degree_aic sweeps m=2,,10m = 2, \ldots, 10:

mmkkSSEAICΔ\DeltaAICR2R^2max resid
220.636−4069080.9870.205
340.310−4638510.9940.142
46≈ 0−1246681.0000.0025
58≈ 0−1245691.0000.0024
610≈ 0−1259551.0000.0021
712≈ 0−1274401.0000.0024
814≈ 0−1280341.0000.0019
916≈ 0−1288261.0000.0019
1018≈ 0−1314*01.0000.0018

Once the degree reaches 4, SSE drops to numerical zero and R2=1.000R^2 = 1.000. The log-likelihood term saturates, and AIC keeps improving only because the penalty 4(m1)4(m-1) grows more slowly than the marginal gain from floating-point noise. The same failure occurs with AICc\text{AIC}_c and BIC. All three criteria prefer the maximum degree in the search range, wherever that happens to be set.

The real decision boundary is visible in the max residual column: a 58× drop between m=3m=3 (0.142) and m=4m=4 (0.0025), after which further improvement is less than 30% across six additional degrees. AIC can’t see this because it operates on SSE, which is already near zero. The geometric residual is what carries the signal.

Automatic degree selection: select_degree_robust

select_degree_robust in DegreeSelection.jl automates the elbow argument above. Its algorithm has three steps.

Step 1 — Sweep. Call fit_bezier_odr for every mm in [mmin,min(mmax,n2)][m_{\min}, \min(m_{\max}, n-2)], recording SSE, AIC, and the maximum geometric residual rmax(m)=maxiB(ti;P)Qir_{\max}(m) = \max_i \lVert B(t_i; P) - Q_i \rVert at each degree.

Step 2 — Elbow detection. For each consecutive pair of degrees, compute the improvement ratio

ρ(m)=rmax(m1)rmax(m).\rho(m) = \frac{r_{\max}(m-1)}{r_{\max}(m)}.

The elbow is the first mm where ρ(m)>τ\rho(m) > \tau (default τ=10\tau = 10). If rmax(m)<εmachiner_{\max}(m) < \varepsilon_\text{machine} the ratio is treated as \infty, so a numerically exact fit always triggers the elbow. If no ratio exceeds τ,\tau, which happens on noisy data where the residual declines smoothly, the function falls back to the AIC minimum, matching the behavior of select_degree_aic.

Step 3 — Plateau selection. Among all degrees mmelbowm \geq m_\text{elbow} where

rmax(m)εrmax(melbow),ε=2 (default),r_{\max}(m) \leq \varepsilon \cdot r_{\max}(m_\text{elbow}), \quad \varepsilon = 2 \text{ (default)},

return the smallest such mm.

function select_degree_robust(Q::Matrix{Float64};
                              m_min::Int    = 2,
                              m_max::Int    = 12,
                              τ::Float64   = 10.0,
                              ε::Float64   = 2.0,
                              tol::Float64 = 1e-10)
    sweep   = fit_sweep(Q, m_min, m_max, tol)
    degrees = sort(collect(keys(sweep)))



    m_elbow = _detect_elbow(sweep, τ)



    if m_elbow === nothing
        m_elbow = degrees[argmin([sweep[m].AIC for m in degrees])]
        println("[select_degree_robust] No elbow found; " *
                "falling back to AIC minimum at m=$m_elbow")
    end



    m_best = _plateau_select(sweep, m_elbow, ε)



    println("[select_degree_robust] Elbow at m=$m_elbow. " *
            "Selected m=$m_best (smallest in plateau). " *
            "AIC=$(round(sweep[m_best].AIC; digits=2))")



    # diagnostics Dict omitted for brevity — see DegreeSelection.jl
    return m_best, sweep[m_best].P, sweep[m_best].t
end

On the shark-fin curve, the function prints:

[select_degree_robust] Elbow at m=4 (ratio=57.8). Selected m=4 (smallest in plateau). AIC=-1245.85

The ratio 57.8 is the measured ρ(4)\rho(4): the max residual fell by nearly 58× when the degree increased from 3 to 4.

AIC plotted against Bézier degree m for the shark-fin data. The curve drops sharply between m=3 and m=4, then flattens. A dashed red line marks the elbow at m*=4.
Figure 9.

AIC vs. degree for the noise-free shark-fin data (n=84n=84).

The sharp drop between m=3m=3 and m=4m=4 is the elbow that select_degree_robust detects. AIC continues to improve above m=4m=4 because SSE has already reached numerical zero, so the penalty no longer dominates. The dashed line marks the selected degree m=4m^*=4.

The Akaike Information Criterion isn’t the only model selection tool. The Bayesian Information Criterion

BIC(m)=nln ⁣(SSE(m)n)+2(m1)ln(n)\text{BIC}(m) = n\,\ln\!\left(\frac{\text{SSE}(m)}{n}\right) + 2(m-1)\ln(n)

replaces the fixed penalty 2k=4(m1)2k = 4(m-1) with kln(n)=2(m1)ln(n)k\ln(n) = 2(m-1)\ln(n), which grows with sample size and consistently favors lower degrees than AIC. A good overview of both is in Choosing the Best Model: A Friendly Guide to AIC and BIC. On the clean shark-fin data, BIC and AICc\text{AIC}_c both suffer the same SSE-floor failure as AIC, and all three prefer the maximum degree in the sweep range.

Three-panel figure comparing AIC, AICc, BIC, and geometric residuals across degrees 2 to 10 for the shark-fin data.
Figure 10.

Comparison of AIC, AICc\text{AIC}_c, and BIC (top two panels) and geometric residuals (bottom panel) across degrees 2–10 for the noise-free shark-fin data.

All three information criteria continue improving past the elbow at m=4m=4 because SSE is at numerical zero for m4m \geq 4. The residual panel (log scale) shows the true signal: a 58× drop at m=4m=4, with less than 30% further improvement across six additional degrees. The vertical gray line marks the elbow.

To summarize the selection strategy:

Monotone reparameterization

The ODR routine is free to move each tit_i wherever it lowers the distance, and nothing in the objective forces the tit_i to stay in order. Usually they do, but on a curve that doubles back on itself, two data points can be assigned tt values that are out of sequence. If you need a strictly increasing parameterization for resampling, arc-length work, or just tidiness monotone_reproject re-solves for tt point by point, holding each one to the right of its predecessor:

function monotone_reproject(Q::Matrix{Float64}, P::Matrix{Float64};
                             tol::Float64=1e-10)::Vector{Float64}
    n = size(Q, 1)
    t_proj    = zeros(n)
    t_proj[n] = 1.0
    for i in 2:n-1
        t_lo = t_proj[i-1]
        dist2(t) = sum((bezier_eval([t], P)[1, :] .- Q[i, :]) .^ 2)
        t_proj[i] = _golden_section(dist2, t_lo, 1.0; tol=tol)
    end
    return t_proj
end

For each interior point it minimizes the squared distance to the curve over the interval [ti1,1][t_{i-1}, 1] with a golden-section search, which guarantees the new tit_i can’t fall below ti1t_{i-1}. The control points are left untouched and only the parameterization changes so the SSE may increase slightly relative to the unconstrained ODR fit. Notice that this per-point closest-point search is exactly the geometric problem of the next section, solved numerically one point at a time. ODR was solving the same problem for all points jointly.

Note: The two sections that follow (“Finding the closest point by hand” and “Cubic and Quadratic Spline Interpolation”) are geometric background. None of it is needed to run the fit — ODR handles the closest-point problem automatically. These sections are here for readers who want to understand what the optimizer is minimizing, or who need to compute closest points independently of the fitting pipeline.

Finding the closest point by hand

Rather than trying to find the optimal tt vector, suppose we wanted to find the best values for the control point locations using the tt we have. The locations of the points in QestQ_{est} are determined by the positions of the control points PP, so the goal is to move the control points around until the difference between QestiQ_{est_i} and QiQ_i^* is minimized. A couple of ways to minimize these distances are to minimize the sum of the errors, or to minimize the worst case, or maximum error. This is the manual version of the orthogonal distance that ODR minimizes automatically.

Perhaps the easiest way to find the closest point on the curve QQ to an estimated point QestiQ_{est_i} would be to draw straight lines between successive points in QQ and then get the minimum distance between QestiQ_{est_i} and the nearest line segment. This will be perpendicular to the line segment and passing through QestiQ_{est_i}.

Two cases for finding the closest point on a curve: Case 1 where a perpendicular from the estimated point lands on a line segment, and Case 2 where it falls in an ambiguous region near a node.
Figure 11.

Linear interpolation of the parameterization tt.

This works well so long as QestQ_{est} can be reached by a perpendicular segment (Case 1). But as shown in Case 2, if QestQ_{est} is near node QiQ_i, and between the endpoint perpendiculars, the method fails. A sharper turn at QiQ_i leaves more room for an undefined condition, but in most cases, we’ll have lots of points along the length of QQ, and the turns will be fairly shallow. If QestQ_{est} falls in the undefined region, the easiest solution would be to simply measure the distance between QestQ_{est} and QiQ_i.

To find QQ^* in Case 1, lets parameterize the segment from Qi1Q_{i-1} to QiQ_i by τ\tau where 0τ10 \leq \tau \leq 1, so the segment becomes

L(τ)=(1τ)Qi1+τQi.L(\tau) = (1-\tau)Q_{i-1} + \tau Q_i.

If τ=0\tau = 0, then you’re at Qi1Q_{i-1} and if τ=1\tau = 1 you’ve arrived at QiQ_i. The point QQ^* will be at some value of τ\tau where the vector QestQQ_{est} - Q^* is perpendicular to the vector QiQi1Q_i - Q_{i-1}, or equivalently, the dot product is zero,

0=(QestL(τ))T(QiQi1)=(Qest(1τ)Qi1τQi)T(QiQi1)=(QestQi1τ(QiQi1))T(QiQi1)=(QestQi)T(QiQi1)τQiQi12\begin{aligned} 0 &= (Q_{est} - L(\tau))^T(Q_i - Q_{i-1}) \\ &= (Q_{est} - (1-\tau)Q_{i-1} - \tau Q_i)^T(Q_i - Q_{i-1}) \\ &= (Q_{est} - Q_{i-1} - \tau(Q_{i} - Q_{i-1}))^T(Q_i - Q_{i-1}) \\ &= (Q_{est} - Q_i)^T(Q_i - Q_{i-1}) - \tau \| Q_{i} - Q_{i-1} \|^2 \end{aligned}

which means

τ=(QestQi)T(QiQi1)QiQi12.\tau^* = \frac{(Q_{est} - Q_i)^T(Q_i - Q_{i-1})}{ \| Q_{i} - Q_{i-1} \|^2}.

This works well if τ\tau is between 00 and 11 as in Case 1. If it turns out that τ\tau is outside the interval, check an adjacent interval. If τ>1\tau^* > 1 in one interval and τ<0\tau^* < 0 in an adjacent interval, then QestQ_{est} is in the undefined region of Case 2 and we can use QestQi\| Q_{est} - Q_i \| as the minimum distance.

Continuity, Differentiability and CnC^n

This section gives the definitions and a bit of explanation about the concepts of continuity, differentiability and CnC^n curves. A complete understanding of these ideas isn’t necessary for the mathematics above, but could be a useful reference.

Continuity

A continuous function is pretty much what you’d expect - there are no gaps in the plot of the function. For a function to be continuous at a point xx, it has to satisfy three conditions:

Diagram illustrating the limit definition of continuity, with a function value at x flanked by f(x + Δx) and f(x − Δx) converging as Δx shrinks.
Figure 12.

Mathematical definition of continuity.

In this picture, the Δx\Delta x‘s are too big, so f(x+Δx)f(x+\Delta x) isn’t the same as f(xΔx)f(x-\Delta x), but the way we handle this problem is to take the limit as Δx\Delta x gets close to zero. If the three conditions above hold for very tiny values of Δx\Delta x, then f(x)f(x) is continuous at the point xx, and if they hold for all values of xx, then we say the function is continuous.

Here are a couple of examples of functions that are not continuous.

Two examples of discontinuous functions: one with a visible jump and one with a removable hole at a single point.
Figure 13.

A discontinuous function.

The one on the left has a big jump, so it’s clearly not continuous. The one on the right looks continuous, but at xx, the yy value of f(x)f(x) isn’t the same as very nearby points, so it’s not continuous either. There’s a tiny hole in the function that you can’t see in the plot, but it makes the function discontinuous.

The absolute value function, y=xy = |x| is continuous even at x=0x=0 because as you approach x=0x = 0 from either side, x|x| gets close to zero and the absolute value of zero is zero.

Plot of the absolute value function y = |x|, which forms a V-shape with a corner at the origin where the derivative is undefined.
Figure 14.

The absolute value function is continuous, but not differentiable at 00.

Slope and differentiability

You might remember that the slope of a line is defined as the change in yy values divided by the change in xx,

m=ΔyΔx.m = \frac{\Delta y}{\Delta x}.

Looking at the left side of the absolute value function above, for x<0x < 0, every time the xx value changes by one unit, the yy value changes by 1-1, so the slope m=11=1m = \frac{-1}{1} = -1. On the other side, x>0x > 0, yy goes up by 11 for every change Δx=1\Delta x = 1, so the slope is m=1m= 1. What is the slope for a function that curves all over the place? Something like this function f(x)=x32x22x+3f(x) = x^3 - 2x^2-2x+3?

Plot of the cubic function f(x) = x³ − 2x² − 2x + 3, showing one local maximum and one local minimum.
Figure 15.

The cubic function f(x)=x32x22x+3f(x) = x^3 - 2x^2-2x+3.

Suppose we want to estimate the slope at x=1x = 1. We could choose a small step Δx=0.2\Delta x = 0.2 and get the change in yy-values between the two points. At x=1x=1, f(1)=1321221+3=0f(1) = 1^3 - 2 \cdot 1^2 - 2 \cdot 1 + 3 = 0. If we evaluate f(x)f(x) at x=1.2x = 1.2 we get f(1.2)=0.552f(1.2) = -0.552. So, the slope of the line between these two points is

m=ΔyΔx=0.5520.2=2.76.m = \frac{\Delta y}{\Delta x} = \frac{-0.552}{0.2} = -2.76.

If we took a smaller step, Δx=0.1\Delta x = 0.1, we find that f(1.1)=0.289f(1.1) = -0.289 which means that the slope is m=2.89m = -2.89. Which is the right answer?

Let’s use a little algebra to find out. First, we need to calculate f(x+Δx)=(x+Δx)32(x+Δx)22(x+Δx)+3f(x+\Delta x) = (x+\Delta x)^3-2(x+\Delta x)^2-2(x+\Delta x)+3. Instead of working out all of the terms, we can enter each one into Wolfram Alpha. You should get this,

Wolfram Alpha output showing the expansion of (x + Δx)³ into x³ + 3x²Δx + 3xΔx² + Δx³.
Figure 16.

Expansion of terms in Wolfram Alpha.

Next, expand the second term 2(x+Δx)2-2(x+\Delta x)^2 to get 2x24xΔx2Δx2-2x^2 - 4x \Delta x - 2 \Delta x^2. So, the expanded version of f(x+Δx)f(x+\Delta x) is

f(x+Δx)=x3+3x2Δx+3xΔx22x24xΔx2Δx22x2Δx+3.f(x+\Delta x) = x^3+3x^2 \Delta x + 3x \Delta x^2 -2x^2 - 4x \Delta x - 2 \Delta x^2 - 2x - 2 \Delta x + 3.

If we subtract f(x)f(x) from f(x+Δx)f(x + \Delta x), that’s just Δy\Delta y, the change in height as we go from xx to x+Δxx + \Delta x. Since f(x)=x32x22x+3f(x) = x^3 - 2x^2 - 2x +3, then

Δy=f(x+Δx)f(x)=3x2Δx+3xΔx24xΔx2Δx22Δx.\Delta y = f(x+\Delta x) - f(x) = 3x^2 \Delta x + 3x \Delta x^2 - 4x \Delta x - 2 \Delta x^2 - 2 \Delta x.

Now, we can calculate the slope,

m=ΔyΔx=f(x+Δx)f(x)Δx=3x2Δx+3xΔx24xΔx2Δx22ΔxΔx=3x2+3xΔx4x2Δx2.\begin{aligned} m &= \frac{\Delta y}{\Delta x} = \frac{f(x+\Delta x) - f(x)}{\Delta x} \\ &= \frac{3x^2 \Delta x + 3x \Delta x^2 - 4x \Delta x - 2 \Delta x^2 - 2 \Delta x}{\Delta x} \\ &= 3x^2 + 3x \Delta x - 4x - 2\Delta x - 2. \end{aligned}

We still don’t know how big the step Δx\Delta x should be, but it seems like it should be as small as we can possibly make it. If we imagine that Δx\Delta x is so small that it’s practically zero, then the two terms 3xΔx3x \Delta x and 2Δx-2 \Delta x become nearly zero to the point that they can be safely ignored. Then the slope is

m=3x24x2.m = 3x^2 - 4x - 2.

Notice that we didn’t select a value for xx before calculating the slope. What this means is that the equation for mm is valid for any xx, and instead of writing it as the slope m=m = something, it’s usually written as dfdx\frac{df}{dx} or more compactly f(x)f'(x) where the apostrophe indicates this new function, the derivative of f(x)f(x).

Side-by-side plots of f(x) = x³ − 2x² − 2x + 3 and its derivative f′(x) = 3x² − 4x − 2, with the zeros of the derivative corresponding to the extrema of f.
Figure 17.

The derivative function.

Now we can calculate the slope of f(x)f(x) at any point exactly. For x=1x=1, f(x)=342=3f'(x) = 3-4-2 = -3, meaning that right there, the change in yy divided by the change in xx is 3-3. To make these plots, go to the Desmos calculator and enter the equations in the panel on the left side.

Desmos calculator interface showing f(x) and f′(x) entered as equations in the left panel.
Figure 18.

Input equations in Desmos.

CnC^n curves

What was the point of all this discussion about continuity and differentiability? A CnC^n curve is one that is continuous, and the first, second, and so on, up to nnth derivatives are also continuous. Since f(x)f'(x) is a function, then we can take its derivative and get the second derivative function, f(x),f''(x), and keep going to the nnth derivative function f(n)(x)f^{(n)}(x). If all of these are continuous functions, then the original function is called CnC^n.

The absolute value function, f(x)=xf(x) = |x| is continuous, but the derivative is not continuous because it jumps from 1-1 to 11 at the origin (x=0x = 0). So, x|x| is a C0C^0 curve meaning that the function is continuous, but any derivative has a discontinuity at zero.

We showed that f(x)=x32x22x+3f(x) = x^3 -2 x^2 - 2x +3 has a derivative. You could check that both the function and its derivative are also continuous. Using the same method as before, it’s possible to show that the derivative of f(x)=3x24x2f'(x) = 3x^2 - 4x -2 is f(x)=6x4f''(x) = 6x-4, and the third derivative is f(x)=6f'''(x) = 6. All of these functions are also continuous everywhere, so f(x)f(x) is at least a C3C^3 function. If you take it one more step, you’ll find that f(4)(x)=0f^{(4)}(x) = 0 as is f(n)(x)f^{(n)}(x) for any n4n \geq 4. These are pretty boring functions because y=f(x)=0y = f(x) = 0 for every xx, but they are continuous and differentiable. A function that can be differentiated repeatedly like this where all of the derivatives are continuous is called a CC^\infty function (“Cee infinity”).

In the post Curve Fitting with Julia we discussed the Weierstrass Approximation Theorem which says that any continuous function can be approximated with a polynomial. The absolute value function is continuous, so the theorem says that we can find a polynomial approximate. While it would be possible to approximate C0C^0 functions with a Bézier polynomial, going one level up to C1C^1 functions makes the approximation much easier.

Why is the fitted result C1C^1? Each Bernstein basis polynomial Bm,j(t)=(mj)tj(1t)mjB_{m,j}(t) = \binom{m}{j}t^j(1-t)^{m-j} is a polynomial in tt, and a polynomial is infinitely differentiable everywhere. The Bézier curve B(t)=jPjBm,j(t)B(t) = \sum_j P_j B_{m,j}(t) is a linear combination of these polynomials, so it is also CC^\infty — in particular C1C^1 — with respect to tt. When viewed as a planar curve, both coordinates x(t)x(t) and y(t)y(t) are smooth, and the tangent direction (x(t),y(t))\bigl(x'(t), y'(t)\bigr) varies continuously everywhere except potentially at t=0t=0 or t=1t=1 if the endpoint control points are coincident. For the curves we fit here, the endpoints are distinct, so the fitted Bézier is fully C1C^1 on [0,1][0,1].

Cubic and Quadratic Spline Interpolation

This section, like the last, is background: a more accurate way to find the closest point on the curve than straight-line segments. It’s the high-precision version of the distance ODR minimizes for us, kept here for readers who want to compute closest points themselves.

One way to approximate a curve is to use polynomials to fit each segment of the curve. In fact, many CAD (Computer aided design) programs use cubic splines to generate curves. A cubic spline requires four coefficients, and if you want a curve that loops around a bit, you might have to parameterize the curve as a function of tt which would take a cubic in both the xx and yy coordinates for a total of eight coefficients.

Many CAD programs allow you to draw smooth curves by generating cubic splines, but the process can slow down the computer because of the number of calculations required. Even so, we can use cubic interpolations to find the nearest point to the curve QQ. By generating the interpolation once, the computational cost isn’t high, and then we can use some math to find the nearest point.

In Julia it’s easy to get a cubic interpolation of the points in the curve using the CubicSplineInterpolation function. Between each pair of curve points QiQ_i and Qi+1Q_{i+1}, CubicSplineInterpolation will return a pair of parameterized cubic functions,

xi(t)=αi3t3+αi2t2+αi1t+αi0yi(t)=βi3t3+βi2t2+βi1t+βi0\begin{aligned} x_i(t) &= \alpha_{i_3} t^3 +\alpha_{i_2} t^2 + \alpha_{i_1}t + \alpha_{i_0} \\ y_i(t) &= \beta_{i_3} t^3 +\beta_{i_2} t^2 + \beta_{i_1}t + \beta_{i_0} \end{aligned}

where t[0,1]t \in [0,1] for each segment. Suppose we have found a set of estimated curve points QestQ_{est} as above, and want to know the closest point QQ^* to the original curve. The green curve is the cubic approximation, the black line is tangent at the closest point, and the blue arrow is the vector from QQ^* to QestQ_{est}.

Diagram showing a cubic spline segment (green curve) with a tangent line (black) at the closest point Q*, and a blue arrow from Q* perpendicular to an estimated curve point Q_est.
Figure 19.

Quadratic interpolation of the curve points.

The tangent is just the derivative of xi(t),yi(t)x_i(t),y_i(t),

xi(t)=3αi3t2+2αi2t+αi1yi(t)=3βi3t2+2βi2t+βi1\begin{aligned} x_i'(t) &= 3\alpha_{i_3} t^2 + 2\alpha_{i_2} t + \alpha_{i_1} \\ y_i'(t) &= 3\beta_{i_3} t^2 + 2\beta_{i_2} t + \beta_{i_1} \end{aligned}

To simplify the notation a bit, let’s call the coordinates of Qest(ti)=(qx,qy)Q_{est}(t_i) = (q_x,q_y). The closest point, QQ^* has coordinates xi(t),yi(t)x_i(t^*),y_i(t^*) for some t[0,1]t^* \in [0,1]. To find the optimal value of tt^*, notice that the tangent line and the vector from QQ^* and Qest(ti)Q_{est}(t_i) are perpendicular, so their dot product is zero,

[xi(t)qxyi(t)qy][xi(t)yi(t)]=0\begin{bmatrix} x_i(t^*) - q_x & y_i(t^*) - q_y \end{bmatrix} \begin{bmatrix} x_i'(t^*) \\ y_i'(t^*) \end{bmatrix} = 0

or,

[αi3t3+αi2t2+αi1t+(αi0qx)βi3t3+βi2t2+βi1t+(βi0qy)][3αi3t2+2αi2t+αi13βi3t2+2βi2t+βi1]=0\begin{bmatrix} \alpha_{i_3} t^3 + \alpha_{i_2} t^2 + \alpha_{i_1}t + (\alpha_{i_0} - q_x) & \beta_{i_3} t^3 + \beta_{i_2} t^2 + \beta_{i_1}t + (\beta_{i_0} - q_y) \end{bmatrix} \begin{bmatrix} 3\alpha_{i_3} t^2 + 2\alpha_{i_2} t + \alpha_{i_1} \\ 3\beta_{i_3} t^2 + 2\beta_{i_2} t + \beta_{i_1}\end{bmatrix} = 0

when t=tt = t^*. Expanding and collecting terms gives a quintic equation,

0=3(αi32+βi32)t5+5(αi2αi3+βi2βi3)t4+(4(αi1αi3+βi1βi3)+2(αi22+βi22))t3+3(αi1αi2+βi1βi2αi3qxβi3qy)t2+(2(αi0αi2+βi0βi2αi2qxβi2qy)+αi12+βi12)t+(αi0αi1+βi0βi1αi1qxβi1qy).\begin{aligned} 0 &= 3(\alpha_{i_3}^2+\beta_{i_3}^2)t^5 \\ & +5(\alpha_{i_2}\alpha_{i_3}+\beta_{i_2}\beta_{i_3})t^4 \\ & + (4(\alpha_{i_1}\alpha_{i_3}+\beta_{i_1}\beta_{i_3})+2(\alpha_{i_2}^2 + \beta_{i_2}^2))t^3 \\ &+3(\alpha_{i_1}\alpha_{i_2}+\beta_{i_1}\beta_{i_2} - \alpha_{i_3} q_x - \beta_{i_3} q_y)t^2 \\ &+ (2(\alpha_{i_0}\alpha_{i_2} + \beta_{i_0}\beta_{i_2} - \alpha_{i_2}q_x - \beta_{i_2}q_y) + \alpha_{i_1}^2 + \beta_{i_1}^2)t \\ &+(\alpha_{i_0}\alpha_{i_1}+\beta_{i_0}\beta_{i_1} - \alpha_{i_1} q_x - \beta_{i_1} q_y). \end{aligned}

Finding the roots of 5th degree polynomials (or higher) is impossible analytically, and has to be done with numerical methods such as with the roots function. In this case, you would need to find the interval closest to Qest(ti)Q_{est}(t_i) and plug in the coefficients for the quintic to solve for the closest point. This needs to be done for each of the nn points in QestQ_{est}.

The first step is to fit cubic spline polynomials through the points in QQ. All we need are the coefficients α\alpha and β\beta, but what’s the right Julia code? One quick way to get the answer is to ask Google in AI mode. Here’s what I asked it:

Write a Julia function to fit a cubic spline interpolation through 2D points in array Q where Q is parameterized by the vector t, t[0] = 0, t_i < t_{i+1}, t[n] = 1. Return an nx4 array of coefficients for the cubic interpolating polynomials. It gave the following explanation and code,

To fit a parameterized cubic spline through a sequence of 2D points, we fit independent natural cubic splines to the XX and YY coordinates as functions of the parameter tt.

For each of the nn intervals between points, the interpolation polynomial takes the form: Si(t)=ai+bi(tti)+ci(tti)2+di(tti)3S_i(t) = a_i + b_i(t - t_i) + c_i(t - t_i)^2 + d_i(t - t_i)^3

Here is the complete, high-performance Julia implementation using standard linear algebra without external dependencies:

using LinearAlgebra



"""
    fit_cubic_spline_2d(t::Vector{Float64}, Q::Matrix{Float64})



Fits a natural cubic spline through 2D points in `Q` (size N×2) parameterized by `t` (length N).
Returns a tuple (C_x, C_y) where each is an n×4 array containing the coefficients 
[a, b, c, d] for the n = N-1 interval polynomials.
"""
function fit_cubic_spline_2d(t::Vector{Float64}, Q::Matrix{Float64})
    N = length(t)
    n = N - 1  # Number of intervals

    

    # Extract X and Y coordinates
    X = Q[:, 1]
    Y = Q[:, 2]

    

    # Calculate step sizes
    h = diff(t)

    

    # Solve for second derivatives (c coefficients) for X and Y
    c_x = solve_spline_system(h, X, N, n)
    c_y = solve_spline_system(h, Y, N, n)

    

    # Pre-allocate coefficient matrices (nx4) for [a, b, c, d]
    C_x = Matrix{Float64}(undef, n, 4)
    C_y = Matrix{Float64}(undef, n, 4)

    

    for i in 1:n
        # X-coordinate coefficients
        C_x[i, 1] = X[i]                                                 # a_i
        C_x[i, 3] = c_x[i]                                               # c_i
        C_x[i, 2] = (X[i+1] - X[i])/h[i] - h[i]*(2*c_x[i] + c_x[i+1])/3  # b_i
        C_x[i, 4] = (c_x[i+1] - c_x[i]) / (3 * h[i])                     # d_i

        

        # Y-coordinate coefficients
        C_y[i, 1] = Y[i]                                                 # a_i
        C_y[i, 3] = c_y[i]                                               # c_i
        C_y[i, 2] = (Y[i+1] - Y[i])/h[i] - h[i]*(2*c_y[i] + c_y[i+1])/3  # b_i
        C_y[i, 4] = (c_y[i+1] - c_y[i]) / (3 * h[i])                     # d_i
    end

    

    return C_x, C_y
end



# Helper function to solve the tridiagonal matrix system for natural cubic splines
function solve_spline_system(h::Vector{Float64}, coord::Vector{Float64}, N::Int, n::Int)
    # Natural boundary conditions specify c[1] = 0 and c[end] = 0
    # Constructing Tridiagonal system: A * c = B
    dl = zeros(N-1) # Lower diagonal
    d  = ones(N)    # Main diagonal
    du = zeros(N-1) # Upper diagonal
    B  = zeros(N)   # Right-hand side vector

    

    for i in 2:n
        dl[i-1] = h[i-1]
        d[i]    = 2 * (h[i-1] + h[i])
        du[i]   = h[i]
        B[i]    = 3 * ((coord[i+1] - coord[i])/h[i] - (coord[i] - coord[i-1])/h[i-1])
    end

    

    # Efficiently solve tridiagonal system using Julia's built-in Tridiagonal operator
    return Tridiagonal(dl, d, du) \ B
end

How to use this function

# 1. Define parameterized time vector from t=0 to t=1
t = [0.0, 0.25, 0.6, 1.0]



# 2. Define N x 2 matrix of target points
Q = [
    0.0  0.0;  # Point 1
    1.2  0.5;  # Point 2
    0.8  2.1;  # Point 3
    2.5  1.8   # Point 4
]



# 3. Fit the splines
coeffs_x, coeffs_y = fit_cubic_spline_2d(t, Q)



# coeffs_x is now a 3x4 Matrix containing:
# [a_1 b_1 c_1 d_1; 
#  a_2 b_2 c_2 d_2; 
#  a_3 b_3 c_3 d_3]
println("X Coefficients for Interval 1: ", coeffs_x[1, :])

Technical highlights


Since the fit only needs to be done once, it can be included with the function that estimates tt, and simplified to coeffs = CubicSplineInterpolation(t, Q).c. This generates a 4×n×24 \times n \times 2 array of coefficients, one set for each of the nn rows of QQ, and one each for the α\alpha and β\beta terms. The CubicSplineInterpolation function returns a 4×n4 \times n array rather than n×4n \times 4 but we need to combine terms in the cubic equation above anyway. Make an n×6n \times 6 array where each row contains the coefficients for the quintic polynomial above, and modify them with the terms containing qxq_x and qyq_y.

Instead of using cubic spline interpolations, you could fit each interval with a quadratic, or 2nd degree polynomial. If you do the same calculations as above but with quadratic fits, the resulting function will be a cubic polynomial.

In The Sum of the Sum of Some Numbers we discussed Tartaglia’s Depression which is a way to solve cubic equations by eliminating the t2t^2 term. The coefficients for the cubic are the terms in front of each tt which we can call a,b,c,da,b,c,d, so for instance, a=2(αi22+βi22)a = 2(\alpha_{i_2}^2+\beta_{i_2}^2).

Tartaglia wrote the solution to the cubic in the form of a poem. You can read about The Sordid Past of the Cubic Formula, and there’s a paper by Hongling Wang, Joseph Kearney, and Kendall Atkinson called Robust and Efficient Computation of the Closest Point on a Spline Curve that suggests using both a closed form solution like Tartaglia’s and an iterative method such as Newton’s.

The problem with using quadratic interpolation is that the splines aren’t smooth at the endpoints. It might not be noticeable if you plot the splines, but there will be sharp corners at the nodes. They won’t be as bad as using linear interpolation, but they’ll still be there. Cubic splines are designed to eliminate the corners at spline ends.

The goal is to find the value of tt for each QestQ_{est} that minimizes the distance to the original curve. Since we don’t have precise values for QQ between the measured points, attempting to get very accurate error estimates could be a lot of unnecessary work. Besides, the measured points in QQ are only as good as WebPlotDigitizer is able to read them from the input image. Depending on your application, you may want to consider using higher order fits.

Code for this article

This project provides an advanced toolkit for fitting Bézier curves to 2D datasets using Orthogonal Distance Regression (ODR) and robust automatic degree selection. It leverages ODRPACK to account for noise in both dimensions and provides sophisticated heuristics for selecting the optimal polynomial degree of the Bézier curve. All of the code is available on the Github repository Bezier-Polynomials.

Core ODR Engine & Math Foundation

These form the robust analytical and regression basis of the project:

Older workflows, utilities

These files contain the older workflows, utility readers, and the refactored wrappers that route into the new ODR setup:

Optimization

Application Layer

Robust Degree Selection

Dependency Installation

Visualization & Output Generators

Conclusion

What started as a shark fin drawn in GeoGebra turned into a complete pipeline for fitting smooth curves to real data. The key insight is that treating the parameter tt as a fixed quantity doesn’t work. Ordinary least squares with a guessed tt gives wrong control points, and the only cure is to let tt and the control points move together. The Orthogonal Distance Regression (ODR) algorithm handles it efficiently with analytic Jacobians and a trust-region solver.

The degree selection story has a similar lesson. Information criteria like AIC work well on noisy data because the penalty term can outrun a gradually declining SSE. On clean digitized data, SSE hits numerical zero too early, and all three criteria (AIC, AICc, BIC) chase marginal floating-point gains indefinitely. The geometric elbow in the max-residual profile is what actually finds the minimum, and select_degree_robust automates its detection with AIC as a fallback for the noisy case.

The tools built here — chord-length parameterization, endpoint-constrained warm start, ODR with analytic Jacobians, elbow-based degree selection, and monotone reparameterization — create a workflow pipeline that is useful well beyond shark fins. Some natural next steps: extending the approach to 3D space curves (the Bernstein basis is coordinate-independent), fitting closed loops (constrain P0=PmP_0 = P_m and t1=tnt_1 = t_n), or computing curvature profiles along the fitted curve to detect inflection points.

Software

References

Image credits

Sharknado: