I'm gonna handwave here, since only a few people have bothered taking data:
When a relation set is right at the cusp of building a matrix, a few more hours of sieving will save more than a few hours of matrix solving on that same machine (meaning CPU in both cases). At the relation counts where most e-small and 15e jobs are processed, 20 more core-hours of sieving might save 5 or 10 core-hours of matrix work (again, both measured on a CPU). I've done a few experiments at home, and I have yet to find a job where the sieving required to build a matrix at TD=120 saved more CPU time than it cost. I believe this could/would be the case on really big jobs, say with matrices 50M+ in size. We have historically sieved more than needed because BOINC computation is cheap, while matrix solving time was in short supply. So, now that GPU matrix solving means matrix time is no longer in short supply, we should sieve less. Something like 5-10% fewer relations, which means roughly 5-10% more jobs done per calendar month.
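To put rough numbers on the trade-off above, here is a minimal back-of-the-envelope sketch. It assumes sieving time scales linearly with relation count and that sieving is the throughput bottleneck. Both are simplifications, and the function names are made up for illustration.

```python
def net_cpu_saving(extra_sieve_core_hours, matrix_core_hours_saved):
    """Positive: the extra sieving paid for itself. Negative: it cost more than it saved."""
    return matrix_core_hours_saved - extra_sieve_core_hours


def throughput_gain(relation_cut):
    """Relative increase in jobs per month if each job needs relation_cut fewer relations.

    Assumes sieving time is proportional to the relation count (a simplification).
    """
    return 1.0 / (1.0 - relation_cut) - 1.0


if __name__ == "__main__":
    # 20 extra core-hours of sieving that save 5 or 10 core-hours of matrix work
    for saved in (5, 10):
        print(f"saved {saved} core-hours in LA: net {net_cpu_saving(20, saved):+d} core-hours")
    # 5% and 10% fewer relations per job
    for cut in (0.05, 0.10):
        print(f"{cut:.0%} fewer relations -> {throughput_gain(cut):.1%} more jobs per month")
```

Under those assumptions the gain is slightly better than the cut itself, since the same sieving budget is divided by a smaller per-job cost.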
For 2,2174L, 1355M relations yielded 734M uniques. With nearly 50% duplicates, we have clearly reached the limit for 16e. Anyway, filtering yielded
Code:
matrix is 102063424 x 102063602 (51045.3 MB) with weight 14484270868 (141.91/col)
Code:
linear algebra completed 2200905 of 102060161 dimensions (2.2%, ETA 129h 5m)
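For anyone who wants to sanity-check the figures quoted above, here is a small sketch that recomputes the duplicate rate, the average column weight, and the LA progress from the posted numbers. The total-runtime estimate assumes a constant iteration rate, which is my assumption and not something the log states.

```python
def duplicate_rate(raw_relations, unique_relations):
    """Fraction of raw relations that were duplicates."""
    return 1.0 - unique_relations / raw_relations


def average_column_weight(weight, columns):
    """Average number of nonzero entries per matrix column."""
    return weight / columns


def total_hours_estimate(done, total, eta_hours):
    """Estimated full LA runtime from the remaining ETA, assuming a constant rate."""
    fraction_done = done / total
    return eta_hours / (1.0 - fraction_done)


if __name__ == "__main__":
    print(f"duplicates: {duplicate_rate(1355e6, 734e6):.1%}")
    print(f"weight/col: {average_column_weight(14484270868, 102063602):.2f}")
    print(f"LA done:    {2200905 / 102060161:.1%}")
    eta = 129 + 5 / 60  # "ETA 129h 5m" from the log line
    print(f"full run:   about {total_hours_estimate(2200905, 102060161, eta):.0f} hours")
```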
Code:
sudo subscription-manager repos --enable=rhel-8-for-x86_64-appstream-rpms
sudo subscription-manager repos --enable=rhel-8-for-x86_64-baseos-rpms
sudo subscription-manager repos --enable=codeready-builder-for-rhel-8-x86_64-rpms
sudo dnf config-manager --add-repo=https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
sudo dnf module install nvidia-driver:latest
sudo reboot
sudo dnf install cuda-11-4
echo 'export PATH=/usr/local/cuda-11.4/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64/:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

Or is this just the limit for 16e with 33-bit large primes? I know you've avoided going higher because of the difficulty of the LA and the msieve filtering bug, but now that the bug is fixed and GPUs make the LA much easier, might it be worth going up to 34-bit?

I looked through the code a few years ago and found no issues. Lasieve4 is also fine, although it is limited to 96-bit mfba/r.

I tried fetching an NFS@Home WU and found that 2,2174M is assigned lpbr and lpba of 34.
Here are the contents of the polynomial file S2M2174b.poly.
Code:
n: 470349924831928271476705309712184283829671891500377511256458133476241008159328553358384317181001385841345904968378352588310952651779460262173005355061503024245423661736289481941107679294474063050602745740433565487767078338816787736757703231764661986524341166060777900926495463269979500293362217153953866146837
skew: 1.22341
c6: 2
c5: 0
c4: 0
c3: 2
c2: 0
c1: 0
c0: 1
Y1: 1
Y0: 3064991081731777716716694054300618367237478244367204352
type: snfs
rlim: 250000000
alim: 250000000
lpbr: 34
lpba: 34
mfbr: 99
mfba: 69
rlambda: 3.6
alambda: 2.6

Last fiddled with by wreck on 2021-09-23 at 11:49 Reason: fix file name
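As a quick check on how the poly file above encodes the number, here is a minimal sketch that evaluates the algebraic polynomial at the magnitude of Y0. It assumes the common root is |Y0|/Y1 with Y1 = 1 and does not check the sign convention of Y0. It also does not test that n divides the result, so treat it as an illustration of the SNFS form only.

```python
# Nonzero coefficients from the poly file: c6 = 2, c3 = 2, c0 = 1
COEFFS = {6: 2, 3: 2, 0: 1}
# Y0 exactly as printed in the poly file
Y0 = 3064991081731777716716694054300618367237478244367204352


def eval_poly(coeffs, x):
    """Evaluate sum(c_d * x**d) with exact integer arithmetic."""
    return sum(c * x**d for d, c in coeffs.items())


if __name__ == "__main__":
    m = abs(Y0)  # assumption: root magnitude is |Y0| / Y1, with Y1 = 1
    print("Y0 is 2^181:", m == 2**181)
    value = eval_poly(COEFFS, m)
    # 2*(2^181)^6 + 2*(2^181)^3 + 1 = 2^1087 + 2^544 + 1
    print("f(2^181) is 2^1087 + 2^544 + 1:", value == 2**1087 + 2**544 + 1)
```

The value 2^1087 + 2^544 + 1 is the Aurifeuillian M part of 2^2174 + 1, which is why the job is named 2,2174M.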
@frmky, for future reference, when I tested this I found that rational-side sieving with *algebraic* 3LP was fastest. This shouldn't be too much of a surprise: the rational norms are larger, but not so much larger that 6 large primes across the two sides should split 4/2 rather than 3/3 (don't forget the special-q is a "free" large prime).
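The 4/2 versus 3/3 counting can be sketched as follows. This is only a rough bound and not how any siever actually decides: it assumes every large prime exceeds the factor base limit, so a cofactor below 2^mfb can hold at most k primes with k*log2(lim) < mfb. The swapped setting (mfbr 69, mfba 99) is a hypothetical illustration of "algebraic 3LP" and not values anyone posted here.

```python
import math


def max_large_primes(mfb_bits, lim):
    """Largest k with lim**k < 2**mfb_bits, i.e. an upper bound on large primes per cofactor."""
    k = 0
    while (k + 1) * math.log2(lim) < mfb_bits:
        k += 1
    return k


def large_prime_split(mfbr, mfba, rlim, alim, special_q_side):
    """Return (rational, algebraic) large-prime counts, counting the special-q as one extra."""
    rational = max_large_primes(mfbr, rlim)
    algebraic = max_large_primes(mfba, alim)
    if special_q_side == "r":
        rational += 1
    else:
        algebraic += 1
    return rational, algebraic


if __name__ == "__main__":
    lim = 250000000  # rlim and alim from the poly file
    # Poly file as posted, with special-q on the rational side
    print("mfbr 99 / mfba 69:", large_prime_split(99, 69, lim, lim, "r"))
    # Hypothetical swap: 3LP on the algebraic side, special-q still rational
    print("mfbr 69 / mfba 99:", large_prime_split(69, 99, lim, lim, "r"))
```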

