@pervognsen@mastodon.social avatar

pervognsen

@[email protected]

Performance, compilers, hardware, mathematics, computer science.

I've worked in or adjacent to the video game industry for most of my career.


dotstdy , to random
@dotstdy@mastodon.social avatar

I feel like the most difficult part of subgroups, and GPU programming in general, is getting all the terminology straight in your head. Sometimes it seems like it would be easier just writing RDNA asm directly. :')

pervognsen ,
@pervognsen@mastodon.social avatar

@dotstdy You know what has never helped? Every IHV and API having their own incompatible terminology!

pervognsen ,
@pervognsen@mastodon.social avatar

@dotstdy It's my favorite too. It used to be fashionable to deride it as marketing wank but I think it's evocative and memorable and the hierarchy of the terms makes sense once you know that warp isn't a sci-fi word.

pervognsen ,
@pervognsen@mastodon.social avatar

@dotstdy I think the marketing wank accusation is justified when we're counting each separate lane of a SIMD unit as a CUDA core. :)

pervognsen , to random
@pervognsen@mastodon.social avatar

I haven't done any real Vulkan programming since 1.0. Are there any good guides that skip all the legacy junk and only show the streamlined 1.3 way of doing things?

pervognsen OP ,
@pervognsen@mastodon.social avatar

@zeux What's the compatibility landscape like for GPUs that support 1.3 but don't support bindless? I was hoping to just require bindless.

pervognsen OP ,
@pervognsen@mastodon.social avatar

@zeux Oh, I just remembered you had your Niagara project. Do you recommend using that as a reference for good practices, etc?

dpiponi , to random
@dpiponi@mathstodon.xyz avatar

It's not like I had any chance of resisting when one of my favourite books is published in a fancy new hardback edition

pervognsen ,
@pervognsen@mastodon.social avatar

@dpiponi Where are you on Player of Games vs Use of Weapons?

shriramk , to random
@shriramk@mastodon.social avatar

You can't be the financial capital of the world if you can't monetize everything in the news. (Hidden Grounds cafe, NYC.)

pervognsen ,
@pervognsen@mastodon.social avatar

@shriramk I hope you put that $10 in the Kendrick jar.

pervognsen , to random
@pervognsen@mastodon.social avatar

Nothing is new: hash consing/value numbering in 1958. On Programming of Arithmetic Operations, A. P. Ershov, https://dl.acm.org/doi/10.1145/368892.368907

pervognsen OP , (edited )
@pervognsen@mastodon.social avatar

Ershov also independently invents open-addressed linear probing in that short paper, although Amdahl et al. had the idea a few years earlier, in 1954.

pervognsen OP ,
@pervognsen@mastodon.social avatar

Let's also invent the Sethi-Ullman algorithm 12 years early while we're at it.

pervognsen OP ,
@pervognsen@mastodon.social avatar

(I'm not sure how much credit he gets for that. I've always been amused that Sethi-Ullman gets a fancy name attached to something so simple and relatively limited in practice, whereas value numbering/hash consing might be a simple idea but it's extremely powerful and far-reaching. But it's nice to see him attack related parts of the problem at once in such a short paper, not just value numbering but also instruction scheduling and register allocation, since they all affect each other.)

pervognsen , to random
@pervognsen@mastodon.social avatar

One of my favorite hip-hop instrumentals: https://www.youtube.com/watch?v=s6Yyb3N9IuA. I was listening to J Cole's Everybody Dies and a YouTube commenter had just written "Kenny Dope" without any further context or explanation and I immediately understood what it meant.

pervognsen OP ,
@pervognsen@mastodon.social avatar

If you don't get the reference, listen to the two tracks back to back: https://www.youtube.com/watch?v=-5slZHLSnow. They both sample https://en.wikipedia.org/wiki/Inside_My_Love.

lritter , to random
@lritter@mastodon.gamedev.place avatar

interesting problem: progressively mapping a cosmically high number of unique strings of arbitrary length to an ordered set so that we can assign an index to each string, extract a substring from each index, and filter strings not in the set, ideally in logarithmic time.

evidently, this approach requires (and constitutes) data compression. the compressed result is functionally equivalent to a regular expression, or a schema validation system.

pervognsen ,
@pervognsen@mastodon.social avatar

@lritter You didn't define everything to the point where I'm completely sure what you're describing but maybe https://blog.burntsushi.net/transducers/ is relevant.

pervognsen ,
@pervognsen@mastodon.social avatar

@lritter Alright, I thought you were talking about strings-strings. Carry on. :)

pervognsen ,
@pervognsen@mastodon.social avatar

@lritter Definitely one of the best simple ideas in CS.

pervognsen ,
@pervognsen@mastodon.social avatar

@lritter Yeah, that's why I said it's all hash consing. It's very general and goes at least as far back as a Russian paper in the early 60s on value numbering.

pervognsen ,
@pervognsen@mastodon.social avatar

@lritter My bad, make that late 50s. https://dl.acm.org/doi/10.1145/368892.368907. Although I remember the terminology in that paper being somewhat impenetrable and the generality not so immediately apparent.

gigantos , to random
@gigantos@social.linux.pizza avatar

When people brag about how fast their Rust project is, is that because they come from a non-systems language?

I love the language, but there is no reason it would be faster than #C. Just less error-prone. The best case for Rust is to be as fast as C, not faster.

pervognsen ,
@pervognsen@mastodon.social avatar

@gigantos @ekuber @diondokter Ignoring mmap for a second, the situation seems pretty comparable in Rust and C if you follow the C spec? You can work with lvalues/places of unaligned items without any effort in both languages, you just can't have references to them in Rust or pointers to them in C, e.g. https://c.godbolt.org/z/EPT74bjK9. Raw pointers in Rust are actually more lax than C pointers for unaligned data since alignment is only a safety precondition of specific functions not of the pointer type.
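An illustrative Rust sketch of the point above, not taken from the linked godbolt; the struct and field names are made up:

use std::ptr;

#[repr(C, packed)]
struct Record {
    tag: u8,
    value: u64, // sits at offset 1, so it's unaligned
}

fn read_value(r: &Record) -> u64 {
    // `&r.value` is rejected (reference to an unaligned field), but a raw pointer
    // is fine: alignment is a safety precondition of the read, not of the pointer
    // type, so we do an explicit unaligned load.
    let p: *const u64 = ptr::addr_of!(r.value);
    unsafe { p.read_unaligned() }
}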

pervognsen , (edited )
@pervognsen@mastodon.social avatar

@gigantos @ekuber @diondokter And if you want to work with slices of unaligned items, you can just wrap all the leaf-level fields in

#[repr(packed, C)]
pub struct Packed<T>(T);

and then everything has alignment 1 and you're good. The bigger issue is mmap but if you're okay with stipulating that it's UB only if there are concurrent modifications from out of process then you should be good. And in C, if you have concurrent modifications it will also trigger UB if unsynchronized.
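A rough usage sketch of that wrapper; the get/set methods are my naming, not from the post:

#[repr(packed, C)]
pub struct Packed<T>(T);

impl<T: Copy> Packed<T> {
    pub fn get(&self) -> T {
        // Reading a packed field by value compiles to an unaligned load.
        self.0
    }
    pub fn set(&mut self, value: T) {
        self.0 = value; // unaligned store
    }
}

// A slice of unaligned u64s is then just &[Packed<u64>], which has alignment 1.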

pervognsen , (edited )
@pervognsen@mastodon.social avatar

@gigantos @ekuber @diondokter In the C11 memory model, data races are UB and that's what an unsynchronized concurrent mmap modification would usually trigger there too. So you'd need to take the same or similar precautions in both languages if you wanted to avoid UB for that situation, i.e. use atomic loads/stores or other forms of synchronization. Or stipulate as a requirement to the user that there are no such concurrent modifications and let the user suffer the consequences if there are.

pervognsen ,
@pervognsen@mastodon.social avatar

@gigantos @ekuber @diondokter Right, I grew up in that world too (and C is still the only language I use regularly other than Rust). But if we're talking about the C spec and C compilers as they currently exist, they're treating these matters a lot more like rustc than the C compilers we grew up with in the 80s and 90s. So I do think it makes sense to take a more "by the book" approach if we're comparing the situations in 2024.

pervognsen , (edited )
@pervognsen@mastodon.social avatar

@gigantos @ekuber @diondokter One small addendum: you need to deal with endianness anyway for file formats, so you might as well handle all of this at once, e.g. #[repr(C)] struct PackedLeU64([u8; 8]) with a get method that calls u64::from_le and a set method that calls u64::to_le. But the same thing applies as before: as long as you "push down" all of these packed types to the leaf-level data everything else can use references, e.g. &[PackedLeU64], and the compiler will be happy.
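A sketch of that leaf type as I read it; I use from_le_bytes/to_le_bytes here, which is the same idea as the from_le/to_le pairing mentioned above:

#[repr(C)]
#[derive(Clone, Copy)]
pub struct PackedLeU64([u8; 8]);

impl PackedLeU64 {
    pub fn get(&self) -> u64 {
        u64::from_le_bytes(self.0)
    }
    pub fn set(&mut self, value: u64) {
        self.0 = value.to_le_bytes();
    }
}

// Alignment is 1, so &[PackedLeU64] can sit directly on top of file or mmap bytes.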

vurtun , to random
@vurtun@mastodon.gamedev.place avatar

Currently in process of converting my tool code from using dynamic arrays for lists/tree views to using fixed size arrays to complete the cycle.

Common problem is basically SELECT * FROM xxx WHERE xxx ORDER BY xxxx LIMIT off, count. So we only want count elements from offset off, sorted by a certain element.

Looked into it and with help from @slembcke found Floyd–Rivest and heap to get n*log(k) runtime.

Written about here: https://gist.github.com/vurtun/7063afddcf1491af16037a207a167e49. Does anyone know something better than this?
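For reference, a minimal Rust sketch of the heap half of that approach (a max-heap of the best k keys seen so far, giving the O(n log k) bound mentioned above); it ignores the offset part and the names are placeholders:

use std::collections::BinaryHeap;

// Returns the k smallest keys in ascending order in O(n log k) time.
fn smallest_k(keys: &[u64], k: usize) -> Vec<u64> {
    let mut heap = BinaryHeap::with_capacity(k); // max-heap of the current best k
    for &key in keys {
        if heap.len() < k {
            heap.push(key);
        } else if heap.peek().map_or(false, |&top| key < top) {
            heap.pop();
            heap.push(key);
        }
    }
    heap.into_sorted_vec() // ascending
}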

pervognsen ,
@pervognsen@mastodon.social avatar

@vurtun Ah, I didn't make the connection with that optimization comment but I can see now that's what you had in mind.

pervognsen ,
@pervognsen@mastodon.social avatar

@vurtun Right, it's basically the same old IMGUI idea. It's just that sometimes people revert to indices that implicitly presume a fixed snapshot in time of something that in reality might be dynamically changing.

I think you can reconcile file-as-anchor with the traditional "jump to the 70th percentile of the list" UI, because when you perform that kind of random-access action you have no presumption that any particular entity is at that percentile, so it's actually a different query altogether.

pervognsen ,
@pervognsen@mastodon.social avatar

@vurtun In other words, that kind of percentile selection problem is actually where you need a selection algorithm like Floyd-Rivest. Because the query is inherently about percentiles. When your scroll window is anchored/focused on an item you're now in a fundamentally different situation. Another way to see that the situation is different is that the percentile query really is inherently about a snapshot in time whereas scrolling/pagination from a given point is something else.
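A short sketch of the percentile query in Rust, using the standard library's selection (introselect rather than Floyd-Rivest, but the same average-O(n) idea); assumes a non-empty slice:

// Reorders `keys` and returns the key at the given percentile (0..=100).
fn percentile_key(keys: &mut [u64], pct: usize) -> u64 {
    let idx = (keys.len() - 1) * pct / 100;
    let (_, nth, _) = keys.select_nth_unstable(idx);
    *nth
}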

steve , to random
@steve@discuss.systems avatar

A slight re-organization of Priest's "Efficient Scaling for Complex Division" to make it compatible with "try to divide the dumb fast way inline, then branch to rescale only if necessary" while preserving scale invariance of rounding.

Also fixes it up to work for Float16, which the original approach does not.

Further optimization possible and pretty straightforward.

https://github.com/apple/swift-numerics/pull/289

pervognsen ,
@pervognsen@mastodon.social avatar

@saagar @steve @neilhenning That's affine algebra. Linear algebra is when you're stuck at y=mx.

graydon , to random
@graydon@types.pl avatar

I did a little blogging about Rust and formal methods and local reasoning and stuff. Let's see how fast a link posted to the fediverse propagates!

https://graydon2.dreamwidth.org/312681.html

pervognsen ,
@pervognsen@mastodon.social avatar

@zwarich @graydon Thanks, I looked a bit at the thesis. It seems like non-escaping uses of mutable references are relatively easy to represent functionally since you can represent them as returning new copies, i.e. f(&mut A) -> B is represented as f(A) -> (A, B). This seems to be identical to the subset of cases where the MVS people have a simple story. But the functional desugaring of returning mutable references in Ullrich's thesis on Electrolysis is a bit of a nightmare...
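A concrete Rust rendering of that desugaring, with made-up types:

struct A { items: Vec<u64> }
struct B { count: usize }

// The signature under discussion...
fn f(a: &mut A) -> B {
    a.items.push(0);
    B { count: a.items.len() }
}

// ...and its functional representation: take A by value, return the new A alongside B.
fn f_desugared(mut a: A) -> (A, B) {
    a.items.push(0);
    let b = B { count: a.items.len() };
    (a, b)
}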

pervognsen , (edited )
@pervognsen@mastodon.social avatar

@zwarich @graydon From what I can tell he doesn't treat the general case, only when the returned mutable reference borrows from the first parameter/self. He represents that with a functional lens, but it's simpler than the general case since you know it composes with the first argument. So in the general case I guess you'd need a sum of different possible lenses, one for each possible thing it could have come from in the parameter list? Definitely lacks the appeal of the other cases.

pervognsen ,
@pervognsen@mastodon.social avatar

@zwarich Thanks again for the pointers. So they essentially split a returned mutable reference into its read capability (the forward function) and its write capability (the backward function). The way that "cashing in" the write capability is tied to the termination of the region is nice, if I understand it correctly. At that point you "commit" all the values of the mutable borrows like a transaction. Which in the functional representation is just returning their final values.
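A toy rendering of my reading of that split, with my own naming (a functional stand-in for fn get_mut(&mut A) -> &mut B):

struct Lens<A, B> {
    forward: fn(&A) -> B,    // read capability: project the current B out of A
    backward: fn(A, B) -> A, // write capability: commit the final B back into A
}

// At the end of the borrow region the write capability is "cashed in":
fn end_region<A, B>(lens: &Lens<A, B>, a: A, final_b: B) -> A {
    (lens.backward)(a, final_b)
}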

pervognsen , (edited )
@pervognsen@mastodon.social avatar

@zwarich This also has a pretty comparable low-level operational interpretation in compilers where you can cache stuff in registers or other temp locations but you have to commit/write back the final values to their home locations when the mutable borrow region terminates. So it's the functional equivalent of that low-level picture. Does that match your intuition for what they're doing?

pervognsen ,
@pervognsen@mastodon.social avatar

@zwarich By the way, I was thinking about this in the back of my brain since yesterday but I realized I felt a little uncomfortable even with the fn(&mut A) -> B to fn(A) -> (A, B) desugaring. Or more precisely, I think that's fine from the caller's point of view but what about the callee? One of the tangible differences in Rust as a callee when you have a r: &mut A versus an a: A is that you can't temporarily move out of *r. Basically because of panic unwinding. So I feel like there's a gap.

pervognsen , (edited )
@pervognsen@mastodon.social avatar

@zwarich You can certainly desugar away all the complexity from the unwinding by just saying that it all amounts to returning the new value of *r regardless of return vs unwind, but it feels like there's something missing in the story somehow if you're trying to give a real account of what's going on. Because in safe code you really can't do with a &mut A parameter what you can do with an A param and A return because of some of these "temp move-out" issues.

pervognsen , (edited )
@pervognsen@mastodon.social avatar

@zwarich In real Rust your options seem to be replacing *r with a temporary valid value (not always possible depending on the type in question) or to use unsafe code with ptr::read/write and especially in the latter case to avoid UB no-one should observe *r before you put back a valid value (and people often use a drop guard to ensure that). From the outside, sure, this is still desugarable to "old A comes in, new A comes back out" but the inside picture is drastically different.

pervognsen , (edited )
@pervognsen@mastodon.social avatar

@zwarich This stuff with unwinding is probably at a level they're not trying to model but I feel like I constantly run into the practical challenges of "only" having a &mut A parameter in safe code and it needs to be accounted for somehow in the semantics.

pervognsen ,
@pervognsen@mastodon.social avatar

@zwarich Right, mem::replace with a valid placeholder value is the first option I mentioned (and is applicable in safe code) but you might not have a placeholder value if it's a "resource" like a file or mutex or you're dealing with a generic type. And I agree that this is an issue in Rust that I wish had a better story on Rust's side but on the other hand presumably you're trying to explain how Rust actually works to some level of fidelity.
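For concreteness, the safe placeholder option looks like this (hypothetical function; it only works because Vec has a cheap valid placeholder, which a file or mutex guard would not):

use std::mem;

fn rebuild(buf: &mut Vec<u8>) {
    // Temporarily move the Vec out, leaving an empty Vec behind as the placeholder.
    let old = mem::replace(buf, Vec::new());
    // ... transform `old` by value here ...
    let new = old;
    *buf = new; // put the final value back before returning
}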

pervognsen ,
@pervognsen@mastodon.social avatar

@zwarich I really feel this practical gap between &mut A parameters and value passing/returning acutely, and it's been one of the major disappointments with Rust for me, honestly, especially given how bad the compiler (I guess mostly on the LLVM side) is at dealing with large values passed in and out of functions.

pervognsen ,
@pervognsen@mastodon.social avatar

@zwarich Disappointments? Sure, lots of things, it's just that I thought this at least should have been a solved problem!

pervognsen ,
@pervognsen@mastodon.social avatar

@zwarich Right, I'm hopeful we'll have a solution for that in the language eventually (Niko had that blog post on view types but I don't think it went anywhere so far) but that's definitely a big practical issue and I think often forces/pushes people into physically reorganizing their data structures just to make it easier to work with the borrow checker, e.g. SoA.

pervognsen ,
@pervognsen@mastodon.social avatar

@zwarich If you think about the Ted Codd sort of thing about separating the physical and logical access paths for data, the idea of physically organizing your data structures purely to please the borrow checker is kind of obscene, and the situation right now is definitely not great.

pervognsen ,
@pervognsen@mastodon.social avatar

@zwarich By the way, I realized this situation is another example of how region termination for mutable borrows has a tangible semantic effect. We have to "put back" the temporarily moved out values when the region terminates; it's a UB-safety obligation in the above example with ptr::read/write and a functional correctness obligation with mem::replace placeholders, but it has the same exact character of committing final values to the home location.

pervognsen , (edited )
@pervognsen@mastodon.social avatar

@zwarich By the way even though I don't think there's a built-in for this I'm pretty sure you want to offer a utility function with this signature:

fn call_or_abort<T, R>(m: &mut T, f: impl FnOnce(T) -> (T, R)) -> R;

It has a drop guard to catch panic unwinds and turn them into aborts. It basically implements the same desugaring as the paper.
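A sketch of how that utility could be implemented (my code, not a std API), using the ptr::read/write plus drop-guard pattern from earlier in the thread:

use std::{mem, process, ptr};

pub fn call_or_abort<T, R>(m: &mut T, f: impl FnOnce(T) -> (T, R)) -> R {
    struct AbortOnUnwind;
    impl Drop for AbortOnUnwind {
        fn drop(&mut self) {
            // Reached only if `f` unwinds while *m is moved out; don't let
            // anything observe the hole, just abort.
            process::abort();
        }
    }
    let guard = AbortOnUnwind;
    // SAFETY: *m is logically moved out here and written back below; the guard
    // ensures nothing can observe *m in between.
    let old = unsafe { ptr::read(m) };
    let (new, result) = f(old);
    unsafe { ptr::write(m, new) };
    mem::forget(guard); // defuse: f returned normally and *m holds a valid value again
    result
}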

pervognsen , (edited )
@pervognsen@mastodon.social avatar

@zwarich Sorry, I realized I'm replying in the wrong thread but since we're just all over the place. Anyway, that function smells a bit like the 'run' function for a monad where the inner world has an enriched effect semantics. Or something like that--it definitely has that kind of "bridging two worlds" feel.

amonakov , to random
@amonakov@mastodon.gamedev.place avatar

(prompted by discussion of detecting bitwise and-not earlier in GCC's optimization pipeline)

My ideal compiler IR would not have and/or/xor as distinct bitwise ops, just generic ternlog and probably the corresponding two-operand function ("bilog"?) too.
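For anyone unfamiliar, a scalar sketch of what such a generic ternlog computes (the 8-bit immediate is the truth table, VPTERNLOG-style); this is my illustration, not a proposed IR:

// Bitwise ternary logic: for each bit position, the output bit is the truth-table
// entry of `imm` indexed by the corresponding bits of a, b, c.
fn ternlog(imm: u8, a: u64, b: u64, c: u64) -> u64 {
    let mut result = 0u64;
    for idx in 0..8u32 {
        if (imm >> idx) & 1 != 0 {
            let sa = if idx & 0b100 != 0 { a } else { !a };
            let sb = if idx & 0b010 != 0 { b } else { !b };
            let sc = if idx & 0b001 != 0 { c } else { !c };
            result |= sa & sb & sc; // positions where the (a,b,c) bits match idx
        }
    }
    result
}

// e.g. ternlog(0xC0, a, b, c) == a & b and ternlog(0x3C, a, b, c) == a ^ b, for any c.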

pervognsen ,
@pervognsen@mastodon.social avatar

@amonakov For a reason I don't fully understand, this seems to be common in GPUs but not in CPUs. Even the VPTERNLOG instructions in AVX-512 were inherited from Larrabee AFAIK. Maybe GPU ISAs are less averse to many-operand instructions than CPU ISAs have traditionally been?

pervognsen ,
@pervognsen@mastodon.social avatar

@amonakov Hmm, I just remembered Southern Islands had a metric truckload of 3 in, 1 out instructions and thought I remembered ternlog being in there. But looking through the ISA manual now I can't find it.
