I have a ton of repos in my `~/.mrconfig`, so a full `mr update` takes a long time, even when parallelized. A lot of those remotes are on the same server, however, so I can't help but wonder if it would save time to enable `ControlMaster` from mr directly. I know I could set that up in my `~/.ssh/config` myself, but I'm not sure I'm comfortable enabling it for all use cases. If it were done in mr, it could set up the multiplexer and tear it down when done...
Here a full pull (`mr -j10 update`) takes:

```
30.39user 8.05system 1:51.04elapsed 34%CPU (0avgtext+0avgdata 73320maxresident)k
0inputs+15664outputs (0major+529937minor)pagefaults 0swaps
```
One catch is that multiplexing might fail with `-j` because of a race condition that ssh doesn't seem to handle very well:

```
ControlSocket /home/anarcat/.ssh/control-master-747c83a00480de0860fde4d7df86be06fe25e555.sock already exists, disabling multiplexing
```
That's with the following configuration:

```
ControlMaster auto
ControlPath ~/.ssh/control-master-%C.sock
```
So with that configuration there's very little improvement, unfortunately. I would bet, however, that mr could be smarter than this and dispatch things much better...
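In the meantime, one workaround might be to open the master connection up front, so the parallel jobs only ever reuse an existing socket instead of racing to create it. A sketch, assuming the configuration above (the hostname is illustrative):

```
# start a persistent, backgrounded master connection first
# (-N: run no command, -f: go to background), then run the
# parallel update; every later ssh to that host reuses the socket
ssh -fN -o ControlMaster=yes -o ControlPath=~/.ssh/control-master-%C.sock git.example.com
mr -j10 update
```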
Thanks! -- ?anarcat
I've worked on moving most of my remote repos over to https instead of SSH. This is mainly because I have a prompt for each use of my SSH keys, but it also helps with this problem.
I'm using `ControlMaster` for all SSH connections, with some situations where I turn it off, and I find it works fairly well apart from the annoying race condition you mention during `mr fetch`. I use time-limited connections so I don't have SSH running forever.

Instead of configuring `ControlMaster` for all SSH connections, you could also just configure it for the hosts that you can only use git with (I'm thinking github.com/etc).
I suggest putting SSH sockets in `$XDG_RUNTIME_DIR` instead of `$HOME`, so you never get stale sockets after a reboot and never get backup programs looking at the sockets (some do).
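A minimal sketch combining both suggestions, assuming OpenSSH 8.4 or later (that's when `${ENV}` expansion in `ControlPath` appeared); the hosts and the ten-minute persistence are just examples:

```
# ~/.ssh/config
Host github.com gitlab.com
    ControlMaster auto
    # sockets in the runtime dir vanish on reboot
    ControlPath ${XDG_RUNTIME_DIR}/ssh-control-%C.sock
    # keep the master alive 10 minutes after the last connection
    ControlPersist 10m
```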
To be honest I'm not sure how myrepos could selectively enable `ControlMaster` for commands it runs. The only thing I can think of would be to modify `PATH` to add a script that runs the real `ssh` with additional options (a rough sketch below), but that seems pretty hacky.

Either way, if `ControlMaster` provides very little improvement for you, adding it via myrepos instead of via the SSH config isn't going to change that.
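For what it's worth, such a shim could be as small as this; a sketch only, with illustrative option values (mr would have to prepend the shim's directory to `PATH` and clean up the sockets afterwards):

```
#!/bin/sh
# hypothetical "ssh" shim that mr could put first in PATH:
# delegate to the real ssh, adding multiplexing options
exec /usr/bin/ssh \
    -o ControlMaster=auto \
    -o ControlPath="${XDG_RUNTIME_DIR:-$HOME/.ssh}/mr-ssh-%C.sock" \
    -o ControlPersist=10m \
    "$@"
```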
PS: I only ever do `mr -j10 fetch` and I leave updating branches to interactive shells where I can deal with any conflicts manually.

PS: I forgot to mention that I use https for the repo remote URLs and then I set the push URLs globally in `~/.gitconfig` like this:
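Something along these lines, using git's `url.<base>.pushInsteadOf` mechanism (the host here is just an example):

```
# ~/.gitconfig: fetch over https, push over ssh
[url "git@github.com:"]
    pushInsteadOf = https://github.com/
```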
My wrapper script takes care of that: it makes sure SSH works first...

But it's true that I might get better performance over HTTPS, I guess. It's just that I like the idea of pulling everything over SSH: I have a solid TOFU there that I don't have over HTTPS... And since this is an automated process, I really prefer to have that solid trust path in place. Plus, setting up each push/pull URL is annoying, although there are nice ways around that, as you said...
Excellent suggestions, thanks!
Well, what I was thinking of is that mr is the one that ultimately calls the git processes, so it could do some magic with `GIT_SSH_COMMAND`, for example... It's VCS-specific, of course... I guess I would first need to figure out that part.
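A minimal sketch of that idea, assuming mr (or, in the meantime, the user) exports the variable before git runs; the option values are illustrative:

```
# every git-over-ssh connection then shares one multiplexed socket
export GIT_SSH_COMMAND="ssh -oControlMaster=auto -oControlPath=$XDG_RUNTIME_DIR/mr-%C.sock -oControlPersist=10m"
mr -j10 fetch
```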
That's also a good point, I always forget about `fetch`, thanks!

One of the problems I have with `fetch`, though, is that it pulls from all the remotes that are configured. In some cases those refer to USB keys or non-existent, transient repos, and fetching fails when the remote isn't reachable (the USB key isn't mounted, say, or the repo is gone).
This also makes `fetch` slower, as it fetches more remotes.

Well, what I would like is for the whole process to be much faster. I'm not sure that's possible at all! I figured that `ControlMaster` might help, but I stumbled upon the race condition, so maybe that's the first thing to solve. Maybe, even, that's a bug in the ssh client itself, which should never run into such a race condition (it could easily wait a bit for another process to finish setting up the socket it just created, for example).
Here's a new benchmark, with your `ControlPersist` setting. I still get those errors, unfortunately, which feel more and more like a bug in SSH: if the socket exists, use it! If you can't use it, just wait! If you can't wait, just shut the hell up, no?
Anyways, without the multiplexer:
And with the multiplexer:
So it still shaves off a good 20 seconds, which is not negligible. But since I now use `fetch` instead of `update`, I'm actually back to square one, and get tons of warnings to go through. :p

Can't help but think this is just a bug in ssh, though...
About the TOFU issue, agreed that it is better for security. I switched away because my `ControlMaster autoask` at the time was very annoying during `mr fetch`. On alioth I had an SSH key for read-only git access with no prompt and a key for read-write git access with a prompt, but that doesn't appear to be supported by gitlab/etc. I should probably rethink how to do this more optimally at some point.

For the git fetch issue, you could do something like what I do here to make `mr -m fetch` not print anything for remotes that don't have any changes, and to skip remotes that I disabled fetching from because they are gone now. Then just add any checks you want; if the remote is a USB stick and the path doesn't exist, skip the remote. You could also just make a `git_fetchorigin` command to only fetch one remote (see the sketch below).
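A sketch of both ideas in `~/.mrconfig`; the command names follow mr's `<vcs>_<action>` convention, but the details are illustrative, not tested:

```
[DEFAULT]
# "mr fetchorigin": only fetch the origin remote
git_fetchorigin = git fetch origin
# override "mr fetch": skip remotes whose local path doesn't exist
# (e.g. a USB key that isn't mounted right now)
git_fetch =
	for r in $(git remote); do
		url=$(git config --get "remote.$r.url")
		case "$url" in
			/*) [ -e "$url" ] || continue ;;
		esac
		git fetch "$r"
	done
```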
git-annex contains a robust implementation of this in Annex/Ssh.hs. It's nontrivial; my implementation is over 400 LoC.

Some other complications that may or may not have been discussed:
thanks for all the advice, pabs and joeyh!
joeyh: I suspected there was a lot more to it, and considering what I know of Haskell, 400 LoC of Haskell would translate into something much more complex in myrepos. :p
pabs: neat trick! Couldn't this be merged straight into mr, though? It seems you have reimplemented most of the mr functions in your `.mrconfig` now.
Also, now that I switched to `fetch`, I keep stumbling upon the issue that I have to rerun `update` everywhere to get the changes... Couldn't there be a way to run `update` (with `--ff-only` maybe) and `fetch` at once? In other words, why doesn't `update` fetch changes from all remotes?

My `git_fetch` is just a workaround for something that IMO belongs in git itself, which I haven't gotten around to implementing in a way that would be acceptable to git upstream. I have a branch that implements what I wanted, but it got rejected upstream and I never got around to figuring out a way forward:
<https://github.com/pabs3/git/commits/minimal-output>
<https://public-inbox.org/git/1445741384-30828-1-git-send-email-pabs3@bonedaddy.net/>
Personally, I think in many instances `git pull` should be deprecated in favour of `git fetch` plus reviewing the incoming changes from upstream, and then, if appropriate, fast-forwarding/rebasing/merging branches and fixing any issues. The obvious exceptions are upstreams that move so fast that review is not possible (like Linux) and people who have way too many repos checked out, like me.

The default `mr update` just runs `git pull`, which allegedly "is shorthand for git fetch followed by git merge FETCH_HEAD" according to the docs. Unfortunately `git pull` doesn't seem to respect the `alias.fetch` config, and I don't see any other config option to make git update/fetch pull from all remotes. We could change `mr update` to run `git pull --all` instead, I guess; I'm not sure if there are any downsides to that apart from the behaviour change...
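In the meantime, a per-user override in `~/.mrconfig` can approximate the fetch-everything-then-fast-forward behaviour asked about above; a sketch, untested:

```
[DEFAULT]
# fetch all remotes, then only fast-forward the current branch
# (never create a merge commit)
git_update = git pull --all --ff-only
```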