todo/ControlMastermyreposhttp://myrepos.branchable.com/todo/ControlMaster/myreposikiwiki2019-11-10T12:13:44Zmy approachhttp://myrepos.branchable.com/todo/ControlMaster/comment_1_c4ff57e61405d4724dd6d8cf03ecea2b/PaulWise2019-10-18T04:27:20Z2019-10-18T04:27:19Z
<p>I've worked on getting most of my remote repos over https instead of SSH. This is mainly because I have a prompt for each use of my SSH keys, but it also helps with this problem.</p>
<p>I'm using ControlMaster for all SSH connections, with some situations where I turn it off, and I find it works fairly well apart from the annoying race condition you mention during <code>mr fetch</code>. I use time-limited connections so I don't have SSH running forever.</p>
<p>Instead of configuring ControlMaster for all SSH connections, you could also just configure it for the hosts that you can only use git with (I'm thinking github.com/etc).</p>
<p>I suggest putting SSH sockets in $XDG_RUNTIME_DIR instead of HOME, so you never get stale sockets after reboot and never get backup programs looking at sockets (some do).</p>
<pre><code>ControlMaster autoask
ControlPath /run/user/%i/ssh-control-%l->%r@%h:%p
ControlPersist 60
</code></pre>
<p>To be honest I'm not sure how myrepos could selectively enable ControlMaster for commands it runs. The only thing I can think of would be to modify PATH to add a script that runs the real <code>ssh</code> with additional options, but that seems pretty hacky.</p>
<p>Either way, if ControlMaster provides very little improvement for you, adding it via myrepos instead of via the SSH config isn't going to change that.</p>
<p>PS: I only ever do <code>mr -j10 fetch</code> and I leave updating branches to interactive shells where I can deal with any conflicts manually.</p>
comment 2http://myrepos.branchable.com/todo/ControlMaster/comment_2_05f4d8be1ab4cd6e95bd9806a2f22983/PaulWise2019-10-18T04:29:25Z2019-10-18T04:29:24Z
<p>PS: I forgot to mention that I use https for the repo remote URLs and then I set the push URLs globally in <code>~/.gitconfig</code> like this:</p>
<pre><code>[url "ssh://github.com/"]
pushInsteadOf = https://github.com/
</code></pre>
dispatch stuff much better...http://myrepos.branchable.com/todo/ControlMaster/comment_3_ed8b11ab449b98868db2097383f69154/PaulWise2019-10-18T04:38:44Z2019-10-18T04:38:44Z
Perhaps by "dispatch stuff much better" you mean just to avoid the race condition? Grouping accesses to the same host isn't likely to improve speed; if the host is overloaded, you probably want to spread accesses out instead. For spreading accesses out, myrepos cannot know which host(s) the command it runs will connect to: that info is in the git config of each repo, and the myrepos copy of that config could be out of date and would be missing non-origin remotes.
comment 4http://myrepos.branchable.com/todo/ControlMaster/comment_4_9614cbe750f552434edc6a4b789786c0/anarcat2019-10-18T14:50:55Z2019-10-18T14:50:54Z
<blockquote><p>I've worked on getting most of my remote repos over https instead of SSH. This is mainly because I have a prompt for each use of my SSH keys, but it also helps with this problem.</p></blockquote>
<p>My <a href="https://gitlab.com/anarcat/scripts/raw/master/sync-boxes">wrapper script</a> takes care of that: it makes sure SSH works first...</p>
<p>But it's true I might have better performance over HTTPS, I guess. It's just that I like the idea of pulling everything over SSH: I have a solid TOFU there that I don't have over HTTPS... And since this is an automated process, I really prefer to have that solid trust path in place. Plus setting up each push/pull URL is annoying, although there are nice ways around that, as you said...</p>
<blockquote><p>Instead of configuring ControlMaster for all SSH connections, you could also just configure it for the hosts that you can only use git with (I'm thinking github.com/etc).</p>
<p>I suggest putting SSH sockets in $XDG_RUNTIME_DIR instead of HOME, so you never get stale sockets after reboot and never get backup programs looking at sockets (some do).</p></blockquote>
<p>Excellent suggestions, thanks!</p>
<blockquote><p>To be honest I'm not sure how myrepos could selectively enable ControlMaster for commands it runs. The only thing I can think of would be to modify PATH to add a script that runs the real ssh with additional options, but that seems pretty hacky.</p></blockquote>
<p>Well, what I was thinking of is that myrepos is the one that ultimately calls the git processes, so it could do some magic with <code>GIT_SSH_COMMAND</code> for example... It's VCS-specific of course... <img src="http://myrepos.branchable.com/smileys/ohwell.png" alt=":/" /></p>
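<p>A minimal sketch of that idea, assuming the caller exports <code>GIT_SSH_COMMAND</code> around the mr run rather than mr doing it itself. <code>mux_ssh_command</code> is a hypothetical helper name, and the socket location simply follows the <code>$XDG_RUNTIME_DIR</code> suggestion quoted above. It only enables multiplexing for git; it does nothing about the race condition.</p>

```shell
# Sketch: build an ssh command line that enables connection multiplexing,
# suitable for exporting as GIT_SSH_COMMAND around an mr run.
# mux_ssh_command is a hypothetical helper, not part of myrepos.
mux_ssh_command() {
    # Keep sockets in the per-user runtime directory (falls back to /tmp).
    dir=${XDG_RUNTIME_DIR:-/tmp}
    # %C is expanded by ssh to a hash of local host, remote host, port and
    # user, which keeps the socket path short.
    printf 'ssh -o ControlMaster=auto -o ControlPath=%s/ssh-control-%%C -o ControlPersist=60\n' "$dir"
}

# Usage (not run here):
#   GIT_SSH_COMMAND="$(mux_ssh_command)" mr -j10 fetch
```

<p>Since git passes <code>GIT_SSH_COMMAND</code> through the shell, this sketch assumes the runtime directory path contains no spaces.</p>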
<blockquote><p>Either way, if ControlMaster provides very little improvement for you, adding it via myrepos instead of via the SSH config isn't going to change that.</p></blockquote>
<p>I guess I would first need to figure out <em>that</em> part of course. <img src="http://myrepos.branchable.com/smileys/smile4.png" alt=";)" /></p>
<blockquote><p>PS: I only ever do mr -j10 fetch and I leave updating branches to interactive shells where I can deal with any conflicts manually.</p></blockquote>
<p>That's also a good point, I always forget about <code>fetch</code>, thanks!</p>
<p>One of the problems I have with <code>fetch</code> though is that it pulls <em>all</em> remotes that are configured. In some cases, those refer to USB keys or non-existent, transient repos... For example, I have this in one repo:</p>
<pre><code>anarcat@angela:calendes(master)$ git rv
origin anarc.at:/var/www/calendes (fetch)
origin anarc.at:/var/www/calendes (push)
sneakernet /media/anarcat/KINGSTON/calendes/ (fetch)
sneakernet /media/anarcat/KINGSTON/calendes/ (push)
</code></pre>
<p>It fails with the following:</p>
<pre><code>mr fetch: /home/anarcat/Pictures/calendes
Fetching origin
Fetching sneakernet
fatal: '/media/anarcat/KINGSTON/calendes/' does not appear to be a git repository
fatal: Could not read from remote repository.
</code></pre>
<p>This also makes <code>fetch</code> slower, as it fetches more remotes.</p>
<blockquote><p>Perhaps by "dispatch stuff much better" you mean just to avoid the race condition?</p></blockquote>
<p>Well, what I would like is for the process to be much faster. I am not sure it's possible at all! I figured that ControlMaster might help, but I stumbled upon the race condition, so maybe that's the first thing to solve. Maybe that's even a bug in the ssh client itself, which should <em>never</em> run into such a race condition (it could easily <em>wait</em> just a bit for another process to set up the socket correctly when it's created, for example).</p>
<p>Here's a new benchmark, with your <code>ControlPersist</code> setting. I still
get those errors, unfortunately:</p>
<pre><code>ControlSocket /run/user/1000/ssh-control-local-user@remote.example.com:22 already exists, disabling multiplexing
</code></pre>
<p>... which feel more and more like a bug in SSH: if it exists, use it!
If you can't use it, just wait! If you can't wait, just shut the hell
up, no? <img src="http://myrepos.branchable.com/smileys/smile.png" alt=":)" /></p>
<p>Anyways, without the multiplexer:</p>
<pre><code>13.45user 6.51system 2:10.01elapsed 15%CPU (0avgtext+0avgdata 44900maxresident)k
0inputs+14400outputs (0major+514164minor)pagefaults 0swaps
</code></pre>
<p>And with the multiplexer:</p>
<pre><code>8.75user 4.86system 1:48.98elapsed 12%CPU (0avgtext+0avgdata 44972maxresident)k
1760inputs+14000outputs (39major+474784minor)pagefaults 0swaps
</code></pre>
<p>So it still shaves off a good 20 seconds, which is not negligible. But
since I now use <code>fetch</code> instead of update, I'm actually back to square
one, and get tons of warnings to go through. :p</p>
<p>Can't help but think this is just a bug with ssh though...</p>
comment 5http://myrepos.branchable.com/todo/ControlMaster/comment_5_d6a858f29224cb8685a1ac504e218571/PaulWise2019-10-18T15:10:37Z2019-10-18T15:10:35Z
<p>About the TOFU issue, agreed that is better for security. I switched away because my <code>ControlMaster autoask</code> at the time was very annoying during <code>mr fetch</code>. On alioth I had an SSH key for read-only git access with no prompt and a key for read-write git access with a prompt, but that doesn't appear to be supported by gitlab/etc. I probably should rethink how to do this more optimally at some point.</p>
<p>For the git fetch issue, you could do something like what I do here to make <code>mr -m fetch</code> not print things for remotes that don't have any changes and skip remotes that I disabled fetching from because they are gone now. Then just add any checks you want; if the remote is a USB stick and the path doesn't exist, skip the remote. You could also just make a git_fetchorigin command to only fetch one remote.</p>
<pre><code># work around git annoyance with mr -m fetch
git_fetch =
    git remote |
    while read -r remote ; do
        if [ xtrue != "x$(git config --bool "remote.$remote.skipFetchAll")" ] ; then
            git fetch "$remote"
            git fetch --tags "$remote"
        fi
    done
</code></pre>
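<p>The extra check mentioned above (skip a remote whose local path doesn't exist, such as an unplugged USB stick) could look roughly like this. It is only a sketch: <code>remote_available</code> and <code>fetch_available_remotes</code> are hypothetical names, and it assumes that only absolute paths count as local remotes.</p>

```shell
# Sketch: decide whether a remote URL is worth fetching from.
# remote_available is a hypothetical helper name.
remote_available() {
    case $1 in
        /*) [ -d "$1" ] ;;   # local path: fetch only if the directory exists
        *)  return 0 ;;      # anything else (ssh, https, ...): always try
    esac
}

# Fetch every remote of the current repository whose URL passes the check.
fetch_available_remotes() {
    git remote |
    while read -r remote ; do
        url=$(git config --get "remote.$remote.url")
        if remote_available "$url" ; then
            git fetch "$remote"
        fi
    done
}
```

<p>The same test could be added inside the <code>if</code> of the <code>git_fetch</code> definition above, next to the <code>skipFetchAll</code> check.</p>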
comment 6http://myrepos.branchable.com/todo/ControlMaster/comment_6_21704b4357765ef19805b43ad4bd59cd/joey2019-10-20T18:08:53Z2019-10-20T17:50:20Z
<p>git-annex contains a robust implementation of this in Annex/Ssh.hs.
It's nontrivial; my implementation is over 400 LoC.</p>
<p>Some other complications that may or may not have been discussed:</p>
<ul>
<li>The path to a unix socket is limited to around 100 bytes,
so using the full server name in the socket name will sometimes fail.</li>
<li>Multiple password prompting has to be avoided when there's concurrency.
This and ssh failing on a socket that is not yet connected can both be
dealt with by locking until a connection is set up. A dummy
command like "true" has to be run over ssh to know when the connection is
up.</li>
</ul>
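<p>The locking approach from the list above can be sketched in shell. This is not how git-annex does it, only a rough illustration under some assumptions: <code>warm_master</code> and the <code>SSH</code> override variable are hypothetical names, <code>flock</code> is the util-linux tool, and the <code>%C</code> token is used in the socket path to stay under the length limit.</p>

```shell
# Sketch: serialise ssh master set-up per host so that concurrent jobs neither
# race on the control socket nor prompt for a password several times.
# warm_master and the SSH variable are hypothetical; flock(1) is from util-linux.
warm_master() {
    host=$1
    dir=${XDG_RUNTIME_DIR:-/tmp}
    (
        # Block until we hold the per-host lock on file descriptor 9.
        flock 9 || exit 1
        # Running "true" returns only once the connection is up, at which point
        # the master socket exists and later ssh processes can reuse it.
        ${SSH:-ssh} -o ControlMaster=auto \
            -o ControlPath="$dir/ssh-control-%C" \
            -o ControlPersist=60 \
            "$host" true
    ) 9>"$dir/ssh-warm-$host.lock"
}
```

<p>Each job would call <code>warm_master</code> for its host before running git. The first caller sets up the connection while the others wait on the lock, then find the socket already in place.</p>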
thanks!http://myrepos.branchable.com/todo/ControlMaster/comment_7_3242281b289ab31b55c1d44e919fd3e7/anarcat2019-10-25T15:25:28Z2019-10-25T15:25:27Z
<p>thanks for all the advice, pabs and joeyh! <img src="http://myrepos.branchable.com/smileys/smile.png" alt=":)" /></p>
<p>joeyh: I suspected there was a lot more to it, and considering what I know of Haskell, I suspect 400 LOC in Haskell would translate into something much more complex in myrepos. :p</p>
<p>pabs: neat trick! couldn't this be merged straight into mr though? it seems you have reimplemented most of the mr functions in your .mrconfig now. <img src="http://myrepos.branchable.com/smileys/smile.png" alt=":)" /></p>
<p>also, now that I switched to <code>fetch</code>, I keep stumbling upon the issue that I have to rerun update everywhere to get the changes... couldn't there be a way to run update (with <code>--ff-only</code> maybe) <em>and</em> fetch at once? in other words, why doesn't <code>update</code> fetch changes from all remotes?</p>
comment 8http://myrepos.branchable.com/todo/ControlMaster/comment_8_211af62c869466d5fe456e07c24ab191/PaulWise2019-11-10T12:13:44Z2019-10-27T02:15:24Z
<p>My git_fetch is just a workaround for something that IMO belongs in git itself that I haven't gotten around to implementing in a way that would be acceptable to git upstream. I have a branch that implements what I wanted, but it got rejected upstream and I never got around to figuring out a way forward:</p>
<p>https://github.com/pabs3/git/commits/minimal-output
https://public-inbox.org/git/1445741384-30828-1-git-send-email-pabs3@bonedaddy.net/</p>
<p>Personally, I think in many instances <code>git pull</code> should be deprecated in favour of <code>git fetch</code> plus reviewing the incoming changes from upstream and then if appropriate fast-forwarding/rebasing/merging branches and fixing any issues. The obvious exception is upstreams that move so fast that review is not possible (like Linux) or people who have way too many repos checked out like me <img src="http://myrepos.branchable.com/smileys/smile.png" alt=":)" /></p>
<p>The default <code>mr update</code> just runs <code>git pull</code>, which allegedly "is shorthand for git fetch followed by git merge FETCH_HEAD" according to the docs. Unfortunately <code>git pull</code> doesn't seem to respect the <code>alias.fetch</code> config and I don't see any other config option to make git update/fetch pull from all remotes. We could change <code>mr update</code> to run <code>git pull --all</code> instead I guess, not sure if there are any downsides to that apart from the behaviour change...</p>
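<p>For comparison, the "fetch everything, then only fast-forward" variant asked about earlier in the thread could be sketched as follows. This is not what mr does today. <code>fetch_all_then_ff</code> is a hypothetical name, and the sketch assumes the current branch has an upstream configured.</p>

```shell
# Sketch: fetch every remote, then fast-forward the current branch only.
# fetch_all_then_ff is a hypothetical helper name, not an mr built-in.
fetch_all_then_ff() {
    # Update the remote-tracking branches of all configured remotes.
    git fetch --all --quiet || return 1
    # Move the current branch to its upstream only if that is a fast-forward;
    # diverged branches are left untouched and the command fails instead.
    git merge --ff-only --quiet '@{upstream}'
}
```

<p>Note that <code>git fetch --all</code> fails as soon as one remote is unreachable, so in this sketch a missing USB-stick remote would also prevent the fast-forward.</p>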